{"id":1327,"date":"2013-07-09T14:35:39","date_gmt":"2013-07-09T12:35:39","guid":{"rendered":"http:\/\/blog.gocept.com\/?p=1327"},"modified":"2013-07-11T20:52:47","modified_gmt":"2013-07-11T18:52:47","slug":"monitoringlove-sprint-takeaway","status":"publish","type":"post","link":"https:\/\/blog.gocept.com\/2013\/07\/09\/monitoringlove-sprint-takeaway\/","title":{"rendered":"#monitoringlove sprint takeaway"},"content":{"rendered":"

A few weeks ago I co-organised and participated in a #monitoringlove sprint in Berlin<\/a>.<\/p>\n

My personal plan was to play with more modern utilities that could potentially replace our existing Nagios monitoring chain. What I now think would be a good setup looks roughly like this:<\/p>\n

\"monitoringlove2\"<\/a><\/p>\n

Most of those parts already exist. The new thing in there is what I called “riemann-actual” – something that generates new events based on existing events from the index. I call this “higher order” monitoring – in Nagios these would be known as “business processes”.<\/p>\n

The term “business processes” is a bit misleading, as nothing there is really about processes: it means taking previously gathered monitoring data and condensing it into a denser expression. Ideally you can recombine any of your metrics to make an overall statement such as “everything is good if more than 80% of the appservers are up, the error response rate is below 5%, and the frontpage is reachable from at least 3 outside systems”.<\/p>\n
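As a rough illustration of such a combined statement, here is a minimal Python sketch. The function name, its parameters, and the 'ok'/'critical' state strings are made up for this example and are not taken from riemann-actual or Nagios; only the three thresholds come from the sentence above.

```python
def overall_state(appserver_states, error_rate, frontpage_probes_ok):
    '''Combine three lower-level checks into one higher-order state.

    appserver_states    -- list of 'ok' / 'critical' strings, one per appserver
    error_rate          -- fraction of error responses, e.g. 0.02 for 2%
    frontpage_probes_ok -- number of outside systems that reached the frontpage
    '''
    if not appserver_states:
        # No data about appservers at all: treat that as a failure.
        return 'critical'
    up = sum(1 for state in appserver_states if state == 'ok')
    up_ratio = up / len(appserver_states)
    healthy = (
        up_ratio > 0.80               # more than 80% of the appservers are up
        and error_rate < 0.05         # less than 5% error responses
        and frontpage_probes_ok >= 3  # reachable from at least 3 outside systems
    )
    return 'ok' if healthy else 'critical'
```

Each input would itself be derived from events that are already in the index, which is what makes this “higher order” monitoring.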

Data gathering<\/strong><\/p>\n

First, I tried to set up something for data gathering. I had already been given the recommendation to look at scales<\/a>\u00a0for in-app metrics and found it easy to get started. I like the notion that metrics in your app behave a little bit like logging: you don’t care where they go and you expect the user of your system to configure an actual target. The built-in webserver is nice for getting started, and Graphite as a protocol seems fair enough nowadays for forwarding data.<\/p>\n

To gather system-level metrics I guess both collectd<\/a> and statsd<\/a> are fine starting points. I used collectd to begin with as it actually had a Riemann output plugin.<\/p>\n

Central processing<\/strong><\/p>\n

We want to be able to take all of the data we acquire into account when making decisions quickly. Riemann<\/a> seems to be the most suitable tool for this task. After playing around for a while trying to implement “business process” monitoring in Clojure, I found it easier to provide a Python environment that can talk to Riemann and make those decisions. I made this available as “riemann-actual<\/a>” on Bitbucket.<\/p>\n

I noticed that this setup would require only a very generic Riemann configuration and could be run on a per-customer or per-project basis by just adding more of those loops on top of Riemann.<\/p>\n

Performance-wise I was extremely happy. I could have a 10Hz monitoring loop resulting in about 1k events per second on my computer. With that resolution all business processes would notice an outage with no visible delay.<\/p>\n

A nice feature is that Riemann can generate events when old events reach their TTL. This way you can make sure that you notice when a system you are monitoring “goes dark”.<\/p>\n
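The idea behind TTL expiry can be sketched with a toy in-memory index in Python. This is not Riemann's implementation or API: the class, its method names, and the event fields are assumptions made purely for illustration, and the 'expired' state name only mirrors how I understand Riemann labels such events.

```python
class ToyIndex:
    '''Toy model of an event index with per-event TTLs (not Riemann code).'''

    def __init__(self):
        # Maps (host, service) to the time at which the last event expires.
        self._deadlines = {}

    def update(self, host, service, ttl, now):
        # A fresh event pushes the deadline for this host/service forward.
        self._deadlines[(host, service)] = now + ttl

    def expire(self, now):
        # Remove entries whose TTL has run out and emit one event for each,
        # so that downstream consumers notice the silence.
        expired = [key for key, deadline in self._deadlines.items()
                   if deadline <= now]
        for key in expired:
            del self._deadlines[key]
        return [{'host': host, 'service': service, 'state': 'expired'}
                for host, service in sorted(expired)]
```

A host that stops reporting never sends a “critical” event itself; the expiry event is the only signal you get, so the TTL should be somewhat longer than the reporting interval.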

Also, it seems that Riemann configuration can be unit-tested easily: feed events in, watch the index, or see events coming out. It doesn’t get much simpler than that.<\/p>\n

The configurable dashboard in Riemann 0.2 is already very helpful: responsive, flexible, and fast \u00a0– until you try to display 10k metrics at once. \ud83d\ude09 It needs a little more polish but it’s well on its way.<\/p>\n

Distributed consumers<\/strong><\/p>\n

My understanding of Riemann is that it wants to be a nexus for “central, volatile, shared state”. This means a lot of updates go through it and it needs to be good at I\/O. On the other hand, it means that it shouldn’t do much itself and should just make it easy for you to route your data somewhere else.<\/p>\n

Actually looking at further consumers didn’t happen as 3 days aren’t that long. \ud83d\ude42 I see graphite on the horizon (with the setup becoming easier over time) as well as more custom tooling to turn events into notifications, etc.<\/p>\n

A look at OpenTSDB seemed promising at first, but it turns out to have an even more complex setup than Graphite. I got it running but it seemed extremely hard to control, so I dropped it after a few hours.<\/p>\n

Overall it seems that a lot has happened since the outcry of #monitoringsucks, and I’m confident that there’s a way out of Nagiosland in the near future.<\/p>\n

More notes from our sprint are available at pysprints.de.<\/a>\u00a0(Although in German.)<\/p>\n

","protected":false}}