{"id":85,"date":"2011-06-27T09:57:59","date_gmt":"2011-06-27T07:57:59","guid":{"rendered":"http:\/\/blog.gocept.com\/?p=85"},"modified":"2012-12-03T15:10:03","modified_gmt":"2012-12-03T14:10:03","slug":"no-luck-with-glusterfs","status":"publish","type":"post","link":"https:\/\/blog.gocept.com\/2011\/06\/27\/no-luck-with-glusterfs\/","title":{"rendered":"No luck with glusterfs"},"content":{"rendered":"

Recently, we’ve been experimenting with glusterfs as an alternative network storage backend for our VM hosting. It looked like a very promising candidate to replace our current iSCSI stack: scale-out with decent performance, mostly self-configuring, self-replicating, self-healing. And all of this out of the box, without complex setup. In contrast, the conventional architecture with its complex layering of iSCSI targets, DRBD, and Linux-HA, glued together with a pack of shell scripts, looks rather ’90s.
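To illustrate how little ceremony the glusterfs side needs compared to that stack, here is a minimal sketch of creating and starting a two-way replicated volume, driven from Python via the gluster CLI. The host names, brick paths, and volume name are made-up examples, not our actual configuration.

```python
# Sketch: set up a two-way replicated glusterfs volume via the gluster CLI.
# All host names, brick paths, and the volume name are made-up examples.
import subprocess

def gluster(*args):
    """Run a gluster CLI command and fail loudly if it errors."""
    subprocess.run(["gluster", *args], check=True)

# join the second storage server, create one replicated volume, serve it
gluster("peer", "probe", "store2.example.net")
gluster("volume", "create", "vmimages", "replica", "2",
        "store1.example.net:/export/brick1",
        "store2.example.net:/export/brick1")
gluster("volume", "start", "vmimages")
# a KVM host then simply mounts store1.example.net:/vmimages
```

Compare that with keeping iSCSI targets, DRBD resources, and Linux-HA constraints in sync by hand.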

We played with glusterfs for a while. Setting up and configuring the software went quite smoothly compared to the traditional stack. But after some stress testing in a replicated scenario, we found severe problems.

Synchronisation<\/strong><\/h3>\n

On the storage, each virtual machine is represented essentially as one big image file. Such an image can grow to several hundred gigabytes. That is fine as long as the replicated file servers are in sync. But once one goes offline and comes back online, the versions of the image may differ and the self-healing algorithm is triggered. Due to glusterfs’ architecture, this happens entirely on the filesystem client (i.e., the KVM host). After re-connecting a file server, all VM I/O is paused until self-healing is complete. The live VM is stuck for anywhere from several seconds to more than a minute. A considerable portion of our hosting cluster could freeze for minutes. This is clearly unacceptable. Re-connecting a previously disconnected file server would be a risky operation: quite the opposite of what replication is good for.
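A rough back-of-the-envelope calculation shows why the freeze hurts so much. The figures below (a 200 GB image, a 1 Gbit/s replication link) are illustrative assumptions, not measurements from our setup:

```python
# Back-of-the-envelope estimate of self-heal time; all figures are
# illustrative assumptions, not measurements.
image_size_bytes = 200e9        # one 200 GB VM image
link_bytes_per_s = 1e9 / 8      # 1 Gbit/s replication link

full_resync_s = image_size_bytes / link_bytes_per_s
print(f"full resync of one image: ~{full_resync_s / 60:.0f} minutes")  # ~27 minutes

# Even if only a few GB of blocks changed while the server was away,
# copying them keeps the VM's I/O frozen for tens of seconds.
changed_bytes = 5e9
print(f"healing 5 GB of changes: ~{changed_bytes / link_bytes_per_s:.0f} seconds")  # ~40 seconds
```

Even under optimistic assumptions, the pause is far longer than a running VM can tolerate.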

No global state<\/strong><\/h3>\n

Another feature of glusterfs is that replication is handled entirely on the filesystem client, not on the server. This leads to an orthogonal and modular design with a lot of advantages. But it makes it hard to determine when a file server can be disconnected safely: given that self-healing takes a considerable amount of time, we cannot be sure whether some self-heal operation is still in progress. Disconnecting a replicated file server that holds the newer copy of a VM image before the other file server has caught up would render the VM unusable. Unfortunately, there seems to be no easy way to query a glusterfs file server for active self-healing operations. This makes disconnecting a file server a risky operation, too.
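The only workaround we could imagine is to inspect the bricks directly. Below is a minimal sketch of that idea in Python, assuming (as the AFR changelog scheme suggests) that files carrying non-zero trusted.afr.* extended attributes still have a heal outstanding; the brick path is a made-up example and the exact xattr semantics may differ between glusterfs versions. Having to crawl a brick full of multi-hundred-gigabyte images just to answer “is it safe to shut this server down?” underlines the problem.

```python
#!/usr/bin/env python3
"""Sketch: find files on a brick that still carry pending AFR changelog
entries.  Brick path and xattr semantics are assumptions, not official tooling."""
import os
import struct

BRICK = "/export/brick1"        # hypothetical brick directory

def heal_pending(path):
    """True if any trusted.afr.* xattr carries non-zero counters."""
    try:
        for name in os.listxattr(path):
            if name.startswith("trusted.afr."):
                raw = os.getxattr(path, name)
                # AFR keeps three 32-bit big-endian counters
                # (data / metadata / entry changelog) per replica peer.
                if any(struct.unpack(">3I", raw[:12])):
                    return True
    except (OSError, struct.error):
        pass                    # unreadable file or unexpected value: skip it
    return False

dirty = [os.path.join(root, f)
         for root, _dirs, files in os.walk(BRICK)
         for f in files
         if heal_pending(os.path.join(root, f))]

print(f"{len(dirty)} file(s) apparently still waiting for self-heal")
```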

Good for its intended use

In summary, we learned that glusterfs’ architecture is a good fit for the use case it was originally designed for: an NFS replacement with lots of small files. But for our scenario, where continuously running processes need uninterrupted access to a few large image files, glusterfs does not seem to be the best fit.

So we will stick to the good ol’ iSCSI stack for now. Perhaps Ceph or Sheepdog will become viable alternatives in the future once they stabilise.
