Getting sys.path out of buildout

A wrinkly part of buildout‘s design is that the PYTHONPATH is not easily available outside of scripts generated by buildout itself.

I’ve been using the following workaround in some of my development tools for a while and found it quite helpful, even though it’s hacky and rough around the edges:

#!/bin/bash
if [ ! -e bin/wpy ]; then
    bin/buildout install wpy
fi
export PYTHONPATH=$(bin/wpy -c "import sys; print ':'.join([p for p in sys.path if not p.startswith('/usr')])")
# do real work here

This needs a part that installs a python interpreter into the buildout; I’m using this in my ~/.buildout/default.cfg, since all projects I deal with have a [test] part:

[wpy]
recipe = zc.recipe.egg
eggs = ${test:eggs}
interpreter = wpy

A variation on this theme is this script which generates a TAGS file without the need for a special recipe (like z3c.recipe.tag) just to get the PYTHONPATH:

#!/bin/bash
if [ ! -e bin/wpy ]; then
    bin/buildout install wpy
fi
export IFS=
PATHS=$(bin/wpy -c "import sys; print '\n'.join([p for p in sys.path if not p.startswith('/usr')])")
echo $PATHS | ctags --python-kinds=-i -R -e -L -

Assertion helper for zope.testbrowser and unittest

zope.testbrowser is a valuable tool for integration tests. Historically,  the Zope community used to write quite a lot of doctests, but we at gocept have found them to be rather clumsy and too often yielding neither good tests nor good documentation. That’s why we don’t use doctest much anymore, and prefer plain unittest.TestCases instead. However, doctest has one very nice feature, ellipsis matching, that is really helpful for checking HTML output, since you can only make assertions about the parts that interest you. For example, given this kind of page:

>>> print browser.contents
<html>
  <head>
    <title>Simple Page</title>
  </head>
  <body>
    <h1>Simple Page</h1>
  </body>
</html>

If all you’re interested in is that the <h1> is rendered properly, you can simply say:

>>> print browser.contents
<...<h1>Simple Page</h1>...

We’ve now ported this functionality to unittest, as assertEllipsis, in gocept.testing. Some examples:

self.assertEllipsis('...bar...', 'foo bar qux')
# -> nothing happens

self.assertEllipsis('foo', 'bar')
# -> AssertionError: Differences (ndiff with -expected +actual):
     - foo
     + bar

self.assertNotEllipsis('foo', 'foo')
# -> AssertionError: "Value unexpectedly matches expression 'foo'."

To use it, inherit from gocept.testing.assertion.Ellipsis in addition to unittest.TestCase.

gocept talks at PyCon DE

No blog post for quite a while… part of the reason is that we gocept developers were busy preparing talks for PyCon DE 2011. As result, we presented an impressive number of 7 talks/tutorials at this lovely conference.

Curious? Here is a list of all sessions (most with video recordings). Please be aware that nearly all of this stuff is in German.

Tutorials

Talks

Shutting down an HTTPServer

For integration tests it can be helpful to have a fake HTTP server whose behaviour the tests can control. All necessary building blocks are even included in Python standard library. However, the BaseHTTPServer is surprisingly hard to shut down properly, so that it gives up the socket and everything.

While working on gocept.selenium, we came up with some code that does the trick (together with Jan-Wijbrand Kolman and Jan-Jaap Driessen).

class HTTPServer(BaseHTTPServer.HTTPServer):

    _continue = True

    def serve_until_shutdown(self):
        while self._continue:
            self.handle_request()

    def shutdown(self):
        self._continue = False
        # We fire a last request at the server in order to take it out of the
        # while loop in `self.serve_until_shutdown`.
        try:
            urllib2.urlopen(
                'http://%s:%s/' % (self.server_name, self.server_port))
        except urllib2.URLError:
            # If the server is already shut down, we receive a socket error,
            # which we ignore.
            pass
        self.server_close()

You might use this in a zope.testrunner layer like this:

class SilentRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):

    def log_message(self, format, *args):
        pass


class HTTPServerLayer(object):

    host = 'localhost'

    def setUp(self):
        self.server = None
        self.port = random.randint(30000, 40000)
        self.start_server()

    def start_server(self):
        self.server = HTTPServer((self.host, self.port), SilentRequestHandler)
        self.server_thread = threading.Thread(
            target=self.server.serve_until_shutdown)
        self.server_thread.daemon = True
        self.server_thread.start()
        # Kludge: Wait a little as it sometimes takes a while to get the server
        # started.
        time.sleep(0.25)

    def stop_server(self):
        if self.server is None:
            return
        self.server.shutdown()
        self.server_thread.join()

    def tearDown(self):
        self.stop_server()

 

 

No luck with glusterfs

Recently, we’ve been experimenting with glusterfs as an alternative network storage backing our VM hosting. It looked like a very promising candidate to replace our current iSCSI stack: scale-out with decent performance, mostly self-configuring, self-replicating, self-healing. And all of this out-of-the-box without complex setup. In contrast, the conventional architecture with a complex layering of iSCSI targets, DRBD, and Linux-HA glued together with a pack of shell scripts looks rather 90’s.

We played with glusterfs for a while. Setting up and configuring the software went quite smooth compared to the traditional stuff. But after some stress testing in a replicated scenario, we found severe problems.

Synchronisation

On the storage, the virtual machines represent themselves basically as one big image file. This image can become several hundreds of Gigabytes big. This is OK as long as the replicated file servers are in sync. But once one goes offline and online again, the versions of the image may differ and the self-healing algorithm is triggered. Due to glusterfs’ architecture, this happens  entirely on the filesystem client (i.e., the KVM host). After re-connecting a file server, all VM I/O is to be paused until self-healing is complete. The live VM is stuck for some amount of time between several seconds and more than a minute. A considerable portion of our hosting cluster could freeze for minutes. This is cleary unacceptable. Re-connecting a previously disconnected file server would be a risky operation: quite the opposite of what replication is good for.

No global state

Another feature of glusterfs is that replication is handled entirely on the filesystem client and not on the server. This leads to an orthogonal and modular approach which has a lot of advantages. But it makes it hard to determine when a file server can be disconnected safely: Given that self-healing takes a considerable amount of time, we cannot be sure if there is still some self-heal operation in progress. But disconnecting a replicated file server which had the newer copy of a VM image before the other file server has caught up would render the VM unusable. Unfortunately, there seems to be no easy way to query a glusterfs file server for active self-healing operations. This makes disconnecting a file server a risky operation, too.

Good for its intended use

In summary, we learned that glusterfs’ architecture is a good fit for the use case it has originally been designed: a NFS replacement with lots of small files. But for our scenario where continuously running processes need to access a few large image files uninterruptedly, glusterfs seems not to be the best fit.

So we will stick to the good ol’ iSCSI stack for now. Perhaps Ceph or Sheepdog will become viable alternatives in the future once they stabilise.

rrdtool restore and merge from backup

We recently had an issue with our backup server which was also running Nagios including pnp4nagios to gather performance data.

We quickly started to deploy a new Nagios server which started gathering statistics again right away.

After pulling the historical RRD databases from the backup we discovered no easy way to integrate the both datasets. After fiddling with some tools we extended an existing script that can be used to integrate different RRD sources into a single file to match our use case.

The resulting script simply replaces all “null” data rows in the new database that the old database has data available for. A second script provides the ability to merge whole directory trees of RRDs.

The script can be found in the rrdmerge bitbucket repository.

How-To: Undo a transaction with the ZODB

Suppose you’ve written a script to “fix something real quick” and unleashed it upon your live database. Five minutes later, you discover your script had a bug, and now you’ve wrecked quite a bit of production data. Ouch.

You might be lucky, though, since the ZODB offers transaction-level undo. This comes with a lot of caveats, though, the biggest being that if something else was changed in the meantime that causes the undo to conflict, it won’t work. (Before transaction X, some value was A which X changed to B, but later something changes it to C. If I now want to undo transaction X to get back to A, it will conflict. Catalogs and other shared state are prime candidates).

But you still might be lucky and there won’t be a conflict. So, how do you undo a transaction? First, you need to find the transaction. In my case, I knew an object that had been changed by my script. So I asked the ZODB for the history of that object, i.e. the last transaction(s) that changed it:

>>> db = root._p_jar.db()
>>> hist = db.history(my_changed_object._p_oid)
>>> hist
[{'tid': '\x03...', 'size': 123, 'user_name': '', 'description': '', 'time': 1304493667.320477}]

Now I have the offending transaction’s ID. However, the undo() API does not work with transaction ids but needs a special (storage-specific) identifier. And, since as far as I can tell there is no way to map a transaction id to an “undo id”, I had to make to by matching the time stamp:

>>> info = db.undoInfo(specification=dict(time=hist['time']))
>>> info
[{'id': 'A44XmR876bs=', 'time': 1304493667.320477, 'user_name': '', 'description': '', 'size': 315893}]

Finally, call undo and hope you don’t get a conflict upon committing:

>>> db.undo(info['id'])
>>> import transaction
>>> transaction.commit()