Zope Resurrection Part 1 – Reanimation

Now we are helping Zope in the Python 3 wonderland: Almost 20 people started with the reanimation of Zope. We are working mostly on porting important dependencies of Zope to Python 3:

Zope 4 is now per default based on WSGI. Thanks to Hanno who invested much time to make the WSGI story of Zope much more streamlined.

We found out that the ZTK (Zope Toolkit) is no longer used in any of the projects using packages of it (Zope, Grok, Bluebream – as it is dead). It should be kept to test compatibility between the packages inside the ZTK.

Last minute information for remote Sprinters for the Zope Resurrection Sprint

As the Zope Resurrection sprint is approaching, it seems useful to share some information on the schedule for the three days in Halle. As we have also some sprinters, who can not join on site, but might want to join remotely, a few key facts might come in handy.

Etherpad

There is the Etherpad, where the current list of topics is collected. Most of the stories we are going to tackle will be based on this list. In case you have something to add, or are interested in a specific topic in particular, it is a good idea to add your thoughts before the start of the sprint.

IRC Channel

During the sprint we will communicate via the #sprint channel on irc.freenode.net, so additional information and questions can be placed there.

Google Hangout

As the sprint is also intended to foster discussions about the future of Zope, we want to encourage as many people as possible to join. Therefore, we have a hangout where we will meet.

Scheduled Discussions

At the moment we plan the following session:

  • Wednesday 2016-09-28 10:00 CEST, duration 1 h: Introduction to WSGI in Zope 4 by Hanno Schlichting (hannosh) + discussion
  • Wednesday 2016-09-28 14:00 CEST, duration 2 h: Discussion about through-the web (TTW) pattern and the underlying RestrictedPython
  • Thursday 2016-09-29 10:00 CEST, duration 0.5 h: Discussion about the necessity and future of ZTK
  • Friday 2016-09-30 14:00 CEST, duration 1 h: Discussion about the Zope 5

So in case you want to contribute remotely to the sprint, please join us on one of the three ways.

 

Zope in the Python 3 wonderland

A little tale

Once upon the time there was the big mighty Zope II. It was one of the leaders in the Python land. It had mighty features like TTW (trough the web) development and its own object oriented database. Many people liked Zope II and trusted it to be the basis for their personal work and the work of their companies.

After some years Zope II got a son named Zope III. It could have been even mightier and more trustworthy than Zope II. But some people argued that Zope III is too complex and too hard to understand and way too different from Zope II which they knew very well.

New, faster and easier to understand frameworks arose beside Zope II and Zope III and the people began to like the new ones more. After many years of power, Zope became weak. Its name got hated by many people in its country. It even got nearly forgotten.

But there are still the people out in the far away kingdoms who built a complete ecosystem on the shoulders of Zope. They cannot completely change the basis of their work within months or even years. They deserve a perspective even for the Python 3 wonderland which is no longer a vague dream but already a future with wide open doors.

Back in reality

We are here to help Zope to find a new home in the Python 3 land and live there happily a great many years. Therefore we scheduled a Sprint named Zope Resurrection Sprint where we will discuss and code on Zope to get it more Python 3 ready.

The big blocker is currently RestrictedPython which relies on modules which are no longer in the Python 3 standard library. But RestrictedPython is an important part of the TTW security concept, so it has to be rewritten from scratch. Folks who think they do not need TTW in their projects can work on a Zope where RestrictedPython is optional.

Besides the Python 3 migration of the code the contents of the application databases using Zope need a conversion, too. On a previous sprint in Berlin, we wrote a possible solution: zodb.py3migrate which needs some more tests with real databases.

Furthermore, there is Zope IV, the successor of Zope II which needs some love to become strong and ready for its adventurous journey as the basis of Plone VI and others.

It is still possible to join the sprint either in Halle (Saale), Germany or remote. We are looking forward to a good time to put up the banner of Zope in the Python land again.

zodb.py3migrate: Migrate an existing ZODB Data.fs to be used with Python 3

At Berlin Strategic sprint 2016 we developed a tool to analyze a ZODB Filestorage to find Python 2 string objects. If they are in an encoding besides ASCII this is preventing using this Filestorage with Python 3 because of decoding errors arising on loading the pickles.
The tool is even able to convert those strings either to unicode by decoding them using a configurable encoding or convert them to zodbpickle.binary so Python 3 will read them as bytes.
There is documentation of the tool and a repository on GitHub where the code lives.

There are still some questions open:

  • Is there already another tool for this analysis/migration?
  • Is there already any practical knowledge migrating Filestorage contents to Python 3?
  • Do you think such a tool is the right approach to achieve such a migration?
  • Is there anyone who wants to try out the tool on a Filestorage of a personal project and share the experiences? (We analyzed two projects where we have access to a Filestorage but we are sure this does not catch all the edge cases.)

September, 18th–20th: DevOps Sprint

Since we have a strong history in web development, but also were involved in operating web applications we developed, the DevOps movement hit our nerves.

Under the brand name “Flying Circus” we are establishing a platform respecting the DevOps principles.

A large portion of our day-to-day work is dedicated to DevOps related topics. We like to collaborate by sharing ideas and work on tools we all need to make operations and development of web applications a smooth experience. A guiding question: how can we improve the operability of web applications?

A large field of sprintable topics comes to our mind:

Logging

Enable web application developers to integrate logging mechanisms into their apps easily. By using modern tools like Logstash for collecting and analyzing of the data, operators are able to find causes performance or other problems efficiently.

Live-Debugging and Monitoring

Monitoring is a must when operation software. At least for some people (including ourselves), Nagios is not the best fit for DevOps teams.

Deploying

We always wanted to have reproducable automated deployments. Coming from the Zope world, started with zc.buildout, we developed our own deployment tool batou. More recently upcoming projects, such as ansible, and tools (more or less) bound to cloud services like heroku.

Backup

After using bacula for a while, we started to work on backy, which aims to work directly on volume files of virtual machines.

and more…

Join us to work on these things and help to make DevOps better! The sprint will take place at our office, Forsterstraße 29, Halle (Saale), Germany. On September, 20th we will have a great party in the evening.

If you want to attend, please sign up on http://www.meetup.com/DevOps-Sprint/events/191582682/.

 

Accomodation

For your stay in Halle, we can recommend the following Hotels: “City Hotel am Wasserturm”, “Dorint Hotel Charlottenhof”, “Dormero Hotel Rotes Ross”. For those on budget, there is the youth hostel Halle (http://halle.djh-sachsen-anhalt.de/). Everything is in walking distance from our office.

Run tests using layers with py.test

TL;DR

Long Story

We have many test suites which use test layers (e. g. the ones from plone.testing). We want to use py.test and all its fancy features to have a modern test runner. There was no way to convert such tests partly: either you have to port the whole project or you are stuck with the zope.testrunner.

On our Pyramid-Sprint Godefroid Chapelle, Thomas Lotze and me wrote a package which wraps layers as py.test fixtures. The result is gocept.pytestlayer.

Implementation

For each layer it creates two fixtures: one for the layer setUp/tearDown and one for the testSetUp/testTearDown. The layer fixture is configured for class scope but the plug-in orders the tests and knows about the next test so the layer is only torn down if the next test needs another fixture.

Usage

You only have to add a new section to your package buildout and running the test via

bin/py.test -x

detects the layers and displays the needed setup code. See the PyPI-Page of the package for details.

Future

Maybe it is possible to get rid of the fixture setup code, so running tests using layers gets even easier.

Viewing scales metrics from Pyramid

We’ve recently started experimenting with the excellent scales library to collect in-process metrics (see Coda Hale’s CodeConf talk “Metrics everywhere” among many others for reasons why one definitely wants to do that).

Scales comes with a flask-based HTTP server that allows viewing the collected measurements and dumping them as JSON. But if you already are in a web application, there’s no real need to spin up yet another thread, open another port etc. to do this. In our case, we’re using Pyramid, so here’s a quick recipe to get the same view that greplin.scales.flaskhandler provides:

Update 2013-11-06: This code is now released as pyramid_scales.

# in your Pyramid setup
config.add_route('scales', '/scales/*prefix')

from StringIO import StringIO
from pyramid.view import view_config
import greplin.scales
import greplin.scales.formats

@view_config(route_name='scales', renderer='string')
def scales_stats(request):
    parts = request.matchdict.get('prefix')
    path = '/'.join(parts)
    stats = greplin.scales.util.lookup(greplin.scales.getStats(), parts)

    output = StringIO()
    outputFormat = request.params.get('format', 'html')
    query = request.params.get('query', None)
    if outputFormat == 'json':
        request.response.content_type = 'application/json'
        greplin.scales.formats.jsonFormat(output, stats, query)
    elif outputFormat == 'prettyjson':
        request.response.content_type = 'application/json'
        greplin.scales.formats.jsonFormat(output, stats, query, pretty=True)
    else:
        request.response.content_type = 'text/html'
        # XXX Dear pyramid.renderers.string_renderer_factory,
        # you can't be serious
        request.response.default_content_type = 'not-text/html'
        output.write('<html>')
        greplin.scales.formats.htmlHeader(output, '/' + path, __name__, query)
        greplin.scales.formats.htmlFormat(output, tuple(parts), stats, query)
        output.write('</html>')

    return output.getvalue()

Reproducable automated deployments on RaspberryPi with batou

For continuous integration during development, we use Jenkins to automatically run tests for all projects we maintain. Some time ago we wanted to increase visibility of the results, so we set up a Raspberry Pi driving a few meters of LPD8806-based LED strip on which we can address single LEDs to represent the status of individual or aggregated builds.

Automating deployments is a good idea…

After an SD Card failure we were painfully remembered how hard it can be to set up a service where all parts were deployed manually. Fortunately we wrote at least some minimal documentation on how to set everything up, so after a few days we were presented with many broken builds. Of course nobody cared about the build status with all LEDs being dark. 🙁

Let’s automate!

Today we wondered if we can use our deployment-tool batou to make reproduceable deployments to a raspberry pi, and did some tests on a vanilla raspbian image (2013-07-26 “Wheezy”).

Preparing your Raspberry Pi

Of course, you can not deploy to it without some simple preparations. First thing is, batou needs to be able to log on the target host with a public ssh key, so we copied our public key to the raspi which has the address 192.168.0.5 in this example:

local> ssh-copy-id pi@192.168.0.5
pi@192.168.0.5's password:
Type password of user pi, default: "raspberry"

(If you don’t have the ssh-copy-id, you have to manually append your ssh public key to /home/pi/.ssh/authorized_keys, which you will need to create on a plain installation)

Manually install minimal requirements

Batou does also have a few requirements which are needed to bootstrap the environment:

  • mercurial – to pull the buildout which sets up batou
  • python-virtualenv – to create a clean python environment for the buildout
  • python-dev – to compile libcrypto against

Note: We are currently working on batou 1.0 which most likely will no longer need any of these.

You can install all the requirements at once with the following command on your raspi:

pi> sudo aptitude install mercurial python-virtualenv python-dev

Prepare batou

Now you are ready to do your first batou deployment to a raspberry pi. For our experiments we created a small hello-world batou deployment, containing a test component which deploys a file /tmp/test which contains “foo” to a raspberry pi specified by an IP address.

To begin, clone the repository on your local machine:

local> hg clone https://bitbucket.org/gocept/batou-on-raspberrypi
local> cd batou-on-raspberrypi

Now, edit environments/pi.cfg and set the IP-address of your Raspi.

To create the nessecary scripts to do the deployment, run buildout to create a sandbox containing all dependencies of batou and the scripts you can use to deploy:

local> python bootstrap.py
…
local> bin/buildout

Deploy!

After some minutes, your batou deployment sandbox will be ready for use. You most likely modified environments/pi.cfg so you need to check in that change first, because batou refuses to deploy a dirty working copy.

local> hg ci -m 'change ip of my raspi'

To run the deployment, call batou-remote with the name of the environment (“pi”, which corresponds to environments/pi.cfg). Because the ssh user you use to connect with the target host differs from your local user, you have to specify it with --ssh-user.

local> bin/batou-remote pi --ssh-user=pi

Batou will now set up itself on the remote side and deploys all components specified in pi.cfg. To show it worked, check if the deployed file contains the correct content:

pi> cat /tmp/test
foo

Further readings

To learn more about batou, check http://batou.readthedocs.org.

If you want deploy your real life mission critical python applications into a fully automated environment using batou, head over to the Flying Circus.

TL;DR

  • Create reproducable automated deployments for your software is great fun.
  • Preparing a raspi to be a target host for batou 0.2.12 based deployments is easy:
    • Install python-virtualenv, mercurial and python-dev.
    • Put your ssh public key on the raspi.
  • Example deployment can be found on bitbucket.

Reliable file updates with Python

Programs need to update files. Although most programmers know that unexpected things can happen while performing I/O, I often see code that has been written in a surprisingly naïve way. In this article, I would like to share some insights on how to improve I/O reliability in Python code.

Consider the following Python snippet. Some operation is performed on data coming from and going back into a file:

with open(filename) as f:
   input = f.read()
output = do_something(input)
with open(filename, 'w') as f:
   f.write(output)

Pretty simple? Probably not as simple as it looks at the first glance. I often debug applications that show strange behaviour on production servers. Here are examples of failure modes I have seen:

  • A run away server process spills out huge amounts of logs and the disk fills up. write() raises an exception right after truncating the file, leaving the file empty.
  • Several instances of our application happen to run in parallel. After they have finished, the file contents is garbage because it intermingles output from multiple instances.
  • The application triggers some follow-up action after completing the write. Seconds later, the power goes off. After we have restarted the server, we see the old file contents again. The data already passed to other applications does not correspond to what we see in the file anymore.

Nothing of what follows is really new. My goal is to present common approaches and techniques to Python developers who are less experienced in system programming. I will provide code examples to make it easy for developers to incorporate these approaches into their own code.

What does “reliability” mean anyway?

In the broadest sense, reliability means that an operation is performing its required function under all stated conditions. With regard to file updates, the function in question is to create, replace or extend the contents of a file. It might be rewarding to seek inspiration from database theory here. The ACID properties of the classic transaction model will serve as guidelines to improve reliability.

To get started, let’s see how the initial example can be rated against the four ACID properties:

  • Atomicity requires that a transaction either succeeds or fails completely. In the example shown above, a full disk will likely result in a partially written file. Additionally, if other programs read the file while it is being written, they get a half-finished version even in the absence of write errors.
  • Consistency denotes that updates must bring the system from one valid state to another. Consistency can be subdivided into internal and external consistency: Internal consistency means that the file’s data structures are consistent. External consistency means that the file’s contents is aligned with other data related to it. In this example, it is hard to reason about consistency since we don’t know enough about the application. But since consistency requires atomicity, we can say at least that internal consistency is not guaranteed.
  • Isolation is violated if running transactions concurrently yields different results from running the same transactions sequentially. It is clear that the code above has no protection against lost updates or other isolation failures.
  • Durability means that changes need to be permanent. Before we signal success to the user, we must be sure that our data hits non-volatile storage and not just a write cache. Perhaps the code above has been written with the assumption in mind that disk I/O takes place immediately when we call write(). This assumption is not warranted by POSIX semantics.

Use a database system if you can

If we would be able to gain all four ACID properties, we would have come a long way towards increased reliability. But this requires significant coding effort. Why reinvent the wheel? Most database systems already have ACID transactions.

Reliable data storage is a solved problem. If you need reliable storage, use a database. Chances are high that you will not do it by yourself as good as those who have been working on it for years if not decades. If you do not want to set up a “big” database server, you can use sqlite for example. It has ACID transactions, it’s small, it’s free, and it’s included in Python’s standard library.

The article could finish here. But there are valid reasons not to use a database. They are often tied to file format or file location constraints. Both are not easily controllable with database systems. Reasons include:

  • we must process files generated by other applications, which are in a fixed format or at a fixed location
  • we must write files for consumption by other applications (and the same restrictions apply)
  • our files must be human-readable or human-editable

…and so on. You get the point.

If we are set out to implement reliable file updates on our own, there are some programming techniques to consider. In the following, I will present four common patterns of performing file updates. After that, I will discuss what steps can be taken to establish ACID properties with each file update pattern.

File update patterns

Files can be updated in a multitude of ways, but I see at least four common patterns. These will serve as a basis for the rest of this article.

Truncate-Write

This is probably the most basic pattern. In the following example, hypothetical domain model code reads data, performs some computation, and re-opens the existing file in write mode:

with open(filename, 'r') as f:
   model.read(f)
model.process()
with open(filename, 'w') as f:
   model.write(f)

A variant of this pattern opens the file in read-write mode (the “plus” modes in Python), seeks to the start, issues an explicit truncate() call and rewrites the contents:

with open(filename, 'a+') as f:
   f.seek(0)
   model.input(f.read())
   model.compute()
   f.seek(0)
   f.truncate()
   f.write(model.output())

An advantage of this variant is that we open file only once and keep it open all the time. This simplifies locking for example.

Write-Replace

Another widely used pattern is to write new contents into a temporary file and replace the original file after that:

with tempfile.NamedTemporaryFile(
      'w', dir=os.path.dirname(filename), delete=False) as tf:
   tf.write(model.output())
   tempname = tf.name
os.rename(tempname, filename)

This method is more robust against errors than the truncate-write method. See below for a discussion of atomicity and consistency properties. It is used by many applications.

These first two patterns are so common that the ext4 filesystem in the Linux kernel even detects them and fixes some reliability shortcomings automatically. But don’t depend on it: you are not always using ext4, and the administrator might have disabled this feature.

Append

The third pattern is to append new data to an existing file:

with open(filename, 'a') as f:
   f.write(model.output())

This pattern is used for writing log files and other cumulative data processing tasks. Technically, its outstanding feature is its extreme simplicity. An interesting extension is to perform append-only updates during regular operation and to reorganize the file into a more compact form periodically.

Spooldir

Here we treat a directory as logical data store and create a new uniquely named file for each record:

with open(unique_filename(), 'w') as f:
   f.write(model.output())

This pattern shares its cumulative nature with the append pattern. A big advantage is that we can put a little amount of metadata into the file name. This can be used, for example, to convey information about the processing status. A particular clever implementation of the spooldir pattern is the maildir format. Maildirs use a naming scheme with additional subdirectories to perform update operations in a reliable and lock-free way. The md and gocept.filestore libraries provide convenient wrappers for maildir operations.

If your file name generation is not guaranteed to give unique results, there is even a possibility to demand that the file must be actually new. Use the low-level os.open() call with proper flags:

fd = os.open(filename, os.O_WRONLY | os.O_CREAT| os.O_EXCL, 0o666)
with os.fdopen(fd, 'w') as f:
   f.write(...)

After opening the file with O_EXCL, we use os.fdopen to convert the raw file descriptor into a regular Python file object.

Applying ACID properties to file updates

In the following, I will try to enhance the file update patterns. Let’s see what we can do to meet each ACID property in turn. I will keep this as simple as possible, since we are not planning to write a complete database system. Please note that the material presented in this section is not exhaustive, but it may give you a good starting point for your own experimentation.

Atomicity

The write-replace pattern gives you atomicity for free since the underlying os.rename() function is atomic. This means that at any given point in time, any process sees either the old or the new file. This pattern has a natural robustness against write errors: if the write operation triggers an exception, the rename operation is never performed and thus, we are not in the danger of overwriting a good old file with a damaged new one.

The append patterns is not atomic by itself, because we risk to append incomplete records. But there is a trick to make updates appear atomic: Annotate each written record with a checksum. When reading the log later on, discard all records that do not have a valid checksum. This way, only complete records will be processed. In the following example, an application makes periodic measurements and appends a one-line JSON record each time to a log. We compute a CRC32 checksum of the record’s byte representation and append it to the same line:

with open(logfile, 'ab') as f:
    for i in range(3):
        measure = {'timestamp': time.time(), 'value': random.random()}
        record = json.dumps(measure).encode()
        checksum = '{:8x}'.format(zlib.crc32(record)).encode()
        f.write(record + b' ' + checksum + b'\n')

This example code simulates the measurements by creating a random value every second.

$ cat log
{"timestamp": 1373396987.258189, "value": 0.9360123151217828} 9495b87a
{"timestamp": 1373396987.25825, "value": 0.40429005476999424} 149afc22
{"timestamp": 1373396987.258291, "value": 0.232021160265939} d229d937

To process the log file, we read one record per line, split off the checksum, and compare it to the read record:

with open(logfile, 'rb') as f:
    for line in f:
        record, checksum = line.strip().rsplit(b' ', 1)
        if checksum.decode() == '{:8x}'.format(zlib.crc32(record)):
            print('read measure: {}'.format(json.loads(record.decode())))
        else:
            print('checksum error for record {}'.format(record))

Now we simulate a truncated write by chopping the last line:

$ cat log
{"timestamp": 1373396987.258189, "value": 0.9360123151217828} 9495b87a
{"timestamp": 1373396987.25825, "value": 0.40429005476999424} 149afc22
{"timestamp": 1373396987.258291, "value": 0.23202

When the log is read, the last incomplete line is rejected:

$ read_checksummed_log.py log
read measure: {'timestamp': 1373396987.258189, 'value': 0.9360123151217828}
read measure: {'timestamp': 1373396987.25825, 'value': 0.40429005476999424}
checksum error for record b'{"timestamp": 1373396987.258291, "value":'

The checksummed log record approach is used by a large number of applications including many database systems.

Individual files in the spooldir can likewise feature a checksum in each file. Another, probably easier, approach is to borrow from the write-replace pattern: first write the file aside and move it to its final location afterwards. Devise a naming scheme that protects work-in-progress files from being processed by consumers. In the following example, all file names ending with .tmp are ignored by readers and are thus safe to use during write operations:

newfile = generate_id()
with open(newfile + '.tmp', 'w') as f:
   f.write(model.output())
os.rename(newfile + '.tmp', newfile)

At last, truncate-write is non-atomic. I am sorry that I am not able to offer you an atomic variant. Right after performing the truncate operation, the file is nulled and no new content has been written yet. If a concurrent program reads the file now or, worse yet, an exception occurs and our program gets aborted, we see neither the old nor the new version.

Consistency

Most things I have said about atomicity can be applied to consistency as well. In fact, atomic updates are a prerequisite for internal consistency. External consistency means to update several files in sync. As this cannot easily be done, lock files can be used to ensure that read and write access do not interfere. Consider a directory where files need to be consistent with each other. A common pattern is to designate a lock file, which controls access for the whole directory.

Example writer code:

with open(os.path.join(dirname, '.lock'), 'a+') as lockfile:
   fcntl.flock(lockfile, fcntl.LOCK_EX)
   model.update(dirname)

Example reader code:

with open(os.path.join(dirname, '.lock'), 'a+') as lockfile:
   fcntl.flock(lockfile, fcntl.LOCK_SH)
   model.readall(dirname)

This method only works if we have control over all readers. Since there may be only one writer active at a time (the exclusive lock is blocking all shared locks), the scalability of this method is limited.

To take it one step further, we can apply the write-replace pattern to whole directories. This involves creating a new directory for each update generation and changing a symlink once the update is complete. For example, a mirroring application maintains a directory of tarballs together with an index file, which lists file name, file size, and a checksum. When the upstream mirror gets updated, it is not enough to implement an atomic file update for every tarball and the index file in isolation. Instead, we need to flip both the tarballs and the index file at the same time to avoid checksum mismatches. To solve this problem, we maintain a subdirectory for each generation and symlink the active generation:

mirror
|-- 483
|   |-- a.tgz
|   |-- b.tgz
|   `-- index.json
|-- 484
|   |-- a.tgz
|   |-- b.tgz
|   |-- c.tgz
|   `-- index.json
`-- current -> 483

Here, the new generation 484 is in the process of being updated. When all tarballs are present and the index file is up to date, we can switch the current symlink with a single, atomic os.symlink() call. Other applications see always either the complete old or the complete new generation. It is important that readers need to os.chdir() into the current directory and refer to files without their full path names. Otherwise, there is a race condition when a reader first opens current/index.json and then opens current/a.tgz, but in the meanwhile the symlink target has been changed.

Isolation

Isolation means that concurrent updates to the same file are serializable — there exists a serial schedule that gives the same results as the parallel schedule actually performed. “Real” database systems use advanced techniques like MVCC to maintain serializability while allowing for a great degree of parallelism. Back on our own, we better use locks to serialize file updates.

Locking truncate-write updates is easy. Just acquire an exclusive lock prior to all file operations. The following example code reads an integer from a file, increments it, and updates the file:

def update():
   with open(filename, 'r+') as f:
      fcntl.flock(f, fcntl.LOCK_EX)
      n = int(f.read())
      n += 1
      f.seek(0)
      f.truncate()
      f.write('{}\n'.format(n))

Locking updates using the write-replace pattern can be tricky. Using a lock the same way as in truncate-write can lead to updates conflicts. A naïve implementation could look like this:

def update():
   with open(filename) as f:
      fcntl.flock(f, fcntl.LOCK_EX)
      n = int(f.read())
      n += 1
      with tempfile.NamedTemporaryFile(
            'w', dir=os.path.dirname(filename), delete=False) as tf:
         tf.write('{}\n'.format(n))
         tempname = tf.name
      os.rename(tempname, filename)

What is wrong with this code? Imagine two processes compete to update a file. The first process just goes ahead, but the second process is blocked in the fcntl.flock() call. When the first process replaces the file and releases the lock, the already open file descriptor in the second process now points to a “ghost” file (not reachable by any path name) with old contents. To avoid this conflict, we must check that our open file is still the same after returning from fcntl.flock(). So I have written a new LockedOpen context manager to replace the built-in open context. It ensures that we actually open the right file:

class LockedOpen(object):

    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self.open_args = args
        self.open_kwargs = kwargs
        self.fileobj = None

    def __enter__(self):
        f = open(self.filename, *self.open_args, **self.open_kwargs)
        while True:
            fcntl.flock(f, fcntl.LOCK_EX)
            fnew = open(self.filename, *self.open_args, **self.open_kwargs)
            if os.path.sameopenfile(f.fileno(), fnew.fileno()):
                fnew.close()
                break
            else:
                f.close()
                f = fnew
        self.fileobj = f
        return f

    def __exit__(self, _exc_type, _exc_value, _traceback):
        self.fileobj.close()
    def update(self):
        with LockedOpen(filename, 'r+') as f:
            n = int(f.read())
            n += 1
            with tempfile.NamedTemporaryFile(
                    'w', dir=os.path.dirname(filename), delete=False) as tf:
                tf.write('{}\n'.format(n))
                tempname = tf.name
            os.rename(tempname, filename)

Locking append updates is as easy as locking truncate-write updates: acquire an exclusive lock, append, done. Long-running processes, which leave a file permanently open, may need to release locks between updates to let others in.

The spooldir pattern has the elegant property that it does not require any locking. Again, it depends on using a clever naming scheme and a robust unique file name generation. The maildir specification is a good example for a spooldir design. It can be easily adapted to other cases, which have nothing to do with mail.

Durability

Durability is a bit special because it depends not only on the application, but also on OS and hardware configuration. In theory, we can assume that os.fsync() or os.fdatasync() calls do not return until data has reached permanent storage. In practice, we may run into several problems: we may be facing incomplete fsync implementations or awkward disk controller configurations, which never give any persistence guarantee. A talk from a MySQL dev goes into great detail of what can go wrong. Some database systems like PostgreSQL even offer a choice of persistence mechanisms so that the administrator can select the best suited one at runtime. The poor man’s option although is to just use os.fsync() and hope that it has been implemented correctly.

With the truncate-write pattern, we have to issue an fsync after finishing write operations but before closing the file. Note that there is usually another level of write caching involved. The glibc buffer holds back writes inside the process even before they are passed to the kernel. To get the glibc buffer empty as well, we have to flush() it before fsync’ing:

with open(filename, 'w') as f:
   model.write(f)
   f.flush()
   os.fdatasync(f)

Alternatively, you can invoke Python with the -u flag to get unbuffered writes for all file I/O.

I prefer os.fdatasync() over os.fsync() most of the time to avoid synchronous metadata updates (ownership, size, mtime, …). Metadata updates can result in seeky disk I/O, which slows things down quite a bit.

Applying the same trick to write-replace style updates is only half of the story. We make sure that the newly written file has been pushed to non-volatile storage before replacing the old file, but what about the replace operation itself? We have no guarantee that the directory update is performed right on. There are lengthy discussions on how to sync a directory update on the net, but in our case (old and new file are in the same directory) we can get away with this rather simple solution:

os.rename(tempname, filename)
dirfd = os.open(os.path.dirname(filename), os.O_DIRECTORY)
os.fsync(dirfd)
os.close(dirfd)

We open the directory with the low-level os.open() call (Python’s built-in open() does not support opening directories) and perform a os.fsync() on the directory’s file descriptor.

Persisting append updates is again quite similar to what I have said about truncate-write.

The spooldir pattern has the same directory sync problems as the write-replace pattern. Fortunately, the same solution applies here as well: first sync the file, then sync the directory.

Conclusion

It is possible to update files reliably. I have shown that all four ACID properties can be met. The code examples presented above may serve as a toolbox. Pick the programming techniques that match your needs best. At times, you don’t need all four ACID properties but only one or two. I hope that this article helps you to make an informed decision about what to implement and what to leave out.

August, 15th–17th: Sprinting on Pyramid

After Zope “-the-Framework” reaching the end of its lifecycle during the last few years, we did a bunch of new projects with Pyramid, a nice web framework primarily authored by long-term Zope developer Chris McDonough.

We think it’s about time to give something back to the community, and become more involved in Pyramid development. We therefore happily announce to host a large Pyramid sprint organised in cooperation with the pysprints.de team. You find more information and sprint topics at GitHub.

The sprint starts on Thu, August 15th 10:00h CEST, and ends on Sat, August 17th with a garden party in the evening! Expect BBQ, beer and (most likely) live music!

If you would like to attend, please sign up on lanyrd.