Migrate a Zope ZODB Data.fs to Python 3

TL;DR Use zodbupdate.

Problem

A ZODB Data.fs created under Python 2 cannot be opened under Python 3. ZODB prevents this by writing a different magic code into the first bytes of the file. This is done on purpose because str has a different meaning in the two Python versions: under Python 2 a str is a container for characters in an arbitrary encoding (aka bytes). Python 3 knows str as a text datatype, which was called unicode in Python 2. Trying to load a str object that actually contains binary data will fail in Python 3. It has to be bytes, but bytes is an alias for str in Python 2, which means Python 2 replaces bytes with str, making it impossible to give Python 3 the class it expects for binary data. A Python 2 str with an arbitrary encoding will break, too.
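
The magic-code check can be illustrated with a few lines of Python. This is only a sketch: the values FS21 (written by Python 2) and FS30 (written by Python 3) are assumed here and should be verified against your installed ZODB version, and filestorage_flavour is a hypothetical helper, not part of ZODB.

```python
import io

# Assumed magic values; verify them against your installed ZODB version.
PY2_MAGIC = b"FS21"
PY3_MAGIC = b"FS30"


def filestorage_flavour(stream):
    """Classify a FileStorage by the first four bytes of the stream."""
    magic = stream.read(4)
    if magic == PY2_MAGIC:
        return "python2"
    if magic == PY3_MAGIC:
        return "python3"
    return "unknown"


# In-memory stand-ins for real files; with a real database you would pass
# open("Data.fs", "rb") instead.
print(filestorage_flavour(io.BytesIO(b"FS21" + b"\x00" * 8)))
print(filestorage_flavour(io.BytesIO(b"FS30" + b"\x00" * 8)))
```

Such a check only tells you which Python version wrote the file; it says nothing about whether the str objects inside are text or binary data.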

Solution

The Data.fs has to be migrated: each str which actually contains bytes has to be converted into a zodbpickle.binary object which deserialises as bytes under Python 3. The str objects actually containing text have to be decoded to unicode. There are currently two tools which claim that they are able to do such a migration:

  • zodb.py3migrate was written at the Berlin Strategic sprint in 2016, but it was never able to prove that it can do what it claims: at the time it was written, there was no Zope which could run on Python 3. Now that Zope 4 runs on Python 3, it does not seem to do its conversion job very well: I was able to migrate a toy database but had to catch an unpickling error.
  • zodbupdate gained a Python 3 migration feature. A big thank you to Sylvain Viollon and the developers at Minddistrict! It has proven its claims: at the Zope 4 welcome sprint I was able to migrate a Data.fs created on Zope 2.13 running on Python 2 to Zope 4 running on Python 3.

Steps

  1. Migrate your Zope application to Zope 4. (zodbupdate requires at least ZODB 4, which is not the default ZODB version of Zope 2.13.) For my toy database containing only a file object and an image this was no problem: Zope 4 starts with such a database. It might show some broken objects because Zope no longer depends on some previous core packages like Products.Sessions. If your application needs those packages, add them to your Zope environment.
  2. zodbupdate has to be installed into the Zope 4 environment so it can access the Python classes. (It has to read the pickles in the ZODB.)
  3. There needs to be an entry point in setup.py for each package which contains persistent Python classes. The entry point has to be named "zodbupdate.decode" and needs to point to a dictionary that maps paths to str attributes to a conversion (bytes or a specific encoding). For details see the migration documentation of zodbupdate. I prepared a branch of Zope 4 which contains this configuration dictionary for OFS.Image and OFS.File, see zopefoundation/Zope#285.
  4. Run zodbupdate --pack --convert-py3 on the Data.fs using Python 2.
  5. Copy the Data.fs over to the Zope 4 instance running on Python 3. Data.fs.index will be discarded at the first start. (There is an error message saying that it cannot be read.)
  6. Enjoy the contents of the Data.fs running on Python 3.
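
The configuration dictionary from step 3 can be sketched as follows. The key format ("module Class attribute") and the value "binary" follow my reading of the zodbupdate documentation and should be checked there. The concrete entries, the name zodbupdate_decode_dict, the package name in the comment and the helper convert are illustrative assumptions. convert only mimics the conversion rule on Python 3; the real tool stores binary data as zodbpickle.binary while running on Python 2.

```python
# Hypothetical mapping: "<module> <class> <attribute>" -> "binary" or an encoding.
zodbupdate_decode_dict = {
    "OFS.Image File data": "binary",
    "OFS.Image File title": "utf-8",
}

# In setup.py the dictionary would be registered roughly like this
# (assumed layout and package name, check the zodbupdate documentation):
#
#   entry_points={
#       "zodbupdate.decode": ["decodes = mypackage:zodbupdate_decode_dict"],
#   }


def convert(key, raw):
    """Mimic the conversion rule for one attribute value given as bytes."""
    rule = zodbupdate_decode_dict[key]
    if rule == "binary":
        return raw  # stays bytes under Python 3
    return raw.decode(rule)  # becomes text (str) under Python 3


print(type(convert("OFS.Image File data", b"\x89PNG")).__name__)
print(type(convert("OFS.Image File title", b"Caf\xc3\xa9")).__name__)
```

Because the dictionary lives in the package that defines the persistent classes, it has to be written only once and every site using that package can reuse it.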

Conclusion

It is possible (proven for a toy database) to migrate a Data.fs from Zope 2.13 (Python 2) to Zope 4 (Python 3).

zodbupdate is the way to go. Although it cannot do the migration completely autonomously, the developers of Python packages can ship the migration configuration in their packages. The migration step then picks it up, so the configuration has to be written only once.

zodb.py3migrate has an analysis step which shows the attribute names where the str objects are stored. (This could be added to zodbupdate, so do not expect that there will be two tools trying to achieve the same goal.)

mdtools.relstorage contains a RelStorage variant of zodbupdate which claims to be much faster on RelStorage as it can leverage parallelism.

Open issues

The pull request containing the migration strategy (zopefoundation/Zope#285) has to be extended to cover the other persistent classes in Zope. Similar changes are needed in all packages providing persistent classes.

zodb.py3migrate: Migrate an existing ZODB Data.fs to be used with Python 3

At the Berlin Strategic sprint 2016 we developed a tool to analyze a ZODB Filestorage to find Python 2 string objects. If they are in an encoding other than ASCII, they prevent using this Filestorage with Python 3 because decoding errors arise when loading the pickles.
The tool is even able to convert those strings, either to unicode by decoding them with a configurable encoding or to zodbpickle.binary so Python 3 will read them as bytes.
There is documentation of the tool and a repository on GitHub where the code lives.
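
To give an idea of what such an analysis has to decide for each string, here is a deliberately simplified sketch. It is not the code of zodb.py3migrate; classify is a hypothetical helper, and the Python 2 str values are represented as bytes so the example runs on Python 3.

```python
def classify(raw, encoding="utf-8"):
    """Classify a Python 2 str (given as bytes) for a Python 3 migration."""
    try:
        raw.decode("ascii")
        return "ascii"  # loads fine under Python 3 without conversion
    except UnicodeDecodeError:
        pass
    try:
        raw.decode(encoding)
        return "text"  # candidate for decoding to unicode
    except UnicodeDecodeError:
        return "binary"  # candidate for zodbpickle.binary


samples = [b"plain title", "Gr\u00fc\u00dfe".encode("utf-8"), b"\x89PNG\r\n\x1a\n"]
for sample in samples:
    print(repr(sample), "->", classify(sample))
```

A heuristic like this cannot decide every case: with an encoding such as latin-1 every byte sequence decodes successfully, so binary data would be misclassified as text. That is why the attribute names reported by the analysis still need a human decision.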

There are still some questions open:

  • Is there already another tool for this analysis/migration?
  • Is there already any practical knowledge migrating Filestorage contents to Python 3?
  • Do you think such a tool is the right approach to achieve such a migration?
  • Is there anyone who wants to try out the tool on a Filestorage of a personal project and share the experiences? (We analyzed two projects where we have access to a Filestorage but we are sure this does not catch all the edge cases.)