Undo transactions by truncating ZODB Data.fs

truncate can be used to permanently set back a ZODB Data.fs to a certain point in transaction history.

Sometimes I break the Data.fs of my ZODB in a way that the Zope instance cannot start any more or I want to try again a migration. In such situations it is handy that writing to a Data.fs means extending the file at the end. So the unwanted transaction can be truncated. Normally I use the following steps to do so:

1. Install ZODB in a virtualenv

This is needed to get the script named fstail. If you are already using Python 3, call:
python3.7 -m venv v_zodb

If you are still on Python 2, call:
virtualenv-2.7 v_zodb

Caution: The Python major version (2 or 3) must match the version you are using for the Zope instance.

Install the script into the virtual environment using:
cd v_zodb
bin/pip install ZODB

2. Stop Zope and ZEO

Stop both Zope and (if used) the ZEO server. This is necessary for your cut to get noticed by client and server.

3. Find the position where to cut

Call fstail. With -n you are able to specify the number of transactions to be shown:
bin/fstail -n 20 path/to/Data.fs

fstail returns something like the following. Newer transaction are at the top: (These lines here are only some extracted from a longer output.)

2019-04-24 08:38:44.622984: hash=0b59c10e6eaa947b2ec0538e26d9b4f9128c03cb
user=' admin' description='/storage/58bdea07-666c-11e9-8a63-34363bceb816/edit' length=19180 offset=12296784 (+97)
2019-04-24 08:38:06.823673: hash=3a595fb50b913bad819f0d5bd8d152e06bc695d7
user=' admin' description='/portal/site/add-page-form' length=132830 offset=12121677 (+58)
2019-02-26 10:28:10.856626: hash=5b2b0fbc33b53875b7110f82b2fe1793245c590b
user=' admin' description='/index_html/pt_editAction' length=444 offset=11409587 (+54)

Using the provided information in user and description you should be able to find the transaction from which on newer transactions should be removed. You need the provided value after offset= to do the cut.

In my example above, if /portal/site/add-page-form is the faulty transaction, my cut point is 12121677.

4. 3, 2, 1 … cut

Caution: Every transaction after the cut point (including the one you took the offset from) will get removed by cutting.

Get a truncate tool of your choice. I am using here one of Folkert van Heusden which comes with MacPorts and claims to be command-line compatible with the (Free-)BSD version.

In my example I would call it using:
truncate -s 12121677 path/to/Data.fs

That’s all. Start ZEO and Zope again to be back in transaction history where you have cut.

References

Migrate a Zope ZODB Data.fs to Python 3

TL;DR Use zodbupdate.

Problem

A ZODB Data.fs which was created under Python 2 cannot be opened under Python 3. This is prevented by using a different magic code in the first bytes of the file. This is done on purpose because str has a different meaning for the two Python versions: Under Python 2 a str is a container for characters with an arbitrary encoding (aka bytes​). Python 3 knows str as a text datatype which was called unicode in Python 2. Trying to load a str object in Python 3 which actually contains binary data will fail. It has to be bytes, but bytes is an alias for str  in Python 2 which means Python 2 replaces bytes  with str making is impossible to give Python 3 the class it expects for binary data. A Python 2 str  with an arbitrary encoding will break, too.

Solution

The Data.fs has to be migrated: each str  which actually contains bytes has to be converted into a zodbpickle.binary object which deserialises as bytes under Python 3. The str objects actually containing text have to be decoded to unicode. There are currently two tools which claim that they are able to do such a migration:

  • zodb.py3migrate was already written at Berlin Strategic sprint in 2016, but it was never able to prove that it can do what it claims: At the time when it was written there was no Zope which could run on Python 3. Now as we have Zope 4 running on Python 3 it does not seem to do its conversion job quite well: I was able to migrate a toy database but had to catch an unpickling error.
  • zodbupdate was enriched by a Python 3 migration. A big thank you to Sylvain Viollon and the developers at Minddistrict! It has proven its claims! At the Zope 4 welcome sprint I was able to migrate a Data.fs created on Zope 2.13 running on Python 2 to Zope 4 running on Python 3.

Steps

  1. Migrate your Zope application to Zope 4. (zodbupdate  requires at least ZODB 4 which is not the default ZODB version of Zope 2.13) — For my toy database containing only a file object and an image this was no problem. Zope 4  is starting with such a database. It might show some broken objects because Zope no longer depends on some previous core packages like Products.Sessions. If your application needs those packages you should add them to your Zope environment.
  2. ​zodbupdate has to be installed into the Zope 4 environment so it can access the Python classes. (It has to read the pickles in the ZODB.)
  3. There needs to be an entry_point in setup.py for each package which contains persistent Python classes. The entry point has to be named "zodbupdate.decode" and needs to point to a dictionary mapping paths to  str attributes to a conversion (bytes resp. a specific encoding). For Details see the migration documentation of zodbupdate. I prepared a branch of Zope 4 which contains this configuration dictionary for OFS.Image and OFS.File, see zopefoundation/Zope#285.
  4. Run zodbupdate --pack --convert-py3 on the Data.fs using Python 2.
  5. Copy the Data.fs over to the Zope 4 instance running on Python 3. Data.fs.index will be discarded at the first start. (There is an error message telling that it cannot be read.)
  6. Enjoy the contents of the Data.fs running on Python 3.

Conclusion

It is possible (proven for a toy database) to migrate a Data.fs from Zope 2.13  (Python 2) to Zope 4 (Python 3).

zodbupdate is the way to go. Although it cannot do the migration completely autonomously the developers of Python packages can provide migration configuration in their packages which can be used in the migration step so the configuration has only to be written once.

zodb.py3migrate has an analysis step which shows the attribute names where the str objects are stored. (This could be added to zodbupdate, so do not expect that there will be two tools trying to achieve the same goal.)

mdtools.relstorage contains a relstorage variant of zodbupdate which claims to be much faster on relstorage as it can leverage parallelism.

Open issues

The pull request containing the migration strategy (zopefoundation/Zope#285) has to be extended for the other persistent classes in Zope. There have to be alike changes in all packages providing persistent classes.

zodb.py3migrate: Migrate an existing ZODB Data.fs to be used with Python 3

At Berlin Strategic sprint 2016 we developed a tool to analyze a ZODB Filestorage to find Python 2 string objects. If they are in an encoding besides ASCII this is preventing using this Filestorage with Python 3 because of decoding errors arising on loading the pickles.
The tool is even able to convert those strings either to unicode by decoding them using a configurable encoding or convert them to zodbpickle.binary so Python 3 will read them as bytes.
There is documentation of the tool and a repository on GitHub where the code lives.

There are still some questions open:

  • Is there already another tool for this analysis/migration?
  • Is there already any practical knowledge migrating Filestorage contents to Python 3?
  • Do you think such a tool is the right approach to achieve such a migration?
  • Is there anyone who wants to try out the tool on a Filestorage of a personal project and share the experiences? (We analyzed two projects where we have access to a Filestorage but we are sure this does not catch all the edge cases.)