Florian Pilz

Today’s blogpost is all about me. I joined gocept about a year ago, therefore I want to tell you something about the past – and something about the future.

The Past

Florian Pilz

Before joining the gocept crew, I studied computing at the HTWK in Leipzig for 6 years. Since sitting in a german university was not enough for me, I reached out to study in the UK for the best time of a year. This way I came to the University of Bolton near Manchester, where – most importantly – I learned to speak English fluently.

During my time in the UK I started to look into Ruby on Rails and earned my first salaries. This was the point were I switched my Linux PC for a Mac. Since I already was comfortable with Windows and Linux I wanted to try out the last big operation system out there. But as many before me, I liked the Mac so much that I still use it as my development machine.

Back in Germany I continued to earn some money on the side working with Ruby on Rails. I really enjoyed using Ruby and I think in terms of readability it’s ahead of most other programming languages by several magnitudes. But Ruby also has its shortcomings, which I did not recognize until I switched to Python as my main programming language here at gocept.

I also did some very interesting projects during my studies, here are just three of them:

  • An OCR software which recognizes mathematical symbols. The idea was to integrate it into the Leibniz-project to make it possible for blind people to read mathematical school books.
  • An iPhone app that helps developers and freelancers to get better at estimating how long it takes to implement a feature or project.
  • A scientific article about software estimation that was actually published in print!

After I finished my master thesis, I had the opportunity to go on a 2-month vacation with my wife. So we took two backpacks and travelled around the entire US. A memory I will feast on my entire life. You can get some impressions on my flickr fotostream.

The Future

During the last year at gocept, I dove into the code base of two of our biggest clients – DGB and Ver.di – and helped improving their CMS. Currently I am getting in touch with the code base of Zeit Online, since a colleague will move to the upper floor, where our admins are situated.

But most importantly, you will see me more often around here, since I will be writing about things we do and learn here at gocept during the following months. I already have some topics on my desk which are worth a blogpost. For example, why hg rebase can be harmful, why testing AJAX using Jasmine is a pain, the benefits of a headless browser like PhantomJS and many more.

Next up will be a post about something we did at the summer sprint, which includes an Arduino, an LED strip and a failing test server.

Posted in en | Tagged | Leave a comment

Sommerfest bei gocept – Samstag 20. September 2014 ab 16 Uhr

english version below

Kein Sommer geht zu Ende ohne ein Fest in unserem tollen Garten. Wie in den letzten Jahren laden wir unsere Familien, Freunde und Mitstreiter ein. Für leckeres Essen, kühle Getränke und Unterhaltung ist gesorgt – kommt und feiert mit uns – wir bitten um Voranmeldung.

WANN: Samstag, 20. September ab 16 Uhr

WO: Forsterstr. 29, 06112 Halle (Saale)

KANN es nicht erwarten: mail@gocept.com

Übrigens, auch dieses Jahr bildet die Party einen Schlusspunkt zu unserem DevOps Sprint. Hier könnt ihr mehr darüber lesen: gocept DevOps Sprint

 

gocept summer party– Saturday, September 20th  4 pm

No summer ending without a party at the gocept garden. Like in previous years we invite our families, friends, and “brothers-in-arms”. As usual we serve delicious food, chilled drinks, and amusement – so come and join us – R.s.v.p.

WHEN: Saturday, September 20th from 4 pm

WHERE: Forsterstr. 29, 06112 Halle (Saale)

CAN’T wait for it: mail@gocept.com

Incidentally, the party is also the closing event for our DevOps sprint.

Posted in de, en | Tagged | Leave a comment

haproxy load-balancing for PHP applications with sticky sessions

We like applications that are written with a shared-nothing approach: it greatly simplifies running multiple instances on multiple hosts and allows for simple, robust load-balancer configuration.

Recently, we had to deploy a PHP application that – in the last minute – turned out to use PHP sessions and thus required sticky sessions.

We haven’t used sticky sessions in a while and the amount of reading required to find the specific working setup was substantial, so we’ll repeat here what a post at networkinghowtos.com already figured out:

backend default
    appsession PHPSESSID len 64 timeout 3h request-learn prefix

As you can see there isn’t much magic to it – the haproxy manual has a good in-detail explanation of the appsession option. The biggest point of this option is that you do not have haproxy injecting another session identifier but simply piggybacks on the existing one that PHP determines. Also, this option combines nicely with “leastconn” balancing if your application only uses cookies on a few selected pages and many users do not trigger getting a session cookie.

Posted in Uncategorized | 2 Comments

September, 18th–20th: DevOps Sprint

Since we have a strong history in web development, but also were involved in operating web applications we developed, the DevOps movement hit our nerves.

Under the brand name “Flying Circus” we are establishing a platform respecting the DevOps principles.

A large portion of our day-to-day work is dedicated to DevOps related topics. We like to collaborate by sharing ideas and work on tools we all need to make operations and development of web applications a smooth experience. A guiding question: how can we improve the operability of web applications?

A large field of sprintable topics comes to our mind:

Logging

Enable web application developers to integrate logging mechanisms into their apps easily. By using modern tools like Logstash for collecting and analyzing of the data, operators are able to find causes performance or other problems efficiently.

Live-Debugging and Monitoring

Monitoring is a must when operation software. At least for some people (including ourselves), Nagios is not the best fit for DevOps teams.

Deploying

We always wanted to have reproducable automated deployments. Coming from the Zope world, started with zc.buildout, we developed our own deployment tool batou. More recently upcoming projects, such as ansible, and tools (more or less) bound to cloud services like heroku.

Backup

After using bacula for a while, we started to work on backy, which aims to work directly on volume files of virtual machines.

and more…

Join us to work on these things and help to make DevOps better! The sprint will take place at our office, Forsterstraße 29, Halle (Saale), Germany. On September, 20th we will have a great party in the evening.

If you want to attend, please sign up on http://www.meetup.com/DevOps-Sprint/events/191582682/.

 

Accomodation

For your stay in Halle, we can recommend the following Hotels: “City Hotel am Wasserturm”, “Dorint Hotel Charlottenhof”, “Dormero Hotel Rotes Ross”. For those on budget, there is the youth hostel Halle (http://halle.djh-sachsen-anhalt.de/). Everything is in walking distance from our office.

Posted in en, Uncategorized | Tagged , , , , , , | 1 Comment

Flying Circus at EuroPython 2014

If you’re attending EP14, be sure to visit our Flying Circus booth at BCC level A! We’re here to discuss web operations. Managed hosting is only as good as the people behind it. So just walk over, test us, ask any question related to web operations! Additionally, we have some demo VMs readily available so you can get hands-on experience with a walk-through from our developers.

Flying Circus boot at EP14

Posted in en | Tagged , | Leave a comment

Follow up actions after the filesystem corruption incident

On 2014-06-07, the Flying Circus experienced a quite unfortunate filesystem corruption incident. Most of the VMs have been cleaned up since then, but a few defective files are still around. In the following article, I’ll provide some background information on what types of corruption we saw, what you (as our customer) can expect from platform management to rectify the situation, and what everyone can do to check his/her own applications.

Observed types of filesystem corruption

The incident resulted in lost updates on the block layer. This means that some filesystem blocks were reverted to an older state. Depending on what kind of information has been saved in the affected blocks, this may lead to different effects:

  • files show old content, as a file update got lost;
  • files show random content, as updates to the file’s extent list got lost;
  • files have disappeared completely, as updates to their containing directory got lost.

On most VMs which experienced filesystem corruption, filesystem metadata has been rendered invalid as well. We were able to identify these VMs quickly and contacted the affected customers immediately. However, there are still some cases left where filesystem metadata has not been affected (so the automated checks did not find anything), but file contents has been affected. Generally, files that have been updated in the time range between 2014-06-02 and 2014-06-07 or live in directories that saw changes during that time are at risk.

These cases of corruption are impossible to detect via filesystem checks. To make sure that all VMs are in a reasonably good state, twofold action is necessary: First, we will check the OS and all managed components as part of our platform management. Second, we ask you to take a look at your applications to uncover previously hidden cases of filesystem corruption.

Platform-wide checks

After taking short-term action to ensure that we will not run into a similar problem again, we are currently in the process of performing a deep scan of all installed OS files and managed components. In particular, we are going to:

  • perform a consistency check on all files installed from OS packages;
  • perform integrity checks on all managed databases (PostgreSQL, LDAP, …);
  • reboot all VMs to ensure that there is no stale cached content.

Found inconsistencies will be repaired automatically if possible (e.g., OS files). As far as application data is concerned (e.g., databases), we will contact you to work out available options to restore consistency. VM reboots will take place during announced maintenance periods as usual.

Application-specific checks

It is not possible to perform an automated deep check of project files, as we do for OS files. Too much context knowledge is necessary to judge on what is ok and what looks suspicious. So we ask you to throw a critical glance at your applications yourself. Our experience so far shows that signs of filesystem corruption reveal themselves quite fast when one starts to look for the right signs.

Two areas that need to be checked are static project files, like application software and configuration, and application-specific databases.

Checking application installations

Our first and most important recommendation is to restart all applications and check for obvious signs of trouble. This is both easy and points to most problems right away.

Additionally, the applications’ log files should be inspected. Filesystem corruption causes sometimes “illogical” errors to show up in the log files. We recommend to look through the log files for unusual error messages.

If corrupted installation or configuration files are found, the best way out is usually to re-deploy affected applications. This is easy if the deployment is controlled via an automated tool like batou or zc.buildout. Restoring installation files from backup is also an option.

Checking application data stores

Some applications bring in their own data store, for example ZODB or SOLR. Procedures depend on the specific software, but we can give some general suggestions here:

  • Some data stores have their own integrity checking or even repair tools on board. For example, ZODB complains about inconsistencies during packing.
  • Some data stores have means to dump their contents to an external file. During dumping, all pieces of data will be traversed and inconsistencies are likely to be discovered.
  • Some data stores can easily be rebuilt from scratch, for example caches, indexes, or session stores.

Please contact our support if you discover inconsistencies and need help to recover.

Summary

We are sorry for all the trouble the filesystem corruption incident has caused. We care about customer data and will do our best to get VMs as clean as possible. With the guidelines mentioned above, it should be possible to uncover a good portion of the corrupted files that have not been identified yet.

Posted in en | Tagged , , | Leave a comment

Heartbleed bug and the Flying Circus

tl;dr: The Flying Circus is not affected by the Heartbleed bug

As reported by several media there is a serious bug in the OpenSSL library, widely known as the Heartbleed bug. The bug was introduced in the OpenSSL development tree on January 1st, 2012 and was finally released with OpenSSL version 1.0.1.

The Flying Circus platform makes use of the Gentoo Linux distribution. The OpenSSL version maintained for the Flying Circus is 1.0.0j, so it is not affected by the Heartbleed bug.

To be sure we are not affected by the bug, for example by possible backports from the Gentoo maintainer, we also audited the OpenSSL sources that are in use to not implement the vulnerable heartbeat function.

The Flying Circus is not affected by Heartbleed and there was no time in the past when we had rolled out a vulnerable version.

There is no need to replace your certificates or keys.

Posted in en | Leave a comment