August, 15th–17th: Sprinting on Pyramid

After Zope “-the-Framework” reaching the end of its lifecycle during the last few years, we did a bunch of new projects with Pyramid, a nice web framework primarily authored by long-term Zope developer Chris McDonough.

We think it’s about time to give something back to the community, and become more involved in Pyramid development. We therefore happily announce to host a large Pyramid sprint organised in cooperation with the pysprints.de team. You find more information and sprint topics at GitHub.

The sprint starts on Thu, August 15th 10:00h CEST, and ends on Sat, August 17th with a garden party in the evening! Expect BBQ, beer and (most likely) live music!

If you would like to attend, please sign up on lanyrd.

developer & admin BBQ IV

Our fourth BBQ (invitation post) had the most participants so far, almost 20 people were here to talk shop, exchange ideas and brave the unfortunately slightly rainy weather (the grilled goods were delicious regardless). We’re especially glad that the ratio of gocept people to guests was only about 50% this time, and we’re hoping it will go down further. 🙂

The sessions in the Open Space were about diverse subjects, ranging from “Deploying lots of Rasperry Pi’s” over “Gamification in a business context” to “Why is there no slim and simple CMS yet?”. In several sessions we didn’t find a satisfactory solution to the problem, but sometimes sharing your frustrations with others who have similar experiences is helpful in itself.

Since the session about code katas was very well liked, we’re thinking about maybe doing a Code Retreat instead of a classic Open Space for the next BBQ, so stay tuned.

Running tests using gocept.selenium on Travis-CI

Travis-CI is a free hosted continuous integration platform for the open source community. It has a good integration with Github, so each push to a project runs the tests  of the project.

gocept.selenium is a python package our company has developed as a test-friendly Python API for Selenium which allows to run tests in a browser.

Travis-CI uses YML-Files to configure the test run. I found only little documentation how to run Selenium tests on Travis-CI. But it is straight forward. The following YML file I took from a personal project of mine. (I simplified it a bit for this blog post.):

[code]
language: python
python:
– 2.6
before_install:
– "export DISPLAY=:99.0"
– "sh -e /etc/init.d/xvfb start"
– "wget http://selenium.googlecode.com/files/selenium-server-standalone-2.31.0.jar"
– "java -jar selenium-server-standalone-2.31.0.jar &"
– "export GOCEPT_SELENIUM_BROWSER=’*firefox’"
install:
– python bootstrap.py
– bin/buildout
script:
– bin/test
[/code]

Explanation:

  • Lines 1 – 4: My project is a Python project which currently only runs on Python 2.6. But other Python versions will work as well.
  • Lines 5, 6: Firefox needs a running XServer, so we start it first as it takes some seconds to launch. See Travis-CI documentation, too.
  • Lines  7, 8: The Selenium server seems not to be installed by default, so get it and launch it.
  • Line 9: Tell gocept.selenium to use Firefox to run the tests. (Note: To use the new Webdriver-API in the upcoming version 2 of gocept.selenium you have to set other environment variables.)
  • Lines 10 – 14: Install the project and run the tests as usual. (The example uses zc.buildout to do this.)

Note: Although I use the Firefox which is installed by default on the Travis-CI machine, I did not yet find out which version it is.

PyCon 2013 report

PyCon 2013 was an excellent conference bringing together Python’s vast, diverse, and technically excellent community. I had the opportunity to visit the whole conference including the sprint days.

Magnitude

The size of the community seems well reflected by the number of attendees that PyCon US attracts: the limit of 2,500 attendees was reached on 2013-02-02, about 1 month prior to the conference. This should be about 500 attendees more than in 2012 when they exceeded their planned capacity of 1,500 ending up with 2,000 IIRC.

It was very nice to see that the organization is growing along with the task: everything ran very smoothly, a lot of detailed changes over last year, some for better, some for worse (Remember: if you want to improve, you need to change, and the means you need to accept set backs to learn.)

Diversity

Yes, there was this FUBAR situation regarding a “code of conduct” violation. I think too many people who have not been at PyCon have contributed to the turmoil already so I’ll refrain from commenting.

I was happy to hear that PyLadies (and everybody else working on the diversity of the community) could see their efforts showing excellent results: around 20% of all participants were women (or girls). I had the impression on the first day of the conference that more women were around than usually on tech conferences.

But not only that, we also had:

  • a wide range of ages: from kids, to students, to way more senior people
  • very business and very relaxed, alternative people (Plone RV, anyone?)
  • visitors from all over the world

I had to ponder a bit why this actually makes me happy: the diversity shows me that what we do is important to everyone and does not need to be either obscure and geeky or shirt-and-tie business.

We can have a community where you can be geeky and nerdy, do business, and feel like a human being. How great is that? Conferences always tend to be very intense environments, somewhat “from outer space”. Combined with travelling overseas for almost two weeks, having a human environment just makes it so worthwhile and a bit more sustainable.

To everybody who did not have an absolutely great experience personally: I’m empathatic and I hope next PyCon will be better for you. A lot has been said about the code of conduct and the organizers definitely pay a lot of attention to it. Nevertheless: 2,500 people stuffed into a few rooms over almost a week will cause friction here and there. If I should encounter a similar situation myself I will hopefully be able to apply some of the experience as a bystander and: stay calm, be friendly, and help defusing situations.

Technical excellence

There isn’t much you can do to get more sophisticated technical people talking about programming into the same spot compared to PyCon. Maybe DEFCON, or USENIX, or other more orthogonally oriented spots. But for practicality this is just it.

I recommend you visit pyvideo.org and go through the recorded videos of all sessions. It’s always a good idea to listen to what Raymond Hettinger has to say. And Guido, of course.

Sprinting

I felt very productive during the sprints: I started out sitting in a room with Nate Aune, Jeff Forcier, and some others, talking about deployment things. I worked a bit on our deployment utility batou trying to soften some rough edges and gather feedback from others.

However, I also had the PyPI mirror client software on my radar. As we are operating one of the official mirrors (the F mirror) I was fed up by the constant breakage that the existing pep381client experienced everywhere. I sat down, refactored, and lo and behold! a bandersnatch appeared. This is a full rewrite that can be used with the existing mirror data and is much more reliable and – in the case of error – easier debug and recover.

Sponsoring

gocept also was a silver sponsor for PyCon. We already sponsored PyCon in 2012, but this year we:

  • did not insert more stuff into the attendee bag (it’s way too heavy already anyways)
  • did set up a booth to become approachable and get to talk to people

Our product (the Flying Circus) is in use for consulting clients but still on its way to become a product that you can just use by registering and providing payment details. Operations as a service is a very dynamic space today and we had some good opportunities to try to explain what we envision and where we think existing IaaS and PaaS models are aiming at the wrong thing. If you’re interested in this kind of thing, then visit our homepage and sign up to our newsletter and we’ll keep you updated.

PyCon has been a very sponsor-friendly place, especially for small businesses. It’s always a hassle to bring a lot of stuff half around the globe, but the environment was perfect to just bring some banner and flyers and talk to people strolling around.

2014

So next year, PyCon US will actually be PyCon North America, as the conferences moves to Montreal. Besides making this a much shorter trip I’m also looking forward to some new cultural impressions.

How we organize large-scale roll-outs

In the coming week we will deploy an extensive OS update to our production environment which (right now) currently consists of 41 physical hosts running 195 virtual machines.

Updates like this are prepared very carefully in many small steps using our development and staging setups that reflect the exactly same environment as our production systems in the data center.

Nevertheless, we learned to expect the unexpected when deploying to our production environment. This is why we established the one/few/many paradigm for large updates. The remainder of this post talks about our scheduling mechanism to determine which machines are updated at what point in time.

Automated maintenance scheduling

The Flying Circus configuration management database (CMDB) keeps track of times that are acceptable to each customer for scheduled maintenance. When a machine determines that a particular automated activity will be disruptive (e.g. because it makes the system temporarily unstable or reboots) then it requests a maintenance period from the CMDB based on the customers’ preferences and the estimated duration of the downtime Customers are then automatically notified what will happen at what time.

This alone is too little to make a large update that affects all machines (and thus all customers) but it’s the mechanical foundation of the next step.

Maintenance weeks

When we roll out a large update we rather add additional padding for errors and thus we invented the “maintenance week”. For this we can ask the CMDB to proactively schedule relatively large maintenance windows for all machines in a given pattern.

Here’s a short version of how this schedule is built when an administrator pushes the “Schedule maintenance week” button in our CMDB (all times in UTC):

  1. Monday 09:00 – automation management, monitoring, and binary package compilation get updated
  2. Monday 13:00 – the first router and one storage server are updated
  3. Monday 17:00 – internal test machines (our litmus machines) and a small but representative set of customer machines that are marked as test environments get updated
  4. Tuesday 17:00 – the remainder of customer test machines, up to 5% of untested production VMs, and 20% of the storage servers are updated
  5. Wednesday 17:00 – 30% of the production VMs get updated and 30% of the storage servers are updated
  6. Thursday 17:00 – the remaining production VMs and storage servers get updated
  7. Saturday 09:00 – KVM hosts are updated and rebooted
  8. Saturday 13:00 – the second router is updated

Once the schedule has been established, customers are informed by email about the assigned slots. An internal cross-check ensures that all machines in the affected location do have a window assigned for this week.

Maintenance week schedule

This procedure causes the number of machines that get updated rise from Monday (22 machines) to Thursday (about 100 machines). Any problems we find on Monday we can fix on a small number of machines and provide a bugfix to avoid the issue on later days completely.

However, if you read the list carefully you are probably asking yourself: Why are customer VMs without tests updated early? Doesn’t this force customers without tests to experience outages more heavily?

Yes. And in our opinion this is a good thing: First, in earlier phases we have smaller numbers of machines to deal with. Any breakage that occurs on Monday or Tuesday can be dealt with more timely than if unexpected breakage occurs on Wednesday or Thursday where many machines are updated at onces. Second, if your service is critical then you should feel the pain of not having tests (similar to pain that you experience if you don’t write unit tests and try to refactor). We believe that “herd immunity” will give you a false sense of security and rather have unexpected errors occur early and clearly visible so they can be approached with a good fix instead of hiding them as long as possible.

We’re looking forward to our updates next week. Obviously we’re preparing for unexpected surprises, but what will they have in stock for us this time?

We also appreciate feedback: How do you prepare updates for many machines? Is there anything we’re missing? Anything that sounds like a good idea to you? Let us know – leave a comment!

developer & admin BBQ IV

Zum vierten Mal wird am 30. April 2013 um 14:00 Uhr das „developer & admin BBQ“ stattfinden. Die Veranstaltung bietet Software-Entwicklern und Administratoren ein Forum um Ideen, Probleme und deren Lösungen auszutauschen. In einem an „Open Space“ angelehnten Format hat jeder Teilnehmer die Möglichkeit eigene Themen einzubringen, die dann in kleineren Runden bearbeitet werden können.

In der Vergangenheit wurden sowohl konkrete technische Problemstellungen, wie zum Beispiel „Plattformübergreifende Entwicklung für Mobilgeräte“ oder „Pymp your (vim|emacs) – sinnvolle Editor-Erweiterungen für Python-Entwicklung“ thematisiert. Aber auch theoretische Themen rund um agile Entwicklungsprozesse („Test Driven Development“) oder Anwendungsbetrieb („Deploying applications and the 12-factor app“)  wurden behandelt.

Wie bereits für die letzten Veranstaltungen gibt es auch dieses Mal eine Reihe von Themenvorschlägen:

  • Raspberry Pi – Möglichkeiten, Grenzen, Alternativen?
  • Übung macht den Meister: Code Katas.
  • Der „CMS-Zoo“ – Auf der Suche nach einem vernünftigen CMS.
  • Erst den Test, dann den Code – Test Driven Development
  • Ceph – performantes, stabiles und skalierbares verteiltes Dateisystem im Produktiveinsatz

Ab ca. 19 Uhr gibt es beim gemeinsamen Abendessen (je nach Wetterlage auch am Lagerfeuer mit Grill), die Möglichkeit sich in angenehmer Atmosphäre weiter auszutauschen und kennen zu lernen.

Wie bisher findet das BBQ wieder bei der Firma gocept gmbh & co. kg, Forsterstraße 29 in Halle statt.

Anmeldungen zur Teilnahme sowie Vorschläge für weitere Themen können hier eingetragen werden. Weiterhin ist die Veranstaltung bei meetup gelistet. Auch der Facebook-Event darf gerne zur Anmeldung benutzt werden.

P.S.: Since we are addressing local audience we are keeping this post in German. Basically we want developers and admins in our area to meet up, exchange ideas, and enjoy BBQ.

Overcoming Self-organization Blocks in Agile Teams

At the SQD2013 conference Andrea Provaglio (@andreaprovaglio) talked about the social aspects of software development. He says, software development is intangible, collaborative and heuristic while our education prepares us for linear, standardized and predictable processes. (Software development) teams are systems. Systems self-organize, if they can. (A plant is a system, and it self-organizes; it turns towards the light).

Self-organization (in a software development team) cannot be enforced, it can only be enabled. The team cannot be empowered to self-organize: Power can be given, power can be taken. The team needs the room to emancipate: emancipation cannot be taken away. Nevertheless self-organization needs leadership to align the team with the business goals and share existing constraints. Leading is different from commanding, though.

There are indicators for why self-organization doesn’t happen:

  • People build fences. They point out other’s mistakes, like to see other’s fail, take side against something or somebody, etc.
  • There is a blaming culture, mistakes are considered bad.
  • Judgement combined with superiority and and emotional quality to it.
  • Relationships between different persons are parent-child-like instead of adult-adult. If people are getting told what to do all the time, they can hardly self-organize.
  • People feel they are paddling against the river.

Watch out for those indicators.

Andrea redefines self-organization as Collective behavior that manifests itself when the individuals take personal responsibility in co-creating a shared future. I like that.

Software Quality Days 2013

SQD took place in Vienna from January 15 to 17 2013. My first impressions where: Scrum is everywhere and open source does not play a major role.

Most people I have met came from rather large organizations. They mostly face different problems than us: running one large project with multiple teams. We instead have one development team running multiple projects.

However, the topics at the conference where not too far off: we at gocept are building high quality software, too.

  • Andrea Provaglio (@andreaprovaglio) talked about Overcoming Self-organization Blocks in Agile Teams: why do we need self-organization, how can it be enabled and what can prevent it.
  • In contrast to Andrea Provaglio, who says that software was intangible, Tom Gilb (@imtomgilb) has a completely different opinion: since you can quantify anything (including software), it cannot be intangible. In his talk Quality Quantification for Quality Engineering he stated that all quality requirements need to be quantified (actually he states this for years now). When you actually what to verify if a certain quality level was reached (or not) you must quantify it.
  • Sander Hoogendoorn (@aahoogendoorn) talked about agile anti-patterns, like “Dogmagile”, “Crusader Agile”, “Scrumdementalism” and others. His views pretty much match mine: don’t be dogmatic, do what works.
  • Professional speaker Herman Scherer (@hermannscherer) held his talk Jenseits vom Mittelmaß. It is roughly about being competitive in today’s environment. Very inspiring.
  • On the bad side there was Rainer Stropek’s talk about Monitoring and Root Cause Analysis in Dynamic Cloud Infrastructures. While there was some truth in it, it was extremely high level and basically a road show for Microsoft Azure: #fail.

The conference itself was well organized. The call for papers for the SQD2014 is open until May, 31 2013. Maybe next time we can put in a breeze of small business and open-source.

News from the toolbox: gocept.selenium and our plans for its future

For a couple of years, we at gocept have been developing a Python library, gocept.selenium, whose goal it is to integrate testing web sites in real browsers with the Python unittest framework. There exist a number of approaches to doing this; when first starting real-browser tests, we opted for using selenium. Back then, it had not been integrated with webdriver yet (more on webdriver below).

There turned out to be multiple aspects to selenium integration: setting up the web server under test, starting a browser to run selenium and pointing it at the server, but also designing a wrapper around the selenium testing API to bring it in line with unittest’s way of defining specialised assertions.

We came up with the gocept.selenium package which includes both a selenese module defining such an API wrapper and a bunch of modules for integration with those web-server frameworks that we happen to use in our work, among them generic WSGI and a number of Zope-related servers. The integration mechanism is implemented in terms of test layers, so all of this requires the Zope test runner to be used. We released a 1.0 version of gocept.selenium in November 2012, marking the selenese API as stable.

The description of the package given so far already indicates two aspects that need yet to be addressed: Firstly, the selenium project is based on webdriver nowadays, with the old selenium implementation being kept for backwards compatibility at the moment. Secondly, collecting all those server integration modules in the same package that implements the actual selenium integration makes for rather complex (albeit optional) package dependencies and poses a maintainability problem.

We have dealt with the latter in December 2012, extracting all those integration modules from gocept.selenium into a new package, gocept.httpserverlayer. From the package’s documentation:
»This package provides an HTTP server for testing your application with normal HTTP clients (e.g. a real browser). This is done using test layers, which are a feature of zope.testrunner. gocept.httpserverlayer uses plone.testing for the test layer implementation, and exposes the following resources (accessible in your test case as self.layer[RESOURCE_NAME]):

  • http_host: The hostname of the HTTP server (Default: localhost)
  • http_port: The port of the HTTP server (Default: 0, which means chosen automatically by the operating system)
  • http_address: hostname:port, convenient to use in URLs (e.g. ‘http://user:password@%s/path’ % self.layer[‘http_address’])

In addition to generic WSGI and static-file serving, the server frameworks supported at this point (i.e. gocept.httpserverlayer 1.0.1) include Zope3/ZTK (both using zope.app.testing and zope.app.wsgi with the latter supporting Grok) as well as Zope2 and Plone (using ZopeTestCase, WSGI or plone.testing.z2).

After the creation of gocept.httpserverlayer, we released the 1.1 series of gocept.selenium which no longer brings its own integration code. For the sake of backwards compatibility, though, it still implements separate TestCase classes for each of the integration flavours.

This leaves webdriver support to be dealt with. Originally, we had hoped to simply sneak it in, having to change very little client code, if any at all. Our plan was to implement the old API (both for test setup and selenese) in terms of webdriver which should allow us to benefit from webdriver immediately, as some issues with the old selenium were causing trouble in our daily work (including the behaviour of type and typeKeys as well as drag-and-drop). We started a branch of gocept.selenium where we switched from integrating legacy selenium to talking to webdriver and changed the selenese implementation to use webdriver commands.

However, it turned out that a number of details couldn’t be completely hidden, and webdriver brought its own share of problems (including, sadly, new issues with drag-and-drop). We tried out our branch in a real project to the point that all tests would pass again, and ended up with a long list of upgrade notes describing incompatibilities, either temporary or not, both causing semantic differences of behaviour and necessitating changes to the test code. We identified a number of pieces of the old selenese API that we wouldn’t bother implementing, and we still had a few large projects that would help discover more things to watch out for.

It became clear that sneaking webdriver into an existing selenium test suite wasn’t the way to get to use it soon. So, instead of continuing to develop the branch and replacing the selenium-based implementation in gocept.selenium 2, we merged the branch now, in such a way that we have two different selenium integrations available at the same time, usable simultaneously in the same project. That way, new browser tests can be added using the webdriver integration layer, and existing tests can be migrated to using webdriver test case by test case, as needed.

We have made alpha releases of gocept.selenium 2 so people may experiment with the webdriver integration. Note that while the current implementation of the test layer (gocept.selenium.webdriver.Layer) contains some code to deal with Firefox, we have successfully run it against Chrome as well. While the integration layer exposes a raw webdriver object as the seleniumrc resource, there is also the WebdriverSeleneseLayer which offers a resource named selenium, which is the old selenese API implemented in terms of webdriver and can be used together with the base layer.

We are currently working towards a stable gocept.selenium 2 release that includes webdriver support at the level described, but at the same time also thinking about how our ideal testing API might be structured in order to integrate with the unittest API concepts but make better use of the object-oriented raw webdriver API than the current selenese does. If you have an interest in using webdriver in conjunction with the Python unittest framework you are very welcome to try out the current state of gocept.selenium 2 and get back to us with ideas and suggestions.

Happy new year – cleaning up the server room!

Welcome to 2013!

Alex and I are using this time of the year when most of our colleagues are still on holidays to perform maintenance on our office infrastructure.

To prepare for all the goodness we have planned for the Flying Circus in 2013 we decided to upgrade our internet connectivity (switching from painful consumer-grade DSL/SDSL connections to fibre, yeah!)  and also clean up our act in our small private server room. For that we decided to buy a better UPS and PDUs, a new rack to get some space between the servers and clean up the wiring.

Yesterday we prepared the parts we can do ourselves in preparation of the electricians coming in on Friday to start installing that nice Eaton 9355 8kVA UPS.

So, while the office was almost empty two of us managed to use our experience with the data center setups we do to turn a single rack  (pictures of which we’re too ashamed to post) into this:

 

Although the office was almost abandoned, those servers to serve a real purpose and we had to be careful to avoid too massive interruptions as they do handle:

  • our phone system and office door bell
  • secondary DNS for our infrastructure and customer domains
  • chat and support systems
  • monitoring with business-critical alerting

Here’s how we did it:

  • Power down all the components that are not production-related and move them from the existing rack (right one on the front picture) to the new one. For that we already had our rack logically split between “infrastructure development” and “office business” machines.
  • Move the development components (1 switch, 7 servers, 1 UPS) to the new rack. Wire everything again (nicely!) and power it up. Use the power-up cycle to verify that IPMI remote control works. Also notice which machines don’t boot cleanly (which we only found on machines that are under development regarding kernels anyway, yay).
  • Notice that the old UPS isn’t able to actually run all those servers’ load and keep one turned off until we get the new UPS installed.
  • Now that we had space in the existing rack we re-distributed the servers there as well to make the arrangement more logical (routers, switches, and other telco-related stuff at the top). Turn off single servers one-by-one and keep everyone in the office informed about short outages.
  • Install new PDUs in the space we got after removing superfluous cables. Get lots of scratches while taking stuff out and in.
  • Update our inventory databases, take pictures, write blog post. 🙂

As the existing setup was quite old and grew over time we were pretty happy to be able to apply the lessons we learned in those years in between and get everything cleaned up in less than 10 hours. We notice the following things that we did differently this time (and have been doing so in the data center for a while already):

  • Create bundles of network cables for one server (we use 4) and put them  in a systematic pattern into the switch, label them once with the servername at each end. Colors indicate VLAN.
  • Use real PDUs both for IEC and Schuko equipment. Avoid consumer-grade power-distribution.
  • Leave a rack unit between each component to allow operating without hurting yourself, the flexibility to pass wires (KVM) to the front, and to avoid temperature peaks within the rack.
  • Having over-capacity makes it easier to keep things clean which in turn makes you more flexible and brings ease to your mind to focus on the important stuff.

As the pictures indicate we aren’t completely done installing all the shiny new things, so here’s what’s left for the next days and weeks:

  • Wait for the electricians and Eaton to install and activate our new UPS.
  • Wire up the new PDUs with the new UPS and clean up the power wiring for each server.
  • Wait for the telco to finish digging and putting fibre into the street and get their equipment installed so we can enjoy a “real” internet connection.

All in all, we had a very productive and happy first working day in 2013. If this pace keeps up then we should encounter the singularity sometime in April.