developer & admin BBQ IV

Our fourth BBQ (invitation post) had the most participants so far, almost 20 people were here to talk shop, exchange ideas and brave the unfortunately slightly rainy weather (the grilled goods were delicious regardless). We’re especially glad that the ratio of gocept people to guests was only about 50% this time, and we’re hoping it will go down further. :-)

The sessions in the Open Space were about diverse subjects, ranging from “Deploying lots of Rasperry Pi’s” over “Gamification in a business context” to “Why is there no slim and simple CMS yet?”. In several sessions we didn’t find a satisfactory solution to the problem, but sometimes sharing your frustrations with others who have similar experiences is helpful in itself.

Since the session about code katas was very well liked, we’re thinking about maybe doing a Code Retreat instead of a classic Open Space for the next BBQ, so stay tuned.

Posted in Uncategorized | Leave a comment

Running tests using gocept.selenium on Travis-CI

Travis-CI is a free hosted continuous integration platform for the open source community. It has a good integration with Github, so each push to a project runs the tests  of the project.

gocept.selenium is a python package our company has developed as a test-friendly Python API for Selenium which allows to run tests in a browser.

Travis-CI uses YML-Files to configure the test run. I found only little documentation how to run Selenium tests on Travis-CI. But it is straight forward. The following YML file I took from a personal project of mine. (I simplified it a bit for this blog post.):

language: python
python:
  - 2.6
before_install:
  - "export DISPLAY=:99.0"
  - "sh -e /etc/init.d/xvfb start"
  - "wget http://selenium.googlecode.com/files/selenium-server-standalone-2.31.0.jar"
  - "java -jar selenium-server-standalone-2.31.0.jar &"
  - "export GOCEPT_SELENIUM_BROWSER='*firefox'"
install:
  - python bootstrap.py
  - bin/buildout
script:
  - bin/test

Explanation:

  • Lines 1 – 4: My project is a Python project which currently only runs on Python 2.6. But other Python versions will work as well.
  • Lines 5, 6: Firefox needs a running XServer, so we start it first as it takes some seconds to launch. See Travis-CI documentation, too.
  • Lines  7, 8: The Selenium server seems not to be installed by default, so get it and launch it.
  • Line 9: Tell gocept.selenium to use Firefox to run the tests. (Note: To use the new Webdriver-API in the upcoming version 2 of gocept.selenium you have to set other environment variables.)
  • Lines 10 – 14: Install the project and run the tests as usual. (The example uses zc.buildout to do this.)

Note: Although I use the Firefox which is installed by default on the Travis-CI machine, I did not yet find out which version it is.

Posted in en | Tagged , , , , , , , , , | 2 Comments

PyCon 2013 report

PyCon 2013 was an excellent conference bringing together Python’s vast, diverse, and technically excellent community. I had the opportunity to visit the whole conference including the sprint days.

Magnitude

The size of the community seems well reflected by the number of attendees that PyCon US attracts: the limit of 2,500 attendees was reached on 2013-02-02, about 1 month prior to the conference. This should be about 500 attendees more than in 2012 when they exceeded their planned capacity of 1,500 ending up with 2,000 IIRC.

It was very nice to see that the organization is growing along with the task: everything ran very smoothly, a lot of detailed changes over last year, some for better, some for worse (Remember: if you want to improve, you need to change, and the means you need to accept set backs to learn.)

Diversity

Yes, there was this FUBAR situation regarding a “code of conduct” violation. I think too many people who have not been at PyCon have contributed to the turmoil already so I’ll refrain from commenting.

I was happy to hear that PyLadies (and everybody else working on the diversity of the community) could see their efforts showing excellent results: around 20% of all participants were women (or girls). I had the impression on the first day of the conference that more women were around than usually on tech conferences.

But not only that, we also had:

  • a wide range of ages: from kids, to students, to way more senior people
  • very business and very relaxed, alternative people (Plone RV, anyone?)
  • visitors from all over the world

I had to ponder a bit why this actually makes me happy: the diversity shows me that what we do is important to everyone and does not need to be either obscure and geeky or shirt-and-tie business.

We can have a community where you can be geeky and nerdy, do business, and feel like a human being. How great is that? Conferences always tend to be very intense environments, somewhat “from outer space”. Combined with travelling overseas for almost two weeks, having a human environment just makes it so worthwhile and a bit more sustainable.

To everybody who did not have an absolutely great experience personally: I’m empathatic and I hope next PyCon will be better for you. A lot has been said about the code of conduct and the organizers definitely pay a lot of attention to it. Nevertheless: 2,500 people stuffed into a few rooms over almost a week will cause friction here and there. If I should encounter a similar situation myself I will hopefully be able to apply some of the experience as a bystander and: stay calm, be friendly, and help defusing situations.

Technical excellence

There isn’t much you can do to get more sophisticated technical people talking about programming into the same spot compared to PyCon. Maybe DEFCON, or USENIX, or other more orthogonally oriented spots. But for practicality this is just it.

I recommend you visit pyvideo.org and go through the recorded videos of all sessions. It’s always a good idea to listen to what Raymond Hettinger has to say. And Guido, of course.

Sprinting

I felt very productive during the sprints: I started out sitting in a room with Nate Aune, Jeff Forcier, and some others, talking about deployment things. I worked a bit on our deployment utility batou trying to soften some rough edges and gather feedback from others.

However, I also had the PyPI mirror client software on my radar. As we are operating one of the official mirrors (the F mirror) I was fed up by the constant breakage that the existing pep381client experienced everywhere. I sat down, refactored, and lo and behold! a bandersnatch appeared. This is a full rewrite that can be used with the existing mirror data and is much more reliable and – in the case of error – easier debug and recover.

Sponsoring

gocept also was a silver sponsor for PyCon. We already sponsored PyCon in 2012, but this year we:

  • did not insert more stuff into the attendee bag (it’s way too heavy already anyways)
  • did set up a booth to become approachable and get to talk to people

Our product (the Flying Circus) is in use for consulting clients but still on its way to become a product that you can just use by registering and providing payment details. Operations as a service is a very dynamic space today and we had some good opportunities to try to explain what we envision and where we think existing IaaS and PaaS models are aiming at the wrong thing. If you’re interested in this kind of thing, then visit our homepage and sign up to our newsletter and we’ll keep you updated.

PyCon has been a very sponsor-friendly place, especially for small businesses. It’s always a hassle to bring a lot of stuff half around the globe, but the environment was perfect to just bring some banner and flyers and talk to people strolling around.

2014

So next year, PyCon US will actually be PyCon North America, as the conferences moves to Montreal. Besides making this a much shorter trip I’m also looking forward to some new cultural impressions.

Posted in en, Uncategorized | Leave a comment

How we organize large-scale roll-outs

In the coming week we will deploy an extensive OS update to our production environment which (right now) currently consists of 41 physical hosts running 195 virtual machines.

Updates like this are prepared very carefully in many small steps using our development and staging setups that reflect the exactly same environment as our production systems in the data center.

Nevertheless, we learned to expect the unexpected when deploying to our production environment. This is why we established the one/few/many paradigm for large updates. The remainder of this post talks about our scheduling mechanism to determine which machines are updated at what point in time.

Automated maintenance scheduling

The Flying Circus configuration management database (CMDB) keeps track of times that are acceptable to each customer for scheduled maintenance. When a machine determines that a particular automated activity will be disruptive (e.g. because it makes the system temporarily unstable or reboots) then it requests a maintenance period from the CMDB based on the customers’ preferences and the estimated duration of the downtime Customers are then automatically notified what will happen at what time.

This alone is too little to make a large update that affects all machines (and thus all customers) but it’s the mechanical foundation of the next step.

Maintenance weeks

When we roll out a large update we rather add additional padding for errors and thus we invented the “maintenance week”. For this we can ask the CMDB to proactively schedule relatively large maintenance windows for all machines in a given pattern.

Here’s a short version of how this schedule is built when an administrator pushes the “Schedule maintenance week” button in our CMDB (all times in UTC):

  1. Monday 09:00 – automation management, monitoring, and binary package compilation get updated
  2. Monday 13:00 – the first router and one storage server are updated
  3. Monday 17:00 – internal test machines (our litmus machines) and a small but representative set of customer machines that are marked as test environments get updated
  4. Tuesday 17:00 – the remainder of customer test machines, up to 5% of untested production VMs, and 20% of the storage servers are updated
  5. Wednesday 17:00 – 30% of the production VMs get updated and 30% of the storage servers are updated
  6. Thursday 17:00 – the remaining production VMs and storage servers get updated
  7. Saturday 09:00 – KVM hosts are updated and rebooted
  8. Saturday 13:00 – the second router is updated

Once the schedule has been established, customers are informed by email about the assigned slots. An internal cross-check ensures that all machines in the affected location do have a window assigned for this week.

Maintenance week schedule

This procedure causes the number of machines that get updated rise from Monday (22 machines) to Thursday (about 100 machines). Any problems we find on Monday we can fix on a small number of machines and provide a bugfix to avoid the issue on later days completely.

However, if you read the list carefully you are probably asking yourself: Why are customer VMs without tests updated early? Doesn’t this force customers without tests to experience outages more heavily?

Yes. And in our opinion this is a good thing: First, in earlier phases we have smaller numbers of machines to deal with. Any breakage that occurs on Monday or Tuesday can be dealt with more timely than if unexpected breakage occurs on Wednesday or Thursday where many machines are updated at onces. Second, if your service is critical then you should feel the pain of not having tests (similar to pain that you experience if you don’t write unit tests and try to refactor). We believe that “herd immunity” will give you a false sense of security and rather have unexpected errors occur early and clearly visible so they can be approached with a good fix instead of hiding them as long as possible.

We’re looking forward to our updates next week. Obviously we’re preparing for unexpected surprises, but what will they have in stock for us this time?

We also appreciate feedback: How do you prepare updates for many machines? Is there anything we’re missing? Anything that sounds like a good idea to you? Let us know – leave a comment!

Posted in en, Uncategorized | Leave a comment

developer & admin BBQ IV

Zum vierten Mal wird am 30. April 2013 um 14:00 Uhr das „developer & admin BBQ“ stattfinden. Die Veranstaltung bietet Software-Entwicklern und Administratoren ein Forum um Ideen, Probleme und deren Lösungen auszutauschen. In einem an „Open Space“ angelehnten Format hat jeder Teilnehmer die Möglichkeit eigene Themen einzubringen, die dann in kleineren Runden bearbeitet werden können.

In der Vergangenheit wurden sowohl konkrete technische Problemstellungen, wie zum Beispiel „Plattformübergreifende Entwicklung für Mobilgeräte“ oder „Pymp your (vim|emacs) – sinnvolle Editor-Erweiterungen für Python-Entwicklung“ thematisiert. Aber auch theoretische Themen rund um agile Entwicklungsprozesse („Test Driven Development“) oder Anwendungsbetrieb („Deploying applications and the 12-factor app“)  wurden behandelt.

Wie bereits für die letzten Veranstaltungen gibt es auch dieses Mal eine Reihe von Themenvorschlägen:

  • Raspberry Pi – Möglichkeiten, Grenzen, Alternativen?
  • Übung macht den Meister: Code Katas.
  • Der „CMS-Zoo“ – Auf der Suche nach einem vernünftigen CMS.
  • Erst den Test, dann den Code – Test Driven Development
  • Ceph – performantes, stabiles und skalierbares verteiltes Dateisystem im Produktiveinsatz

Ab ca. 19 Uhr gibt es beim gemeinsamen Abendessen (je nach Wetterlage auch am Lagerfeuer mit Grill), die Möglichkeit sich in angenehmer Atmosphäre weiter auszutauschen und kennen zu lernen.

Wie bisher findet das BBQ wieder bei der Firma gocept gmbh & co. kg, Forsterstraße 29 in Halle statt.

Anmeldungen zur Teilnahme sowie Vorschläge für weitere Themen können hier eingetragen werden. Weiterhin ist die Veranstaltung bei meetup gelistet. Auch der Facebook-Event darf gerne zur Anmeldung benutzt werden.

P.S.: Since we are addressing local audience we are keeping this post in German. Basically we want developers and admins in our area to meet up, exchange ideas, and enjoy BBQ.

Posted in de | Tagged , , | Leave a comment

Overcoming Self-organization Blocks in Agile Teams

At the SQD2013 conference Andrea Provaglio (@andreaprovaglio) talked about the social aspects of software development. He says, software development is intangible, collaborative and heuristic while our education prepares us for linear, standardized and predictable processes. (Software development) teams are systems. Systems self-organize, if they can. (A plant is a system, and it self-organizes; it turns towards the light).

Self-organization (in a software development team) cannot be enforced, it can only be enabled. The team cannot be empowered to self-organize: Power can be given, power can be taken. The team needs the room to emancipate: emancipation cannot be taken away. Nevertheless self-organization needs leadership to align the team with the business goals and share existing constraints. Leading is different from commanding, though.

There are indicators for why self-organization doesn’t happen:

  • People build fences. They point out other’s mistakes, like to see other’s fail, take side against something or somebody, etc.
  • There is a blaming culture, mistakes are considered bad.
  • Judgement combined with superiority and and emotional quality to it.
  • Relationships between different persons are parent-child-like instead of adult-adult. If people are getting told what to do all the time, they can hardly self-organize.
  • People feel they are paddling against the river.

Watch out for those indicators.

Andrea redefines self-organization as Collective behavior that manifests itself when the individuals take personal responsibility in co-creating a shared future. I like that.

Posted in en | Tagged , | Leave a comment

Software Quality Days 2013

SQD took place in Vienna from January 15 to 17 2013. My first impressions where: Scrum is everywhere and open source does not play a major role.

Most people I have met came from rather large organizations. They mostly face different problems than us: running one large project with multiple teams. We instead have one development team running multiple projects.

However, the topics at the conference where not too far off: we at gocept are building high quality software, too.

  • Andrea Provaglio (@andreaprovaglio) talked about Overcoming Self-organization Blocks in Agile Teams: why do we need self-organization, how can it be enabled and what can prevent it.
  • In contrast to Andrea Provaglio, who says that software was intangible, Tom Gilb (@imtomgilb) has a completely different opinion: since you can quantify anything (including software), it cannot be intangible. In his talk Quality Quantification for Quality Engineering he stated that all quality requirements need to be quantified (actually he states this for years now). When you actually what to verify if a certain quality level was reached (or not) you must quantify it.
  • Sander Hoogendoorn (@aahoogendoorn) talked about agile anti-patterns, like “Dogmagile”, “Crusader Agile”, “Scrumdementalism” and others. His views pretty much match mine: don’t be dogmatic, do what works.
  • Professional speaker Herman Scherer (@hermannscherer) held his talk Jenseits vom Mittelmaß. It is roughly about being competitive in today’s environment. Very inspiring.
  • On the bad side there was Rainer Stropek’s talk about Monitoring and Root Cause Analysis in Dynamic Cloud Infrastructures. While there was some truth in it, it was extremely high level and basically a road show for Microsoft Azure: #fail.

The conference itself was well organized. The call for papers for the SQD2014 is open until May, 31 2013. Maybe next time we can put in a breeze of small business and open-source.

Posted in en | Tagged | 1 Comment