{"id":171,"date":"2012-05-22T16:03:25","date_gmt":"2012-05-22T14:03:25","guid":{"rendered":"http:\/\/blog.gocept.com\/?p=171"},"modified":"2012-12-03T15:07:38","modified_gmt":"2012-12-03T14:07:38","slug":"dont-stop-postgresqls-autovacuum-with-your-application","status":"publish","type":"post","link":"https:\/\/blog.gocept.com\/2012\/05\/22\/dont-stop-postgresqls-autovacuum-with-your-application\/","title":{"rendered":"Don’t stop PostgreSQL’s autovacuum with your application"},"content":{"rendered":"
\n
\n

The problem<\/h2>\n

Some weeks ago, we received a complaint from a customer about bad PostgreSQL performance for a specific application. I took a look into the database and found strange things going on: the query planner was executing “interesting” query plans, tables were bloated with lots of dead rows (one was 6 times as big as it should be), and so on.<\/p>\n

The cause revealed itself when looking at pg_stat_user_tables<\/em>:<\/p>\n

abc-hans=# SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze\r\nFROM pg_stat_user_tables;\r\n        relname        | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze\r\n-----------------------+-------------+-----------------+--------------+------------------\r\n object_ref            |             |                 |              |\r\n archived_current      |             |                 |              |\r\n archived_item         |             |                 |              |\r\n pgtextindex           |             |                 |              |\r\n archived_container    |             |                 |              |\r\n archived_chunk        |             |                 |              |\r\n commit_lock           |             |                 |              |\r\n archived_blob_info    |             |                 |              |\r\n archived_state        |             |                 |              |\r\n archived_blob_link    |             |                 |              |\r\n archived_item_deleted |             |                 |              |\r\n pack_object           |             |                 |              |\r\n archived_class        |             |                 |              |\r\n archived_object       |             |                 |              |\r\n object_state          |             |                 |              |\r\n object_refs_added     |             |                 |              |\r\n blob_chunk            |             |                 |              |<\/pre>\n

Despite of heavy write activity on the database, no table had ever seen autovacuum or autoanalyze. But why?<\/p>\n

As I delved into it, I noticed that PostgreSQL’s autovacuum\/autoanalyze was practically stopped in two ways by the application. I’d like to share our findings to help other programmers not to get trapped in situations like this.<\/p>\n<\/div>\n

\n

Unfinished transactions<\/h2>\n

It turned out that the application had one component which connected to the database and opened a transaction right after startup, but never finished that transaction:<\/p>\n

abc-hans=# SELECT procpid, current_timestamp - xact_start AS xact_runtime, current_query\r\nFROM pg_stat_activity ORDER BY xact_start;\r\n procpid |  xact_runtime   | current_query\r\n\r\n---------+-----------------+-----------------------\r\n   18915 | 11:46:20.8783   | <IDLE> in transaction\r\n   21289 | 11:18:20.07042  | <IDLE> in transaction<\/pre>\n

Note that the database server was started about 11 \u00be hours ago in this example. Vacuuming (whether automatic or manual) stops at the oldest transaction id that is still in use. Otherwise it would be vacuuming active transactions, which is not sensible at all. In our example, vacuuming is stopped right away since the oldest running transaction is only one minute older than the running server instance. At least this is easy to resolve: we got the developers to fix the application. Now it finishes every transaction in a sensible amount of time with either COMMIT<\/em> or ABORT<\/em>.<\/p>\n<\/div>\n

\n

Exclusive table locks<\/h2>\n

Unfortunately, this was not all of it: autovacuum was working now but quite sporadically. A little bit of research revealed that autovacuum will abort if it is not able to obtain a table lock within one second<\/a>\u00a0– and guess what:\u00a0the application made quite heavy use of table locks. We found a hint that something suspicious is going on in the PostgreSQL log:<\/p>\n

postgres[13251]: [40-1] user=,db= ERROR:  canceling autovacuum task<\/pre>\n

Searching the application source brought up several places where table locks were used. Example:<\/p>\n

stmt = \"\"\"\r\nLOCK %(table)s IN EXCLUSIVE MODE;\r\nDELETE FROM %(table)s WHERE docid = %%s;\r\nINSERT INTO %(table)s (docid, coefficient, marker, text_vector)\r\nVALUES (%%s, %%s, %%s, %(clause)s)\r\n\"\"\" % {'table': self.table, 'clause': clause}<\/pre>\n

The textindex code was particularly problematic as it dealt often with large documents. Statements like the one above could easily place load on the database server high enough to cause frequent autovacuum aborts.<\/p>\n

The developers said that they have introduced the locks because of concurrency issues. As we could not get rid of them, I have installed a nightly cron job to force-vacuum the database. PostgreSQL has shown much improved query responses since then. Some queries’ completion times even improved by a factor of 10. I’ve been told that in the meantime they have found a way to remove the locks so the cron job is not necessary anymore.<\/p>\n<\/div>\n

\n

Summary<\/h2>\n

PostgreSQL shows good auto-tuning and is a pretty low-maintenance database server if<\/em> you allow it to perform its autovacuum\/autoanalyze tasks regularly. We have seen that application programs may put autovacuum effectively out of business. In this particular case, unfinished transactions and extensive use of table locks were the show-stoppers. After we have identified and removed these causes, our PostgreSQL database is running smoothly again.<\/p>\n

We are currently in the process of integrating some of the most obvious signs of trouble into the standard database monitoring on our managed hosting platform<\/a> to catch those problems quickly as they show up.<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"

The problem Some weeks ago, we received a complaint from a customer about bad PostgreSQL performance for a specific application. I took a look into the database and found strange things going on: the query planner was executing “interesting” query plans, tables were bloated with lots of dead rows (one was 6 times as big … Continue reading “Don’t stop PostgreSQL’s autovacuum with your application”<\/span><\/a><\/p>\n","protected":false},"author":11966441,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[10221],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pFP3y-2L","jetpack-related-posts":[{"id":179,"url":"https:\/\/blog.gocept.com\/2012\/05\/28\/sprint-report-deploying-python-web-applications-platforms-and-applications\/","url_meta":{"origin":171,"position":0},"title":"Sprint report: Deploying Python web applications – platforms and applications","author":"Daniel Havlik","date":"May 28, 2012","format":false,"excerpt":"Last week I met Stephan Diehl, Michael Hierweck, Veit Schiele, and Jens Vagelpohl\u00a0in Berlin for a sprint. Our chosen topic was \"Python web application\u00a0deployment\". In this post I'd like to recap our discussions, gocept's perspective on those, and the deployment tool \"batou\" that we have been incubating in the last\u2026","rel":"","context":"In "en"","block_context":{"text":"en","link":"https:\/\/blog.gocept.com\/category\/en\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1357,"url":"https:\/\/blog.gocept.com\/2013\/08\/20\/viewing-scales-metrics-from-pyramid\/","url_meta":{"origin":171,"position":1},"title":"Viewing scales metrics from Pyramid","author":"","date":"August 20, 2013","format":false,"excerpt":"We've recently started experimenting with the excellent scales library to collect in-process metrics (see Coda Hale's CodeConf talk \"Metrics everywhere\" among many others for reasons why one definitely wants to do that). Scales comes with a flask-based HTTP server that allows viewing the collected measurements and dumping them as JSON.\u2026","rel":"","context":"In "en"","block_context":{"text":"en","link":"https:\/\/blog.gocept.com\/category\/en\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1437,"url":"https:\/\/blog.gocept.com\/2014\/07\/03\/follow-up-actions-after-the-filesystem-corruption-incident\/","url_meta":{"origin":171,"position":2},"title":"Follow up actions after the filesystem corruption incident","author":"","date":"July 3, 2014","format":false,"excerpt":"On 2014-06-07, the Flying Circus experienced a quite unfortunate filesystem corruption\u00a0incident. Most of the VMs have been cleaned up since then, but a few defective files are still around. In the following article, I'll provide some background information on what types of corruption we saw, what you (as our customer)\u2026","rel":"","context":"In "en"","block_context":{"text":"en","link":"https:\/\/blog.gocept.com\/category\/en\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1332,"url":"https:\/\/blog.gocept.com\/2013\/07\/15\/reliable-file-updates-with-python\/","url_meta":{"origin":171,"position":3},"title":"Reliable file updates with Python","author":"","date":"July 15, 2013","format":false,"excerpt":"Programs need to update files. Although most programmers know that unexpected things can happen while performing I\/O, I often see code that has been written in a surprisingly na\u00efve way. In this article, I would like to share some insights on how to improve I\/O reliability in Python code. Consider\u2026","rel":"","context":"In "en"","block_context":{"text":"en","link":"https:\/\/blog.gocept.com\/category\/en\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1454,"url":"https:\/\/blog.gocept.com\/2014\/08\/05\/haproxy-load-balancing-for-php-applications-with-sticky-sessions\/","url_meta":{"origin":171,"position":4},"title":"haproxy load-balancing for PHP applications with sticky sessions","author":"Daniel Havlik","date":"August 5, 2014","format":false,"excerpt":"We like\u00a0applications\u00a0that are\u00a0written\u00a0with a\u00a0shared-nothing\u00a0approach: it\u00a0greatly simplifies running multiple instances on multiple hosts and allows for simple, robust load-balancer configuration.Recently, we had to deploy a PHP application that - in the last minute - turned out to use PHP sessions and\u00a0thus required sticky sessions.We haven't used sticky sessions in a while\u00a0and\u2026","rel":"","context":"With 2 comments","block_context":{"text":"With 2 comments","link":"https:\/\/blog.gocept.com\/2014\/08\/05\/haproxy-load-balancing-for-php-applications-with-sticky-sessions\/#comments"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":39,"url":"https:\/\/blog.gocept.com\/2011\/01\/24\/testing-pyramid-applications-with-zope-testbrowser\/","url_meta":{"origin":171,"position":5},"title":"Testing pyramid applications with zope.testbrowser","author":"","date":"January 24, 2011","format":false,"excerpt":"The Pyramid documentation recommends using WebTest for functional tests, but coming from the Zope world, zope.testbrowser is the more familiar tool for this job (and it seems a little more high-level than WebTest at first glance). With the 3.11 release,\u00a0 zope.testbrowser gained out-of-the-box support for talking to WSGI applications (courtesy\u2026","rel":"","context":"In "en"","block_context":{"text":"en","link":"https:\/\/blog.gocept.com\/category\/en\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/posts\/171"}],"collection":[{"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/users\/11966441"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/comments?post=171"}],"version-history":[{"count":7,"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/posts\/171\/revisions"}],"predecessor-version":[{"id":323,"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/posts\/171\/revisions\/323"}],"wp:attachment":[{"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/media?parent=171"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/categories?post=171"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.gocept.com\/wp-json\/wp\/v2\/tags?post=171"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}