AutoVacuum - Tuning and Monitoring
I have been a PostgreSQL administrator since 2002 and have worked for Fred Hutchinson Cancer Research Center since 2008. Within Fred Hutch, I work for the largest group called SCHARP, Statistical Center for HIV/AIDS Research and Prevention.
In 2010 I started SeaPUG, Seattle Postgres Users Group, at the request of Josh Drake. I present at least half the presentations there every year. I started the PostgreSQL track at LinuxFest Northwest in 2014 after my GIS presentation in 2013 was standing room only. In 2017 I got a booth at the SeaGL, Seattle GNU Linux, conference with the idea of having a booth there in 2018 along with also doing a PostgreSQL presentation at the conference. In 2017 I started presenting at various PostgresConf Conferences.
I have also discovered several PostgreSQL bugs which have been fixed, some of them affected every version of PostgreSQL.
- 7553 - Variant of the what alias to use after a rename bug in views - Fixed in 9.3
- 8173 - Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times. - Fixed in 9.2.5 & 9.3
- 8257 - Multi-Core Restore fails when containing index comments - Fixed in 8.4, 9.0, 9.1, 9.2, 9.3
- 8291 - postgres_fdw does not re-read USER MAPING after change - Updated Docs in 9.3
- 8545 - pg_dump does not backup database level globals - Fixed in 11
- 15182 - Major Bug - Affecting all versions of PostgreSQL. Partially fixed in 11 and fully fixed in 12
- 15316 - Creation of check constraint functions that rely on data loaded alphabetically after your primary data will fail to restore properly.
I host the Seattle Postgres Users Group where I give many of the presentations. I have many of the presentations posted at Lloyd's Presentations
I was recently asked by another PostgreSQL DBA about what could be causing queries to run slow while their application is running but then runs fast again after the application is stopped. While I have given several presentations on this subject, each one has been on a specific cause and not talking about all of them together. I decided I would compile them into a single tuning and monitoring presentation.
These slow queries are reported to me by my developers when they start taking minutes to hours to days, in their test and/or production databases. When they stop the application to load up a fresh copy of the database, the same queries run in milliseconds. I have found that this is caused by the AutoVacuum/AutoAnalyze system or lack there of.
Here is a list of some of the symptoms that you could be seeing.
- Small table that gets thousands of rows added and deleted. This table is then joined to a large table to view a set of results.
- Effect: Your queries get slower and slower until a query that was 10 ms now takes 5 seconds or more, but if you start and stop your application, then the problem self fixes itself.
- Large table with tens of millions of rows where you are deleting and adding millions of rows per hour.
- Effect: You find that your queries are slowing down and when you look at your AutoVacuum it seems to be AutoVacuum this table 24 hours per day, but if you stop your application for a few day or dump and restore your database it fixes itself.
- Large table with tens of millions of rows where you are deleting and then adding millions of rows per hour with the same index values inside of a transaction.
- Effect: Your queries get slower and slower, but if you dump and restore your database, everything is fast again.
- 2019 September 20 11:10
- 50 min
- Silicon Valley 2019
- Ops and Administration