Taming Performance Variability in PostgreSQL
Shawn Sangwook Kim is a co-founder and CEO of Apposha, which is building a PostgreSQL extension for scalable file I/O. Before founding Apposha, Sangwook worked at the Computer Systems Laboratory at Sungkyunkwan University, where his main work was developing operating system-level techniques for database performance. Specifically, Sangwook has over 9 years of experience analyzing and optimizing the Linux kernel for open source databases including PostgreSQL, MongoDB, MySQL, and Redis.
He has spoken at various PostgreSQL conferences, including:
PGDay Seoul - 2017
PGConf Asia - 2017
PGDay Seoul - 2018
PGConf Asia - 2019
Postgres Conference Silicon Valley - 2019
PGDay Seoul - 2019
PGConf Russia - 2020
PostgreSQL background tasks, such as the checkpointer and autovacuum workers, produce file I/O in bursts, and the bursts become more pronounced under write-intensive workloads. These background tasks are essential for guaranteeing the ACID properties while providing high performance; however, they significantly increase performance variability. In our experiments with write-intensive OLTP queries, tuning checkpoints according to best practices yields 4x higher throughput at the expense of 25x worse variability. This is undesirable because performance predictability is crucial for meeting SLAs and for larger processing pipelines.
In this talk, I will use experimental results to show how background tasks affect PostgreSQL performance in terms of both average throughput and variability. I will then explain the internals of the Linux I/O stack and pinpoint the root cause of the performance variability in modern systems. In addition, I will share our experience stabilizing performance by scaling up hardware and tuning PostgreSQL configuration properly. Finally, I will introduce a new PostgreSQL extension for scalable and predictable file I/O (demo: https://youtu.be/CZCyg0bPGok) that efficiently handles the performance variability of PostgreSQL. Compared to the best practice, this extension reduces variability by 7x while providing 2x higher average throughput, without any hardware upgrades or code changes.
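To make the distinction between average throughput and variability concrete, here is a minimal Python sketch that summarizes per-interval throughput samples. The abstract does not say which variability metric the experiments use; the coefficient of variation (standard deviation divided by mean) is an assumption chosen for illustration, and the function name and sample values are hypothetical, not measured data.

```python
from statistics import mean, pstdev


def summarize_throughput(tps_samples):
    """Return (mean, coefficient of variation) for per-interval throughput samples.

    The coefficient of variation is an assumed variability metric; the talk
    may use a different one.
    """
    if not tps_samples:
        raise ValueError("need at least one sample")
    avg = mean(tps_samples)
    if avg == 0:
        raise ValueError("mean throughput is zero; variability is undefined")
    # Population standard deviation relative to the mean.
    cv = pstdev(tps_samples) / avg
    return avg, cv


# Hypothetical transactions-per-second readings, one per interval (not measured data).
steady = [1000, 1010, 990, 1005, 995]
bursty = [4000, 4100, 200, 4050, 3900]  # one interval stalls, e.g. during an I/O burst

for name, samples in (("steady", steady), ("bursty", bursty)):
    avg, cv = summarize_throughput(samples)
    print(f"{name}: mean={avg:.0f} tps, cv={cv:.3f}")
```

In this made-up data the bursty series has the higher mean but also a much larger coefficient of variation, which is the kind of trade-off the abstract describes.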
- 2019 September 20 16:00
- 50 min
- Silicon Valley 2019
- Ops and Administration