Presented by:


Linas Valiukas

pypt.lt

Linas has a bunch of varied, weird experience, including but not limited to building JavaScript-based public transport planners back in the day before "frontend" was even a thing, writing iOS keyboard apps optimized for typing when one's fingers are slippery from holding a pizza, and creating experimental spellcheckers for exotic hipster languages.

Five years ago, Linas found himself in front of a massive PostgreSQL database, and he has grown to love PostgreSQL's maturity and stability ever since.


One of the databases that I'm working on belongs to an academic project, and academia is notorious for its dislike of deleting data - in academics' eyes, every single byte has "future research potential", so nothing is to be purged at any cost. As a result, research datasets tend to grow to colossal sizes, and normal database management practices no longer apply - one has to put together their own DBA strategy from scraps of information on mailing lists, RhodiumToad's IRC logs, and creative hacks of varying nastiness that one has thought of in the shower.

In this talk, I present my own stash of tricks for dealing with large (1+ TB, 1+ billion rows) tables:

  1. Real and imaginary reasons to partition large tables
  2. Gradually partitioning large tables without any downtime
  3. Partitions and the query planner
  4. Continuously backing up large tables using volume snapshots
  5. Adding new columns with DEFAULT values to large tables
  6. Large indexes, index bloat, and dirty tricks on how to make indexes smaller
  7. Dumping large tables
  8. Replicating huge datasets
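
As a taste of items 1-2, here is a minimal sketch of gradually partitioning a large table in PostgreSQL 10+ using declarative partitioning. All table, column, and constraint names are illustrative, not from the talk; the slides may use a different approach:

```sql
-- Hypothetical parent table, range-partitioned by primary key.
CREATE TABLE downloads (
    downloads_id  BIGINT NOT NULL,
    downloaded_at TIMESTAMP WITH TIME ZONE NOT NULL,
    data          BYTEA
) PARTITION BY RANGE (downloads_id);

-- New rows get routed to freshly created partitions automatically.
CREATE TABLE downloads_p0
    PARTITION OF downloads
    FOR VALUES FROM (0) TO (100000000);

-- Gradual migration without downtime: attach a pre-existing table
-- as a partition. Adding a matching CHECK constraint beforehand lets
-- ATTACH PARTITION skip the full-table validation scan, so the
-- ACCESS EXCLUSIVE lock on the parent is held only briefly.
ALTER TABLE downloads_legacy
    ADD CONSTRAINT downloads_legacy_id_check
    CHECK (downloads_id >= 100000000 AND downloads_id < 200000000);

ALTER TABLE downloads
    ATTACH PARTITION downloads_legacy
    FOR VALUES FROM (100000000) TO (200000000);
```

With `constraint_exclusion` / partition pruning enabled, queries that filter on `downloads_id` touch only the relevant partitions, which is where the query-planner considerations of item 3 come in.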

Date:
2018 October 15 14:00
Duration:
20 min
Room:
Winchester 2
Conference:
Silicon Valley
Language:
Track:
Data
Difficulty:
Medium