Distributing queries the Citus way
Marco is a principal engineer at Citus Data where he works on Citus, an extension for scaling out PostgreSQL tables across many servers. Previously he worked as a software engineer on Amazon CloudFront and Route 53 for several years, and has an MSc in Parallel and Distributed Computer Systems from VU University Amsterdam and a PhD on cooperative self-driving cars from Trinity College Dublin.
Citus is a sharding extension for postgres that can efficiently distribute a wide range of SQL queries. It uses postgres' planner hook to transparently intercept and plan queries on "distributed" tables. Citus then executes the queries in parallel across many servers, in a way that delegates most of the heavy lifting back to postgres.
Within Citus, we distinguish between several types of SQL queries, which each have their own planning logic:
- Local-only queries
- Single-node “router” queries
- Multi-node “real-time” queries
- Multi-stage queries
Each type of query corresponds to a different use case, and Citus implements several planners and executors using different techniques to accommodate the performance requirements and trade-offs for each use case.
This talk will discuss the internals of the different types of planners and executors for distributing SQL on top of postgres, and how they can be applied to different use cases.
- 2018 April 20 09:50
- 50 min
- PostgresConf US 2018
- Postgres Internals