Presented by:
Sai Srirampur
I am the Co-founder and CEO of PeerDB. Prior to PeerDB, I worked at Microsoft, leading solutions engineering for all Postgres services on Azure. Before that, I was an early employee at Citus Data and saw it through the Microsoft acquisition. For the past 8 years, I have been an active member of the Postgres community, helping customers implement Postgres and Citus.
Every datastore is unique with a diverse set of features and data modeling characteristics. For example, PostgreSQL has 4 ways to ingest data, 5 ways to read data, 300+ data types and 300+ database configs. Building data movement solutions that scale, therefore, requires an emphasis on the unique design and capabilities of each data store.
However, most existing data movement tools focus on breadth over quality of connectors. They often fail at scale due to painfully slow syncs, lack of reliability, and lack of features. These challenges are reflected in the number of companies building in-house solutions and maintaining large data engineering teams.
This underscores the need for a first-class data movement tool for Postgres: one that focuses on quality over breadth and is native to Postgres. In this talk, I will do a deep dive into what it takes to build a Postgres-specialized data movement tool.
I will cover the architectural tradeoffs: why choose a peer-to-peer architecture that keeps data stores at the center over a hub-and-spoke one that optimizes for breadth of connectors?
I will also take a deep dive into Postgres-native optimizations that enhance the performance, reliability and richness of data movement:
- Partition a Postgres table by internal tuple identifiers (CTIDs) and snapshot the partitions in parallel to move TBs of data in hours rather than days;
- Preserve native data types when moving specialized types such as geospatial, JSONB and ARRAY to Postgres and non-Postgres targets;
- Reliably manage schema changes on the target using Relation messages from logical decoding;
- Efficiently replicate TOAST columns without requiring REPLICA IDENTITY FULL.
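As a rough illustration of the CTID-based partitioning idea in the first item, the sketch below splits a table's heap pages into contiguous ranges and builds one `SELECT` per range that a separate worker could run. It is a minimal sketch under stated assumptions, not PeerDB's implementation: the helper names `ctid_page_ranges` and `partition_query` are hypothetical, the page count is assumed to come from something like `pg_relation_size(table) / block_size`, and the table name is assumed to be a trusted identifier.

```python
from typing import List, Optional, Tuple

# (start_page, end_page); end_page is exclusive, None means "no upper bound".
PageRange = Tuple[int, Optional[int]]


def ctid_page_ranges(total_pages: int, num_partitions: int) -> List[PageRange]:
    """Split heap pages [0, total_pages) into at most num_partitions ranges.

    The last range is left open-ended so that pages appended after the
    page count was taken are still covered.
    """
    if total_pages < 0 or num_partitions < 1:
        raise ValueError("total_pages must be >= 0 and num_partitions >= 1")
    if total_pages == 0:
        return [(0, None)]

    num_partitions = min(num_partitions, total_pages)
    step = -(-total_pages // num_partitions)  # ceiling division

    ranges: List[PageRange] = []
    start = 0
    while start < total_pages:
        end = start + step
        ranges.append((start, end if end < total_pages else None))
        start = end
    return ranges


def partition_query(table: str, page_range: PageRange) -> str:
    """Build a SELECT restricted to one CTID page range.

    Assumption: `table` is a trusted, already-quoted identifier.
    """
    start, end = page_range
    sql = f"SELECT * FROM {table} WHERE ctid >= '({start},0)'::tid"
    if end is not None:
        sql += f" AND ctid < '({end},0)'::tid"
    return sql


if __name__ == "__main__":
    # Hypothetical table with 10 heap pages, split across 4 workers.
    for page_range in ctid_page_ranges(10, 4):
        print(partition_query("public.events", page_range))
```

For the partitions to add up to one consistent copy, every worker would need to read from the same snapshot, for example one exported with `pg_export_snapshot()` and adopted with `SET TRANSACTION SNAPSHOT`. Range predicates on `ctid` can use a TID Range Scan on PostgreSQL 14 and later; on older versions each query may fall back to a full sequential scan.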
To close, I will share what needs to go into upstream Postgres to make data movement a first-class citizen.
- Date: 2024 April 19 16:30 PDT
- Duration: 20 min
- Room: San Pedro
- Conference: Postgres Conference 2024
- Language: English
- Track: Dev
- Difficulty: Intermediate