Presented by:


Sai Srirampur

ClickHouse Inc

Sai leads all Postgres and database integration efforts at ClickHouse. He was the CEO and co-founder of PeerDB, which he sold to ClickHouse. Sai is a Postgres enthusiast who has helped hundreds of companies get the most out of their databases. He built Postgres tools at Microsoft and was an early employee at Citus Data (acquired by Microsoft).


Every data store is unique, with its own set of features and data modeling characteristics. For example, PostgreSQL has 4 ways to ingest data, 5 ways to read data, 300+ data types, and 300+ configuration parameters. Building data movement solutions that scale therefore requires close attention to the unique design and capabilities of each data store.

However, most existing data movement tools prioritize the breadth of their connectors over their quality. They often fail at scale because of painfully slow syncs, poor reliability, and missing features. These challenges are reflected in the number of companies that build in-house solutions and maintain large data engineering teams.

This highlights the need for a first-class data movement tool for Postgres: one that prioritizes quality over breadth and is native to Postgres. In this talk, I will take a deep dive into what it takes to build a Postgres-specialized data movement tool.

  • I will cover the architectural trade-offs: why choose a peer-to-peer architecture that keeps data stores at the center over a hub-and-spoke architecture that optimizes for breadth of connectors?

  • A deep dive into Postgres-native optimizations that improve the performance, reliability, and richness of data movement:

    • Partitioning a Postgres table by its internal tuple identifiers (CTIDs) to implement parallel snapshotting, moving TBs of data in hours rather than days;
    • Preserving data type nativity when moving specialized types such as geospatial types, JSONB, and ARRAYs to Postgres and non-Postgres targets;
    • Reliably managing schema changes on the target using Relation messages from logical decoding;
    • Efficiently replicating TOAST columns without requiring REPLICA IDENTITY FULL.
  • To wrap up, I will share what needs to go into upstream Postgres to make data movement a first-class citizen.
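To make the CTID partitioning idea from the outline concrete, here is a minimal Python sketch. It is not PeerDB's implementation. It assumes the table's heap block count is already known (for example from `pg_class.relpages`, or `pg_relation_size()` divided by the `block_size` setting). The helper names `plan_ctid_ranges`, `partition_query`, and `CtidRange`, as well as the example table `"public"."events"`, are hypothetical. The heap is split into contiguous block ranges, and each range becomes one `ctid` range query that a separate worker can copy.

```python
from dataclasses import dataclass
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Quote an SQL identifier: wrap it in double quotes and double any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


@dataclass(frozen=True)
class CtidRange:
    start_block: int          # inclusive lower bound (heap block number)
    end_block: Optional[int]  # exclusive upper bound; None means open-ended


def plan_ctid_ranges(num_blocks: int, num_partitions: int) -> List[CtidRange]:
    """Split heap blocks [0, num_blocks) into contiguous, near-equal ranges.

    The last range is left open-ended so rows in blocks beyond a possibly
    stale block-count estimate are still read.
    """
    if num_blocks < 0 or num_partitions < 1:
        raise ValueError("need num_blocks >= 0 and num_partitions >= 1")
    parts = max(1, min(num_partitions, num_blocks))  # never more ranges than blocks
    base, extra = divmod(num_blocks, parts)
    ranges: List[CtidRange] = []
    start = 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        is_last = i == parts - 1
        ranges.append(CtidRange(start, None if is_last else end))
        start = end
    return ranges


def partition_query(schema: str, table: str, r: CtidRange) -> str:
    """Build the SELECT that reads exactly one CTID range of the table."""
    # Tuple offsets start at 1, so '(N,0)' sorts before every tuple in block N.
    sql = (f"SELECT * FROM {quote_ident(schema)}.{quote_ident(table)} "
           f"WHERE ctid >= '({r.start_block},0)'::tid")
    if r.end_block is not None:
        sql += f" AND ctid < '({r.end_block},0)'::tid"
    return sql


if __name__ == "__main__":
    # Hypothetical: a table whose heap spans 10 blocks, copied by 3 workers.
    for r in plan_ctid_ranges(num_blocks=10, num_partitions=3):
        print(partition_query("public", "events", r))
```

On PostgreSQL 14 and later, a `ctid` range predicate like this can run as a TID Range Scan, so each worker reads only its own blocks rather than the whole table. To keep the partitions mutually consistent, the workers would typically share one snapshot, exported with `pg_export_snapshot()` and imported with `SET TRANSACTION SNAPSHOT`.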

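The TOAST bullet depends on a detail of the logical replication message format. With the default replica identity, an UPDATE that leaves a TOASTed column unchanged sends that column with kind `'u'` and no data, instead of sending the value itself. The following minimal Python sketch shows one way a consumer can cope with this without REPLICA IDENTITY FULL: it keeps the target's existing value for every `'u'` column. It assumes the UPDATE has already been decoded into `(name, kind, value)` tuples. The function `apply_update` and the row layout are hypothetical, and the sketch is not claimed to be how PeerDB handles it.

```python
from typing import Dict, List, Optional, Tuple

# One decoded column from a pgoutput TupleData section: (name, kind, value).
# Kinds handled here: 'n' = NULL, 'u' = unchanged TOASTed value (no data sent),
# 't' = text-format value. Binary-format values ('b') are not handled in this sketch.
Column = Tuple[str, str, Optional[str]]
Row = Dict[str, Optional[str]]


def apply_update(target_row: Row, new_columns: List[Column]) -> Row:
    """Return the target row after applying a decoded UPDATE.

    Columns flagged 'u' were not sent by Postgres, so the value already stored
    on the target is kept instead of being overwritten.
    """
    merged = dict(target_row)  # copy so the caller's row is not mutated
    for name, kind, value in new_columns:
        if kind == "u":
            continue  # unchanged TOAST value: keep the target's current value
        elif kind == "n":
            merged[name] = None
        elif kind == "t":
            merged[name] = value
        else:
            raise ValueError(f"unsupported column kind {kind!r} for column {name!r}")
    return merged


if __name__ == "__main__":
    # Hypothetical row whose large "body" column is TOASTed and untouched by the UPDATE.
    current = {"id": "1", "title": "old title", "body": "<several KB of text>"}
    update = [("id", "t", "1"), ("title", "t", "new title"), ("body", "u", None)]
    print(apply_update(current, update))
    # prints {'id': '1', 'title': 'new title', 'body': '<several KB of text>'}
```

In a real sink, the same rule would become an UPDATE or MERGE that sets only the columns that were actually sent.
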
Date:
2024 April 19 16:30 PDT
Duration:
20 min
Room:
San Pedro
Conference:
Postgres Conference 2024
Language:
English
Track:
Dev
Difficulty:
Intermediate