YMatrix Domino: Design Considerations, Trade-offs, and Implementation of In-Database Stream Processing
Presented by:
Yandong Yao
Founder & CEO, Beijing Siwei Zongheng Data Technology Co., Ltd.; former General Manager, Greenplum Beijing R&D Center.
Yandong Yao is a pioneer in open-source data technologies and ecosystem development. He founded the Greenplum China Open Source Community while serving on the standing committee of the PostgreSQL Chinese Community, and is a co-founder of the 1024 Digital Foundation. Since Greenplum's open-source release, Yandong has led efforts to grow its ecosystem in China, establishing a significant user base across major cloud providers including Alibaba Cloud, Tencent Cloud, and Baidu Cloud. He holds multiple domestic and international patents in cloud computing and big data, and authored Greenplum: From Big Data Strategy to Implementation. Before his work with Greenplum, he worked in systems and storage at Sun Microsystems and Symantec. Yandong graduated with honors from the Institute of Software, Chinese Academy of Sciences.
Hao Wang
Hao Wang is the Chief Architect for observability and intelligent database systems at YMatrix and Greenplum. He focuses on building next-generation database observability and automation platforms, driving transparency and intelligence in large-scale distributed database operations. He pioneered the world’s first distributed query plan visualization and real-time execution progress tracking system, enabling unprecedented visibility into complex database workloads. In addition to his industry work, Wang serves as an instructor for distributed database systems at Tsinghua University.
No video of the event yet, sorry!
YMatrix is a distributed, multi-model database built on PostgreSQL. Domino is its native in-database stream processing engine, enabling true unified batch and stream processing. This talk explores the design concepts, technology choices, and implementation of in-database stream processing in YMatrix, aiming to address the complexity, resource overhead, and consistency challenges of traditional external streaming architectures.

Background and Motivation
In conventional data architectures, data is extracted from OLTP systems and processed either through T+1 batch pipelines or external stream processing engines (such as Flink) before being loaded into data marts. This approach suffers from limited real-time capability, high system complexity, data redundancy, and consistency risks. In-database stream processing offers a more efficient alternative: it simplifies the system architecture, reduces resource consumption, strengthens data lineage, and improves flexibility.

Core Design Goals
The system is designed to support incremental processing (distinct from materialized views), near-real-time latency (sub-second or configurable), multi-stage stream chaining, minimal impact on OLTP workloads (better than trigger-based solutions), eventual consistency, and high throughput. It exposes full SQL-level semantics, so no user-defined code is required.

Technical Implementation
- SQL Syntax Design
The SQL grammar is extended with CREATE STREAM ... AS SELECT ... STREAMING to define stream objects that subscribe to incremental changes from upstream tables. The semantics resemble materialized views but support continuous incremental updates.
- Incremental Data Capture
Logical decoding is used to create replication slots that capture changes from the WAL. An initial historical snapshot initializes each stream table, after which background workers continuously execute incremental query plans to keep streaming tables synchronized.
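As a rough illustration of the CREATE STREAM ... AS SELECT ... STREAMING syntax named above, the sketch below defines a stream that maintains per-customer order totals. The table, column names, and the exact placement of the STREAMING keyword are assumptions for illustration; the abstract only specifies the overall grammar shape.

```sql
-- Hypothetical upstream table receiving OLTP writes.
CREATE TABLE orders (
    order_id    bigint PRIMARY KEY,
    customer_id bigint,
    amount      numeric,
    created_at  timestamptz
);

-- Stream object subscribing to incremental changes from orders.
-- Semantics resemble a materialized view, but the result is
-- refreshed continuously from changes captured via logical decoding,
-- after an initial historical snapshot populates the stream table.
CREATE STREAM order_totals AS
SELECT customer_id,
       sum(amount) AS total_amount,
       count(*)    AS order_count
FROM orders
GROUP BY customer_id
STREAMING;
```

Because the stream is an aggregation over a single upstream table, it would presumably be handled by the single-stream aggregation pattern described below.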
- Progress Management
Stream progress (restart_lsn / confirm_lsn) is maintained in shared memory and updated atomically on transaction commit. Integration with the XLog and checkpoint mechanisms ensures reliable crash recovery.
- Plugin Framework
The framework supports three primary SQL patterns: single-stream joins with dimension tables (domino_one), single-stream aggregation (domino_agg), and dual-stream inner joins (domino_join). Each plugin matches a SQL query pattern and generates the corresponding incremental execution plan.
- Update Handling
Logical decoding emits DELETE/INSERT markers, and primary-key conflict handling (INSERT ... ON CONFLICT) propagates upstream updates and deletes incrementally. For aggregation, inverse transition functions (aggmtransfn / aggminvtransfn) support retraction. Dual-stream join scenarios further support arbitrary update/delete operations on both input streams.

Benefits
Compared with external streaming engines, this approach significantly simplifies the system architecture, reduces resource costs (by cutting serialization and network transmission overhead), strengthens data consistency guarantees, and lowers the adoption barrier through pure SQL semantics.
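The update-handling mechanism described above (DELETE/INSERT markers plus primary-key conflict handling, with retraction for aggregates) can be sketched as the statements a background worker might issue when an upstream UPDATE changes one order's amount. The stream table, key, and literal values are illustrative assumptions, not the engine's actual generated plan.

```sql
-- An upstream UPDATE on orders is decoded as a DELETE marker
-- followed by an INSERT marker for the same primary key.

-- 1. Retract the old row's contribution to the aggregate
--    (conceptually what an inverse transition function does):
UPDATE order_totals
SET total_amount = total_amount - 120.00,  -- old amount (illustrative)
    order_count  = order_count - 1
WHERE customer_id = 42;

-- 2. Apply the new row via primary-key conflict handling, so the
--    same statement works whether or not the key already exists:
INSERT INTO order_totals (customer_id, total_amount, order_count)
VALUES (42, 135.00, 1)  -- new amount (illustrative)
ON CONFLICT (customer_id) DO UPDATE
SET total_amount = order_totals.total_amount + EXCLUDED.total_amount,
    order_count  = order_totals.order_count + EXCLUDED.order_count;
```

The upsert form is what lets a single incremental plan cover both first-time inserts and repeated updates for a given key.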
- Date:
- April 21, 2026, 13:30 PDT
- Duration:
- 20 min
- Room:
- Santa Clara (Level C)
- Conference:
- Postgres Conference: 2026
- Language:
- Track:
- Dev
- Difficulty:
- Medium