Data processing at more than a billion rows per second
KaiGai Kohei has been a contributor to PostgreSQL and the Linux kernel for over ten years, especially in the areas of security, database federation, and the extensible executor. He has also been the primary developer of PG-Strom since 2012; it is a PostgreSQL extension that accelerates analytic and reporting queries using GPUs and NVMe.
He founded HeteroDB, Inc. in 2017 to productize the PG-Strom technology and democratize data analytics.
Nowadays, GPUs serve not only compute-intensive workloads but also I/O-intensive big-data workloads.
This talk introduces how SSD-to-GPU Direct SQL, implemented as a PostgreSQL extension, optimizes the data flow from storage to processors over the PCIe bus for efficient execution of analytic and reporting workloads.
Combining this technology with comprehensive database features (e.g., columnar store, partitioned tables) draws out the maximum capability of the latest hardware, enabling data processing at more than a billion rows per second on a single-node PostgreSQL server.
Its main focus is log-data processing in the IoT/M2M area, where large volumes of data are generated every day. Our approach simplifies the system landscape and lets engineers apply their existing PostgreSQL knowledge and experience.
In short, this talk covers the following items from a technology viewpoint:
- SSD-to-GPU Direct SQL
- Columnar-store (Arrow_Fdw)
- PCIe-bus level optimization using table partitioning
- Benchmark results
- Customer case (under negotiation)
For your reference:
- 50 min
- Postgres Conference 2020
- Data Science