Building a code-security service using time-series data in Postgres
Preetam is a Principal Engineer at ShiftLeft, where he focuses on backend APIs and data engineering.
ShiftLeft is a security startup in the Bay Area that's working on a security solution combining static analysis and runtime protection. Code analysis starts with either JVM bytecode or source code, which is transformed into a novel graph representation called a Code Property Graph (CPG). The CPG, which combines syntax, control flow, and data flow information, is then inspected using graph traversal queries to find data leaks, injections, and other security vulnerabilities.
As a Software-as-a-Service (SaaS) security platform, we have many of the same database needs that you'd expect for modern web applications. These include storing user and organization accounts, project metadata, billing information, and more. What's unique is how we leverage PostgreSQL to operate our code analysis platform and UI. Rather than using a dedicated queueing system to coordinate tasks between pipeline stages, we use a queue table, and we track the status of each pipeline stage in a separate table. This wasn't always the case: we used to use Kafka as a job queue, but we've had much better reliability and observability with PostgreSQL, for reasons I'll explain in the talk.
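As a rough illustration of the queue-table pattern in general (not ShiftLeft's actual schema), the sketch below shows one common way to claim jobs from a PostgreSQL table. The table name `job_queue`, its columns, and the helper `claim_next_job` are all hypothetical; the only assumption about the driver is a DB-API 2.0 connection that uses `%s` placeholders, as psycopg2 does.

```python
"""Sketch of a Postgres-backed job queue. All table, column, and function
names here are hypothetical and chosen only for illustration."""

# Hypothetical schema: one row per pending unit of pipeline work.
CREATE_QUEUE_SQL = """
CREATE TABLE IF NOT EXISTS job_queue (
    id          bigserial   PRIMARY KEY,
    stage       text        NOT NULL,
    payload     jsonb       NOT NULL,
    enqueued_at timestamptz NOT NULL DEFAULT now()
)
"""

# Claim and remove the oldest job for a stage in a single statement.
# FOR UPDATE SKIP LOCKED makes concurrent workers skip rows that another
# transaction has already locked instead of blocking on them.
CLAIM_JOB_SQL = """
DELETE FROM job_queue
WHERE id = (
    SELECT id
    FROM job_queue
    WHERE stage = %s
    ORDER BY enqueued_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_job(conn, stage):
    """Return (id, payload) for the next job in `stage`, or None if empty.

    `conn` is any DB-API 2.0 connection to PostgreSQL. The caller commits
    once the work succeeds; rolling back returns the job to the queue.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_JOB_SQL, (stage,))
        return cur.fetchone()
    finally:
        cur.close()
```

Because the claim happens inside an ordinary transaction, a worker that crashes before committing leaves the row in place for another worker, and the queue contents can be inspected with plain SQL.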
While we can get exceptional detection results using static analysis alone, we also rely on time-series data generated at runtime to determine which vulnerabilities are actually being exercised, so that they can be prioritized for resolution. For this, we deploy microagents alongside applications, configured with a security profile derived from the CPG. These microagents instrument the vulnerable portions of the application and transmit security events and other runtime data back to our SaaS API. The runtime data is then stored in PostgreSQL using TimescaleDB, an open-source extension to PostgreSQL. I'll explain why we use TimescaleDB and PostgreSQL rather than other time-series databases, how TimescaleDB powers our time-series infrastructure, and the tooling we've developed to manage it in production.
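To make the time-series side concrete, here is a minimal sketch of storing runtime events in a TimescaleDB hypertable and counting them per vulnerability per hour. The `security_events` table, its columns, and the helper `exercised_vulnerabilities` are assumptions made for illustration, not the schema described in the talk; `create_hypertable` and `time_bucket` are TimescaleDB functions, and `conn` is again assumed to be a DB-API 2.0 connection using `%s` placeholders.

```python
"""Sketch of time-series storage with TimescaleDB. Table, column, and
function names are hypothetical and chosen only for illustration."""

# Hypothetical event table: one row per security event sent by a microagent.
CREATE_EVENTS_SQL = """
CREATE TABLE IF NOT EXISTS security_events (
    time             timestamptz NOT NULL,
    project_id       bigint      NOT NULL,
    vulnerability_id bigint      NOT NULL,
    event_type       text        NOT NULL
)
"""

# Convert the plain table into a hypertable partitioned on the time column.
MAKE_HYPERTABLE_SQL = """
SELECT create_hypertable('security_events', 'time', if_not_exists => TRUE)
"""

# Count events per vulnerability in one-hour buckets since a given time.
EXERCISED_SQL = """
SELECT time_bucket('1 hour', time) AS bucket,
       vulnerability_id,
       count(*) AS events
FROM security_events
WHERE project_id = %s AND time >= %s
GROUP BY bucket, vulnerability_id
ORDER BY bucket, events DESC
"""


def exercised_vulnerabilities(conn, project_id, since):
    """Return (bucket, vulnerability_id, events) rows for one project."""
    cur = conn.cursor()
    try:
        cur.execute(EXERCISED_SQL, (project_id, since))
        return cur.fetchall()
    finally:
        cur.close()
```

Because a hypertable is queried like a regular PostgreSQL table, rows like these can be joined against relational data such as project metadata in the same database.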
- 2019 March 22 10:00 EDT
- 50 min
- Riverside Suite
- Postgres Conference
- Use Cases