Data Pipelines for Data as a Service (Wide Data)
Presented by:
Mara Lemagie
Mara is fueled by helping people discover the value of well-organized and efficient queries. As the lead data engineer at Bazean, she helps scale data systems for energy investing. She has worked in a variety of data management roles for over 10 years, including helping startups establish their inventory management systems and working with non-profit groups to measure and promote their successes.
VERY Rough Draft
For workflows that pull in data from diverse sources (differing in size, shape, frequency, and context) and merge it for analysis, this talk covers the best practices we have developed for loading, tracking, validating, and merging these disparate datasets while keeping the pipeline flexible. In the use cases explored, all data come from external datasets that are constantly revised (the worst-case scenario), so preserving and managing historical snapshots becomes a challenge. We have developed a nomenclature and some quick techniques for tracking these revisions at a very granular level.
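To make the idea of preserving historical snapshots concrete, here is a minimal sketch of one common approach: an append-only revision log in which each load adds new versions of changed records rather than overwriting them, so the dataset can be reconstructed as it looked on any date. This is an illustration only and not necessarily the technique presented in the talk; the class names, the `well_production` source, and the record keys are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Revision:
    source: str      # hypothetical name of the external dataset
    record_key: str  # identifier of the record within that source
    loaded_on: date  # date this version was pulled in
    value: float     # the value the source reported on that date


class RevisionLog:
    """Append-only store: revisions are added, never overwritten."""

    def __init__(self) -> None:
        self._rows: List[Revision] = []

    def snapshot(self, source: str, as_of: date) -> Dict[str, float]:
        """Return record_key -> value as the source looked on `as_of`."""
        result: Dict[str, float] = {}
        # sorted() is stable, so same-day loads keep their insertion order
        # and the later revision wins.
        for row in sorted(self._rows, key=lambda r: r.loaded_on):
            if row.source == source and row.loaded_on <= as_of:
                result[row.record_key] = row.value
        return result

    def load(self, source: str, loaded_on: date, records: Dict[str, float]) -> int:
        """Append only the records that are new or changed; return how many."""
        known = self.snapshot(source, loaded_on)
        added = 0
        for key, value in records.items():
            if known.get(key) != value:
                self._rows.append(Revision(source, key, loaded_on, value))
                added += 1
        return added


# Hypothetical example: the provider revises one record between two loads.
jan, feb = date(2020, 1, 1), date(2020, 2, 1)
log = RevisionLog()
first = log.load("well_production", jan, {"well-1": 100.0, "well-2": 250.0})
second = log.load("well_production", feb, {"well-1": 100.0, "well-2": 240.0})

print(first, second)                                      # records stored per load
print(log.snapshot("well_production", date(2020, 1, 15)))  # view before the revision
print(log.snapshot("well_production", feb))                # view after the revision
```

Because only changed records are appended, the log stays compact even when a provider resends the full dataset on every load, and the January view remains available after the February revision arrives. This sketch ignores records that a source withdraws; handling those would need an explicit deletion marker.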
- Date:
- Duration: 50 min
- Room:
- Conference: Postgres Conference 2020
- Language:
- Track: Data Science
- Difficulty: Medium