Petabyte Scale Data Warehousing with Open Source Greenplum Database
Marshall Presser is a Data Engineer in Pivotal's Data Labs where he helps customers solve complex analytic problems with the Greenplum Database.
Prior to coming to Pivotal (formerly Greenplum), he spent 12 years at Oracle, specializing in High Availability, Business Continuity, Clustering, Parallel Database Technology, Disaster Recovery and Large Scale Database Systems. Marshall has also worked for a number of hardware vendors implementing clusters and other parallel architectures. His background includes parallel computation, operating system and compiler development, as well as private consulting for organizations in health care, financial services, and federal and state governments.
Marshall holds a B.A. in Mathematics and an M.A. in Economics and Statistics from the University of Pennsylvania and an M.Sc. in Computing from Imperial College, London.
Data warehousing is more than just storing and retrieving data. Equally important are loading high-volume data in parallel and running analytics in the database. This hands-on session will lead you through the entire process of creating, loading, and analyzing data in the Greenplum MPP database. It's PostgreSQL, but bigger and DWH-focused.
By the end of this workshop, attendees will have learned modern DWH techniques on a PostgreSQL-based Massively Parallel Processing (MPP) platform. This includes the basic architecture of the Greenplum Database, the parallel techniques for loading, querying, and analyzing structured and semi-structured data, and the tools Greenplum provides for doing analytics in the database.
Workshop Agenda:
1. Introduction to MPP and Greenplum
2. Distribution: a key to good performance in Greenplum
3. Parallel loading: loading multiple terabytes per hour
4. Loading from S3 and external connectivity
5. Polymorphic storage and external partitions
6. Comparing external tables to Foreign Data Wrappers
7. Partitioning vs. distribution: how they interact
8. Differences between PostgreSQL and Greenplum partitions
9. Query response time exercises
10. Running analytics in Greenplum: MADlib exercise
11. Analyzing free-form text with Solr and GPText
12. Monitoring and managing Greenplum with Command Center
13. Managing concurrency with Resource Groups and Workload Manager
14. Running PL/Python and PL/R as trusted languages with PL/Container
Prerequisites: Laptop with a modern browser and SSH client; Instruction on using SSH on Windows; Basic knowledge of SQL
Attendees will connect to a cloud-based Greenplum cluster.
There will be a maximum of 25 attendees.
Suggested pre-work:
Videos on YouTube Channel
- 2019 March 18 09:00
- 3 h
- Union Square
- Postgres Conference
- Greenplum Summit