Presented by:

Marshall Presser

Pivotal Software

Marshall Presser is a Data Engineer in Pivotal's Data Labs where he helps customers solve complex analytic problems with the Greenplum Database.

Prior to coming to Pivotal (formerly Greenplum), he spent 12 years at Oracle, specializing in High Availability, Business Continuity, Clustering, Parallel Database Technology, Disaster Recovery and Large Scale Database Systems. Marshall has also worked for a number of hardware vendors implementing clusters and other parallel architectures. His background includes parallel computation, operating system and compiler development, as well as private consulting for organizations in health care, financial services, and federal and state governments.

Marshall holds a B.A. in Mathematics and an M.A. in Economics and Statistics from the University of Pennsylvania and an M.Sc. in Computing from Imperial College, London.

Andreas Scherbaum has been working with PostgreSQL since 1997. He is involved in several PostgreSQL-related community projects, is a member of the Board of Directors of the European PostgreSQL User Group, and wrote a PostgreSQL book (in German). Since 2011 he has been working for EMC/Greenplum/Pivotal, where he tackles very big databases.

Craig Sylvester is an Advisory Data Engineer for VMware supporting Federal accounts. He brings over 25 years of database experience supporting customers in the Federal and commercial markets. His extensive database experience includes multi-year stints as a consultant and sales engineer at Informix, Netezza, and MySQL. Prior to getting into sales, Craig spent 6 years as a software developer supporting Federal and commercial accounts.

No video of the event yet, sorry!

A data warehouse is more than just storing and retrieving data. Equally important are loading high-volume data in parallel and running analytics in the database. This hands-on session will lead you through the entire process of creating, loading, and analyzing data in the Greenplum MPP database. It's PostgreSQL, but bigger and DWH-focused.

By the end of this workshop, attendees will have learned modern DWH techniques on a PostgreSQL-based Massively Parallel Processing platform. This includes the basic architecture of the Greenplum Database; the parallel techniques for loading, querying, and analyzing structured and semi-structured data; and the tools Greenplum provides for doing analytics in the database.

Workshop Agenda

  1. Introduction to MPP and Greenplum

  2. Distribution -- a key to good performance in Greenplum

  3. Parallel loading -- loading multiple terabytes per hour

  4. Loading from S3 and external connectivity

  5. Polymorphic storage and external partitions

  6. Compare external tables to Foreign Data Wrappers

  7. Partitioning vs. Distribution -- how they interact

  8. Difference between PG and GP partitions

  9. Query response time exercises

  10. Running Analytics in Greenplum: MADlib exercise

  11. Analyzing Free-Form Text with Solr and GPText

  12. Monitoring and Managing Greenplum with Command Center

  13. Managing Concurrency with Resource groups and Workload Manager

  14. Running PL/Python and PL/R as Trusted Languages with PL/Container

Prerequisites: Laptop with a modern browser and SSH client; Instruction on using SSH on Windows; Basic knowledge of SQL

Users will connect to a cloud-based Greenplum cluster.

There will be a maximum of 25 attendees.

Suggested Pre-work

Videos on YouTube Channel:

GP Database basics https://www.youtube.com/watch?v=cCuGX_fLNl8&list=PL4duir3J-8GUodk1uS9ONPU_eWvfCeVjT

GP & analytics https://www.youtube.com/watch?v=3K1PRZNYHZE&list=PL4duir3J-8GXgVNvHVE8Y86W79Gzu5oEk

GP & MADlib https://www.youtube.com/watch?v=Nza2F2dU-Q0&list=PL4duir3J-8GUcubGGpudx6KCCxp8onTI8

Date:
2019 March 18 09:00 EDT
Duration:
7 h
Room:
Sugar Hill
Conference:
Postgres Conference
Language:
Track:
Greenplum Summit
Difficulty:
Easy