Postgres Conference

Presented by:

Marshall Presser

Pivotal Software

Marshall Presser is a Data Engineer in Pivotal's Data Labs where he helps customers solve complex analytic problems with the Greenplum Database.

Prior to coming to Pivotal (formerly Greenplum), he spent 12 years at Oracle, specializing in High Availability, Business Continuity, Clustering, Parallel Database Technology, Disaster Recovery and Large Scale Database Systems. Marshall has also worked for a number of hardware vendors implementing clusters and other parallel architectures. His background includes parallel computation, operating system and compiler development, as well as private consulting for organizations in health care, financial services, and federal and state governments.

Marshall holds a B.A. in Mathematics and an M.A. in Economics and Statistics from the University of Pennsylvania and an M.Sc. in Computing from Imperial College, London.

Andreas Scherbaum

Pivotal

Andreas Scherbaum has been working with PostgreSQL since 1997. He is involved in several PostgreSQL-related community projects, is a member of the Board of Directors of the European PostgreSQL User Group, and wrote a PostgreSQL book (in German). Since 2011 he has worked for EMC/Greenplum/Pivotal, where he tackles very big databases.

Craig Sylvester

VMware

Craig Sylvester is an Advisory Data Engineer for VMware supporting Federal accounts. He brings over 25 years of database experience supporting customers in the Federal and commercial markets. His extensive database experience includes multi-year stints as a consultant and sales engineer at Informix, Netezza, and MySQL. Prior to getting into sales, Craig spent 6 years as a software developer supporting Federal and commercial accounts.

No video of the event yet, sorry!

It's more than just storing and retrieving data. Equally important are loading high-volume data in parallel and running analytics in the database. This hands-on session will lead you through the entire process of creating, loading, and analyzing data in the Greenplum MPP database. It's PostgreSQL, but bigger and DWH-focused.

By the end of this workshop, attendees will have learned modern DWH techniques on a PostgreSQL-based Massively Parallel Processing platform. This includes the basic architecture of the Greenplum Database, the parallel techniques for loading, querying, and analyzing structured and semi-structured data, as well as the tools Greenplum provides for doing analytics in the database.

Workshop Agenda

Introduction to MPP and Greenplum
Distribution -- a key to good performance in Greenplum
Parallel loading -- loading multiple terabytes per hour
Loading from S3 and external connectivity
Polymorphic storage and external partitions
Compare external tables to Foreign Data Wrappers
Partitioning vs. Distribution -- how they interact
Difference between PG and GP partitions
Query response time exercises
Running Analytics in Greenplum: MADlib exercise
Analyzing Free Form Text with SOLR and GPText
Monitoring and Managing Greenplum with Command Center
Managing Concurrency with Resource groups and Workload Manager
Running PL/Python and PL/R as Trusted Languages with PL/Container

Pre-requisites: Laptop with a modern browser and SSH client; Instruction on using SSH on Windows; Basic knowledge of SQL

Users will connect to a cloud-based Greenplum cluster.

There will be a maximum of 25 attendees.

Suggested Pre-work

Videos on YouTube Channel:

GP Database basics https://www.youtube.com/watch?v=cCuGX_fLNl8&list=PL4duir3J-8GUodk1uS9ONPU_eWvfCeVjT

GP & analytics https://www.youtube.com/watch?v=3K1PRZNYHZE&list=PL4duir3J-8GXgVNvHVE8Y86W79Gzu5oEk

GP & MADlib https://www.youtube.com/watch?v=Nza2F2dU-Q0&list=PL4duir3J-8GUcubGGpudx6KCCxp8onTI8

Date: 2019 March 18 09:00 EDT
Duration: 7 h
Room: Sugar Hill
Conference: Postgres Conference
Language:
Track: Greenplum Summit
Difficulty: Easy

Petabyte Scale Data Warehousing with Open Source Greenplum Database

