Posts tagged “Greenplum”

As part of the countdown to PostgresConf US 2018, learn more about our Diamond and Platinum sponsors for this year in our Sponsor Spotlight Series.

Heikki Linnakangas is the Senior Principal Software Engineer at Pivotal, one of our Diamond Sponsors for PostgresConf US 2018 and host of the Greenplum Summit. Read what Heikki has to say about Pivotal, Greenplum and Postgres:

As a PostgreSQL committer, how does that influence your work with Greenplum?

We have worked hard on merging more recent PostgreSQL versions into Greenplum in the last couple of years, and reached PostgreSQL 8.4 recently. PostgreSQL 8.4 was first released back in 2008, which is the same year I became a committer in the PostgreSQL project. It was a real blast from the past, to see those first commits of my own flow into the Greenplum repository!

It's a healthy reminder that whatever shortcuts you might be tempted to take, they will come back to haunt you! Fortunately, my fellow PostgreSQL committers are hawk-eyed, and the PostgreSQL commit history is very clean and pleasant to work with.

Do you foresee more collaboration between PostgreSQL and Greenplum in the future?

Yes! As we continue to bring Greenplum up to date with more recent PostgreSQL versions, the friction of collaboration gets smaller and smaller. In the last couple of years, PostgreSQL has gotten a lot of the basic infrastructure that Greenplum relies on for data distribution, like partitioning and parallelism. That reduces the manpower needed in Greenplum to maintain those features as addons, and frees up developers to work on other things.

As we plan for new Greenplum features, we always try to design them in a way that works well with PostgreSQL, and if applicable, develop them in the PostgreSQL community first. That benefits the PostgreSQL community, by having the features, and it benefits Greenplum, by getting more eyes on the code earlier, which improves code quality.

Are there things that you feel that PostgreSQL can learn from Greenplum? What about Greenplum from PostgreSQL?

PostgreSQL can learn a lot from the features that are in Greenplum, but not yet in PostgreSQL. Usually, the code is not directly applicable, and Greenplum might have made different tradeoffs than the PostgreSQL community wants. But it is nevertheless very useful to look at existing implementations for inspiration, and to learn from the mistakes.

Pivotal has a well-established process for making minor Greenplum releases, emergency bug fixes and such. But between Greenplum 4 and Greenplum 5, the first open source version of Greenplum, there was a long gap. With Greenplum 5, we had to re-learn how to make a major release. PostgreSQL, on the other hand, has maintained a very stable and predictable release process for over 15 years, with roughly annual major version releases, and a 5 year support period for each major version. We are trying to get to a similarly stable, predictable schedule with Greenplum as well.

What challenges have you faced as you continue to push Greenplum toward code parity with PostgreSQL?

At first, we spent a lot of time on just cleaning up the Greenplum codebase. Throughout the PostgreSQL 8.3 merge, which was the first major version upgrade we went through, we ironed out tons of trivial differences between the PostgreSQL and Greenplum code that had crept up over the years. Small changes in whitespace, comments, variable names, and such. Most were well-intended, and made sense on their own, but they hindered the merge.

We're mostly done with that kind of cleanup, and we now have an established process for merging a major PostgreSQL version. But each version has its own challenges. With the PostgreSQL 8.4 merge, for example, PostgreSQL got window functions, and we had to reconcile the existing Greenplum implementation with the implementation we were getting from PostgreSQL.

With the ongoing PostgreSQL 9.1 merge, we will get Foreign Data Wrappers into Greenplum. We will have to decide what it means to have a foreign table in an MPP context. Do you run the foreign table only on the master node? That's straightforward, but you will get no MPP benefits. Or do you have each data segment fetch its own slice of the foreign data? That requires extending the Foreign Data Wrapper API, and we need to do that in a way that's compatible with the whole ecosystem of existing PostgreSQL data wrappers.

Mason Sharp, from Maputo Data, is actually giving a presentation on how Postgres-XL and Postgres-XC are distributing Foreign Data Wrappers. I'll be there! This is a great opportunity to work together on a common API, so that the same FDW extension will work consistently with PostgreSQL, as well as all the forks like Postgres-XL and Greenplum.
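The two options for foreign tables in an MPP context can be pictured with a small Python sketch. This is only an illustration: the in-memory `FOREIGN_ROWS` source, the function names, and the modulo rule for assigning rows to segments are all assumptions made for the example, not part of the PostgreSQL Foreign Data Wrapper API or of Greenplum.

```python
from typing import Iterable, List

# Hypothetical stand-in for an external data source behind a foreign table.
FOREIGN_ROWS = [{"id": i, "value": i * 10} for i in range(12)]


def scan_on_master(rows: Iterable[dict]) -> List[dict]:
    """Option 1: a single node reads every foreign row; no parallelism."""
    return list(rows)


def scan_slice(rows: Iterable[dict], segment_id: int, num_segments: int) -> List[dict]:
    """Option 2: a segment reads only the rows assigned to it.

    The modulo rule on "id" is an illustrative assumption, not a real FDW API.
    """
    return [r for r in rows if r["id"] % num_segments == segment_id]


def scan_on_segments(rows: List[dict], num_segments: int) -> List[List[dict]]:
    """Run the per-segment scan for every segment (sequentially here, for simplicity)."""
    return [scan_slice(rows, seg, num_segments) for seg in range(num_segments)]


if __name__ == "__main__":
    master_rows = scan_on_master(FOREIGN_ROWS)
    slices = scan_on_segments(FOREIGN_ROWS, num_segments=3)
    # Row count seen by the master versus the row count seen by each segment.
    print(len(master_rows), [len(s) for s in slices])
```

In the second option the wrapper has to know which segment is asking and how many segments exist, which hints at why the API would need to be extended.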

Are there any specific goals you would like to highlight for collaboration with both communities over the next year?

Developers from EnterpriseDB announced plans to work on a new heap format called "zheap", for PostgreSQL v12. It would address many of the problems with "vacuuming" large tables. Vacuuming is cumbersome when you scale up to hundreds of terabytes of data or more. Greenplum has largely solved that problem with a custom storage format called Append-Optimized Tables. But we would prefer not to maintain a custom storage format; we'd rather focus on making Greenplum better at MPP things, like parallelizing queries across a cluster. So we will be looking closely at the development of zheap, and want to help.

What sessions are you most excited about attending at PostgresConf US 2018?

I'm looking forward to hearing stories from Greenplum customers, how they use the product, what problems they have. I don't speak enough to users! It's easy to lose sight of what day-to-day problems DBAs and application developers face.

I'm also excited about the career fair on Friday. I'm hoping to meet many new colleagues and future PostgreSQL developers there!

What is your favorite aspect of PostgresConf US?

It's my first time, so we'll see! :-) I go to many PostgreSQL developer-oriented conferences, to meet developer colleagues, and talk about upcoming features and engineering issues. In this conference, I'm hoping to hear more from DBAs and users.

Any final thoughts?

If you want to hear more war stories on Greenplum or PostgreSQL development, or have a weird PostgreSQL issue you want to show, or just want to say "hi!", come speak to me! You'll find me loitering around the halls.

About Pivotal:

Pivotal drives software innovation for many of the world’s most admired brands. With millions of users in communities around the world, Pivotal technology touches billions of users every day.

Pivotal is the maker of Pivotal Greenplum, the world’s first fully-featured, multi-cloud, massively parallel processing (MPP) data analytics platform based on the open source Greenplum Database and Postgres. Pivotal Greenplum provides comprehensive and integrated analytics on multi-structured data. Powered by the world’s most advanced cost-based query optimizer, Pivotal Greenplum delivers unmatched analytical query performance on massive volumes of data.

PostgreSQL is the best open source operational (OLTP) database on the planet, but many PostgreSQL users are forced to work with proprietary analytical databases (e.g. Oracle or Teradata) for their data warehousing and big data workloads. Greenplum Database offers a proven path of migration from expensive and proprietary alternatives to the Postgres ecosystem.

Pivotal at PostgresConf US:

Heikki will be presenting "Greenplum Overview for Postgres Hackers" on Wednesday, April 18, at 10:30 am. Check out all the Greenplum Summit and related content. Stop by and visit the Pivotal team in the Exhibit Hall on Wednesday, April 18, and Thursday, April 19, in the Newport Ballroom, as well as at the Talent Exchange & Career Fair on Friday, April 20, from 10:30 am - 1:30 pm in the Newport Foyer on the 3rd floor.

Check out the full schedule for PostgresConf US 2018, and buy your tickets soon!


As part of the countdown to PostgresConf US 2018, learn more about the engaging content and our speakers for this year in our Speaker Spotlight Series.

Les McMonagle is the VP of Security Strategy at BlueTalon Inc. He will be presenting "Achieving Data Privacy Compliance in Postgres or Greenplum" on Friday, April 20, at 10:50 am. Attend his session to learn the difference between Data Protection and Data Access Control, and why you should care, and read what he has to say about PostgreSQL and Greenplum:


Why PostgreSQL?

BlueTalon's Attribute Based Access Control (ABAC) technology has been developed to fully support PostgreSQL and Greenplum because these are broadly implemented data analytics platforms used for storing and processing sensitive or regulated data across industries.

Tell us about your involvement with the greater Postgres community.

PostgreSQL was one of the first platforms BlueTalon developed an Enforcement Point (EP) for to provide centrally managed, consistently applied fine-grained data access controls, audit trail and accountability for all access to sensitive or regulated data stored in PostgreSQL database platforms. 

What new features of PostgreSQL 10 are you most excited about?

BlueTalon ensures full compatibility with each new release and corresponding new features or functionality for PostgreSQL as part of our standard certification process. 

Why should attendees come to your talk at PostgresConf US 2018? What would you like for them to take away from your session?

To learn about next generation data access controls for relational databases and other data repository platforms and how this ABAC technology integrates seamlessly with PostgreSQL and other database technologies.

What sessions are you most excited about attending at PostgresConf US 2018?

Any data security related sessions.

What is your favorite aspect of PostgresConf US?

Firsthand contact and interaction with technology thought leaders at global corporations. 

What advice would you have for a Computer Science graduate or entry level developer who is interested in learning and engaging with Postgres?

Consider all aspects of designing and implementing any data analytics platform including data protection and access control. "Privacy by Design" should be a core component of any requirements gathering and system design process. Data security is an order of magnitude easier and less expensive to build in than it is to try and bolt on later.




As part of the countdown to PostgresConf US 2018, learn more about the engaging content and our speakers for this year in our Speaker Spotlight Series.

Hubert Zhang will co-present with Jack Wu at PostgresConf US 2018 on "Customize and Secure the Runtime and Dependencies of Your Procedural Languages using PL/Container." Hubert is a staff software engineer at Pivotal. He received his master's degree at Peking University, with a major in artificial intelligence. He is most interested in database systems and distributed computing platforms.

Tell us about your involvement with the greater Postgres community. (How long have you been involved? How have you contributed? How else would you want to contribute?)

I've been working on the Postgres-based MPP databases Greenplum and HAWQ since 2014. I contributed to PLContainer in Greenplum, and to data locality and the Ranger module of HAWQ (a SQL-on-Hadoop system).

What features should be developed/improved and released in the next major upgrade?

Vectorized execution for OLAP queries.

Who should attend your talk at PostgresConf US 2018? What would you like for them to take away from your session?

Data scientists and anyone who wants to use Python and R in the database to do data analysis and machine learning. You'll learn how to use PLContainer as well as how to build a customized Docker image to set up your specialized Python or R environment.



As part of the countdown to PostgresConf US 2018, learn more about the engaging content and our Diamond and Platinum sponsors for this year in our Sponsor Spotlight Series.

Jacque Istok is the Head of Data for Pivotal, one of our Diamond Sponsors for PostgresConf US 2018. Pivotal is hosting the first annual Greenplum Summit at PostgresConf US 2018, with lots of great Greenplum and Postgres-related content. Read what Jacque has to say about Greenplum and Postgres, as well as why to attend the Greenplum Summit:

Greenplum is an Open Source variant of Postgres; what benefits do you bring to the table over vanilla Postgres?

Postgres is a powerful ORDBMS, but as your data scales, the only way to keep up is to buy bigger and bigger machines to run on. It suffers from the same problems that all SMP databases do: you can only get as big as the machine you’re running on.

With Greenplum you can put a subset of your data on a Postgres database on one reasonably-sized machine, and another subset on a second machine, and so on. All of your users and applications can then query one of these Postgres databases as if all the data was in a single location - making your data scale limitless. Greenplum manages the distribution, data shuffling, and querying of all of your data across a magically sharded implementation of Postgres databases.
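As a rough picture of what "a subset of your data on each machine" means, here is a minimal Python sketch of hash distribution followed by a scatter-gather sum. It is not how Greenplum is implemented: the `(customer, amount)` schema, the function names, and the use of CRC32 as the hash are assumptions chosen only to keep the example small.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (customer, amount); a made-up schema for illustration


def segment_for(key: str, num_segments: int) -> int:
    """Pick a segment from a stable hash of the distribution key.

    CRC32 is an illustrative choice; it is not Greenplum's actual hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows: List[Row], num_segments: int) -> Dict[int, List[Row]]:
    """Place each row on exactly one segment, keyed by the customer column."""
    segments: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        segments[segment_for(row[0], num_segments)].append(row)
    return segments


def total_amount(segments: Dict[int, List[Row]]) -> int:
    """Scatter-gather: each segment sums its own rows, then the partial sums are combined."""
    partial_sums = [sum(amount for _, amount in rows) for rows in segments.values()]
    return sum(partial_sums)


if __name__ == "__main__":
    rows = [("alice", 10), ("bob", 25), ("carol", 5), ("alice", 40)]
    segments = distribute(rows, num_segments=3)
    # The combined result matches what a single database holding all rows would return.
    print(total_amount(segments))
```

The caller sees one answer, while each segment only ever touches its own share of the rows.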

Greenplum has its own community; what do you hope to achieve by joining the Postgres community and PostgresConf?

The Postgres community represents some of the most passionate and knowledgeable creators, developers, and users of database technology of our time. We believe that the combination of Postgres and Greenplum becomes the software equivalent of what Oracle Exadata purported to be: an all-purpose database that can do both transactional and analytical workloads across multi-structured data. Simply put, the Greenplum community is looking to join with the Postgres community to further the understanding and adoption of these technologies.

Do you have plans for cross pollination of technologies with the two open source projects?

Greenplum forked from Postgres over 10 years ago, circa Postgres 8.2. Greenplum 5.0 is based off of Postgres 8.3, with our next major release slated for Postgres 9.4 (current open source Greenplum is compatible with 9.0 as of this writing).

Likewise, we have Postgres committers working at Pivotal looking for opportunities to improve the Postgres code specifically for analytics. We are also ensuring that other projects related to Greenplum, like Apache MADlib, continue to be compatible with Postgres.

What challenges do you see working with the Postgres community as an open source fork?

The Postgres community is a long-running and very passionate group, and we want to be both collaborative and respectful in how we continue to grow our participation. We see the products as having synergies which complement each other very well, with some use cases that best fit Postgres, and others that best fit Greenplum. The use of either benefits the other as they both further adoption.

What would you tell a user who has a choice between Postgres and Greenplum about when they should use which system?

Postgres is a great ORDBMS that will scale to the performance of a single server. For analytical needs, being restricted to a small number of terabytes does not allow for the type of exploration that most organizations need. Because Greenplum is a Postgres compatible database, you can start out using Postgres and either convert to Greenplum underneath or leverage Greenplum alongside your Postgres systems (making data ETL a ton easier). This then makes the choice of which product to use for your particular use case clearer and clearer.

What is the number one barrier you see to contributing to the Postgres community?

The number one barrier we will have to contributing is not seeing the corresponding adoption of our technologies. We feel very strongly that both the transparency and removal of vendor lock-in make our open source commitment the only choice for users. I’m here to implore the community to embrace our technology with zeal and help us continue to drive more and more Postgres adoption in the world.

What is the best thing about working with the Postgres community?

Because Greenplum is based on Postgres, we get to interact with this vast community of talent. We are also able to more seamlessly interact with ecosystem products that already work with Postgres, making the adoption of Greenplum that much easier.

Tell us why you believe people should attend PostgresConf 2018 in April.

PostgresConf is going to be awesome - with both Pivotal and Amazon headlining as Diamond sponsors - as well as the quality of speakers and their content. I wouldn’t miss it for anything.

We’re thrilled to organize the first annual Greenplum Summit at PostgresConf. Greenplum co-founder, Scott Yara, will give a keynote on April 18th relating to how data tells the story at the organizations that we help enable (#DataTellsTheStory), and his journey from SMP to MPP. Greenplum Summit on April 19th will be a full day packed with great use case sessions and tech talks for novices and experts alike.


Joshua D. Drake     March 26, 2018     postgres Greenplum postgresql pivotal
