Heikki Linnakangas is a Senior Principal Software Engineer at Pivotal, one of our Diamond Sponsors for PostgresConf US 2018 and host of the Greenplum Summit. Read what Heikki has to say about Pivotal, Greenplum, and Postgres:
As a PostgreSQL committer, how does that influence your work with Greenplum?
We have worked hard on merging more recent PostgreSQL versions into Greenplum in the last couple of years, and recently reached PostgreSQL 8.4. PostgreSQL 8.4 was first released back in 2008, which is the same year I became a committer in the PostgreSQL project. It was a real blast from the past to see those first commits of my own flow into the Greenplum repository!
It's a healthy reminder that whatever shortcuts you might be tempted to take, they will come back to haunt you! Fortunately, my fellow PostgreSQL committers are hawk-eyed, and the PostgreSQL commit history is very clean and pleasant to work with.
Do you foresee more collaboration between PostgreSQL and Greenplum in the future?
Yes! As we continue to bring Greenplum up to date with more recent PostgreSQL versions, the friction of collaboration gets smaller and smaller. In the last couple of years, PostgreSQL has gotten a lot of the basic infrastructure that Greenplum relies on for data distribution, like partitioning and parallelism. That reduces the manpower needed in Greenplum to maintain those features as add-ons, and frees up developers to work on other things.
As we plan for new Greenplum features, we always try to design them in a way that works well with PostgreSQL, and if applicable, develop them in the PostgreSQL community first. That benefits the PostgreSQL community, by having the features, and it benefits Greenplum, by getting more eyes on the code earlier, which improves code quality.
Are there things that you feel that PostgreSQL can learn from Greenplum? What about Greenplum from PostgreSQL?
PostgreSQL can learn a lot from the features that are in Greenplum, but not yet in PostgreSQL. Usually, the code is not directly applicable, and Greenplum might have made different tradeoffs than the PostgreSQL community wants. But it is nevertheless very useful to look at existing implementations for inspiration, and to learn from the mistakes.
Pivotal has a well-established process for making minor Greenplum releases, emergency bug fixes and such. But between Greenplum 4 and Greenplum 5, the first open source version of Greenplum, there was a long gap. With Greenplum 5, we had to re-learn how to make a major release. PostgreSQL, on the other hand, has maintained a very stable and predictable release process for over 15 years, with roughly annual major version releases and a 5-year support period for each major version. We are trying to get to a similarly stable, predictable schedule with Greenplum as well.
What challenges have you faced as you continue to push Greenplum toward code parity with PostgreSQL?
At first, we spent a lot of time on just cleaning up the Greenplum codebase. Throughout the PostgreSQL 8.3 merge, which was the first major version upgrade we went through, we ironed out tons of trivial differences between the PostgreSQL and Greenplum code that had crept in over the years: small changes in whitespace, comments, variable names, and such. Most were well-intended, and made sense on their own, but they hindered the merge.
We're mostly done with that kind of cleanup, and we now have an established process for merging a major PostgreSQL version. But each version has its own challenges. With the PostgreSQL 8.4 merge, for example, PostgreSQL got window functions, and we had to reconcile the existing Greenplum implementation with the implementation we were getting from PostgreSQL.
With the ongoing PostgreSQL 9.1 merge, we will get Foreign Data Wrappers into Greenplum. We will have to decide what it means to have a foreign table in an MPP context. Do you run the foreign table only on the master node? That's straightforward, but you will get no MPP benefits. Or do you have each data segment fetch its own slice of the foreign data? That requires extending the Foreign Data Wrapper API, and we need to do that in a way that's compatible with the whole ecosystem of existing PostgreSQL data wrappers.
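To make the two options concrete, here is a toy sketch in Python. It is not Greenplum or PostgreSQL code, and it does not use the Foreign Data Wrapper API; the row list, the function names, and the "row id modulo segment count" slicing rule are all assumptions invented for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for the remote data behind a foreign table.
FOREIGN_ROWS: List[int] = list(range(10))


def master_only_scan(rows: List[int]) -> Dict[str, List[int]]:
    """Option 1: one node (the master) fetches every foreign row."""
    return {"master": list(rows)}


def segment_sliced_scan(rows: List[int], num_segments: int) -> Dict[str, List[int]]:
    """Option 2: each segment fetches only its own slice of the foreign data.

    The slicing rule (row id modulo segment count) is an assumption made for
    this sketch; a real design would have to be expressed through the FDW API.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    return {
        f"seg{seg}": [row for row in rows if row % num_segments == seg]
        for seg in range(num_segments)
    }


if __name__ == "__main__":
    print(master_only_scan(FOREIGN_ROWS))
    print(segment_sliced_scan(FOREIGN_ROWS, 3))
```

In the first function all of the fetching lands on a single node, which is the "no MPP benefits" case. In the second, every row is fetched exactly once, but the work is spread across segments, which only works if the data source can be asked for a slice.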
Mason Sharp, from Maputo Data, is actually giving a presentation on how Postgres-XL and Postgres-XC are distributing Foreign Data Wrappers. I'll be there! This is a great opportunity to work together on a common API, so that the same FDW extension will work consistently with PostgreSQL, as well as all the forks like Postgres-XL and Greenplum.
Are there any specific goals you would like to highlight for collaboration with both communities over the next year?
Developers from EnterpriseDB announced plans to work on a new heap format called "zheap", for PostgreSQL v12. It would address many of the problems with "vacuuming" large tables. Vacuuming is cumbersome when you scale up to hundreds of terabytes of data or more. Greenplum has largely solved that problem with a custom storage format called Append-Optimized Tables. But we would prefer not to maintain a custom storage format; we'd rather focus on making Greenplum better at MPP things, like parallelizing queries across a cluster. So we will be looking closely at the development of zheap, and want to help.
What sessions are you most excited about attending at PostgresConf US 2018?
I'm looking forward to hearing stories from Greenplum customers: how they use the product, and what problems they have. I don't speak enough to users! It's easy to lose sight of what day-to-day problems DBAs and application developers face.
I'm also excited about the career fair on Friday. I'm hoping to meet many new colleagues and future PostgreSQL developers there!
What is your favorite aspect of PostgresConf US?
It's my first time, so we'll see! :-) I go to many PostgreSQL developer-oriented conferences, to meet developer colleagues, and talk about upcoming features and engineering issues. At this conference, I'm hoping to hear more from DBAs and users.
Any final thoughts?
If you want to hear more war stories on Greenplum or PostgreSQL development, or have a weird PostgreSQL issue you want to show, or just want to say "hi!", come speak to me! You'll find me loitering around the halls.
Pivotal drives software innovation for many of the world’s most admired brands. With millions of users in communities around the world, Pivotal technology touches billions of users every day.
Pivotal is the maker of Pivotal Greenplum, the world’s first fully-featured, multi-cloud, massively parallel processing (MPP) data analytics platform based on the open source Greenplum Database and Postgres. Pivotal Greenplum provides comprehensive and integrated analytics on multi-structured data. Powered by the world’s most advanced cost-based query optimizer, Pivotal Greenplum delivers unmatched analytical query performance on massive volumes of data.
PostgreSQL is the best open source operational (OLTP) database on the planet, but many PostgreSQL users are forced to work with proprietary analytical databases (e.g. Oracle or Teradata) for their data warehousing and big data workloads. Greenplum Database offers a proven path of migration from expensive and proprietary alternatives to the Postgres ecosystem.
Pivotal at PostgresConf US:
Heikki will be presenting "Greenplum Overview for Postgres Hackers" on Wednesday, April 18, at 10:30 am. Check out all the Greenplum Summit and related content. Stop by and visit the Pivotal team in the Exhibit Hall on Wednesday, April 18, and Thursday, April 19, in the Newport Ballroom, as well as at the Talent Exchange & Career Fair on Friday, April 20, from 10:30 am - 1:30 pm in the Newport Foyer on the 3rd floor.