Joshua Drake Blog Posts

As part of the countdown to PostgresConf US 2018, learn more about our Diamond and Platinum sponsors for this year in our Sponsor Spotlight Series.

Heikki Linnakangas is the Senior Principal Software Engineer at Pivotal, one of our Diamond Sponsors for PostgresConf US 2018 and host of the Greenplum Summit. Read what Heikki has to say about Pivotal, Greenplum and Postgres:

As a PostgreSQL committer, how does that influence your work with Greenplum?

We have worked hard on merging more recent PostgreSQL versions into Greenplum in the last couple of years, and reached PostgreSQL 8.4 recently. PostgreSQL 8.4 was first released back in 2008, which is the same year I became a committer in the PostgreSQL project. It was a real blast from the past to see those first commits of my own flow into the Greenplum repository!

It's a healthy reminder that whatever shortcuts you might be tempted to take, they will come back to haunt you! Fortunately, my fellow PostgreSQL committers are hawk-eyed, and the PostgreSQL commit history is very clean and pleasant to work with.

Do you foresee more collaboration between PostgreSQL and Greenplum in the future?

Yes! As we continue to bring Greenplum up to date with more recent PostgreSQL versions, the friction of collaboration gets smaller and smaller. In the last couple of years, PostgreSQL has gotten a lot of the basic infrastructure that Greenplum relies on for data distribution, like partitioning and parallelism. That reduces the manpower needed in Greenplum to maintain those features as add-ons, and frees up developers to work on other things.

As we plan for new Greenplum features, we always try to design them in a way that works well with PostgreSQL, and if applicable, develop them in the PostgreSQL community first. That benefits the PostgreSQL community, by having the features, and it benefits Greenplum, by getting more eyes on the code earlier, which improves code quality.

Are there things that you feel that PostgreSQL can learn from Greenplum? What about Greenplum from PostgreSQL?

PostgreSQL can learn a lot from the features that are in Greenplum, but not yet in PostgreSQL. Usually, the code is not directly applicable, and Greenplum might have made different tradeoffs than the PostgreSQL community wants. But it is nevertheless very useful to look at existing implementations for inspiration, and to learn from the mistakes.

Pivotal has a well-established process for making minor Greenplum releases, emergency bug fixes and such. But between Greenplum 4 and Greenplum 5, the first open source version of Greenplum, there was a long gap. With Greenplum 5, we had to re-learn how to make a major release. PostgreSQL, on the other hand, has maintained a very stable and predictable release process for over 15 years, with roughly annual major version releases and a 5-year support period for each major version. We are trying to get to a similarly stable, predictable schedule with Greenplum as well.

What challenges have you faced as you continue to push Greenplum toward code parity with PostgreSQL?

At first, we spent a lot of time on just cleaning up the Greenplum codebase. Throughout the PostgreSQL 8.3 merge, which was the first major version upgrade we went through, we ironed out tons of trivial differences between the PostgreSQL and Greenplum code that had crept up over the years. Small changes in whitespace, comments, variable names, and such. Most were well-intended, and made sense on their own, but they hindered the merge.

We're mostly done with that kind of cleanup, and we now have an established process for merging a major PostgreSQL version. But each version has its own challenges. With the PostgreSQL 8.4 merge, for example, PostgreSQL got window functions, and we had to reconcile the existing Greenplum implementation with the implementation we were getting from PostgreSQL.

With the ongoing PostgreSQL 9.1 merge, we will get Foreign Data Wrappers into Greenplum. We will have to decide what it means to have a foreign table in an MPP context. Do you run the foreign table only on the master node? That's straightforward, but you will get no MPP benefits. Or do you have each data segment fetch its own slice of the foreign data? That requires extending the Foreign Data Wrapper API, and we need to do that in a way that's compatible with the whole ecosystem of existing PostgreSQL foreign data wrappers.
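The two options above can be pictured with a small toy model. This is only an illustrative sketch in plain Python, not Greenplum code and not the PostgreSQL Foreign Data Wrapper API: the function names are hypothetical, and the round-robin slicing rule is an assumption chosen for simplicity.

```python
from typing import List


def master_only_plan(rows: List[int], num_segments: int) -> List[List[int]]:
    """Option 1: the master fetches every foreign row; segments fetch nothing.

    Index 0 of the result is the master, indexes 1..num_segments are segments.
    """
    return [list(rows)] + [[] for _ in range(num_segments)]


def per_segment_plan(rows: List[int], num_segments: int) -> List[List[int]]:
    """Option 2: each segment fetches its own slice of the foreign data.

    Assumption: rows are dealt out round-robin, so segment i gets the rows
    at positions i, i + num_segments, i + 2 * num_segments, and so on.
    """
    return [[]] + [rows[i::num_segments] for i in range(num_segments)]


foreign_rows = list(range(10))
print(master_only_plan(foreign_rows, 3))
print(per_segment_plan(foreign_rows, 3))
```

In the first plan all the work lands on one node; in the second, every row is still fetched exactly once, but the fetching is spread across the segments. How a real wrapper would learn which slice it is responsible for is exactly the API question raised above.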

Mason Sharp, from Maputo Data, is actually giving a presentation on how Postgres-XL and Postgres-XC are distributing Foreign Data Wrappers. I'll be there! This is a great opportunity to work together on a common API, so that the same FDW extension will work consistently with PostgreSQL, as well as all the forks like Postgres-XL and Greenplum.

Are there any specific goals you would like to highlight for collaboration with both communities over the next year?

Developers from EnterpriseDB announced plans to work on a new heap format called "zheap", for PostgreSQL v12. It would address many of the problems with "vacuuming" large tables. Vacuuming is cumbersome when you scale up to hundreds of terabytes of data or more. Greenplum has largely solved that problem with a custom storage format called Append-Optimized Tables. But we would prefer not to maintain a custom storage format; we'd rather focus on making Greenplum better at MPP things, like parallelizing queries across a cluster. So we will be looking closely at the development of zheap, and want to help.

What sessions are you most excited about attending at PostgresConf US 2018?

I'm looking forward to hearing stories from Greenplum customers: how they use the product and what problems they have. I don't speak enough to users! It's easy to lose sight of what day-to-day problems DBAs and application developers face.

I'm also excited about the career fair on Friday. I'm hoping to meet many new colleagues and future PostgreSQL developers there!

What is your favorite aspect of PostgresConf US?

It's my first time, so we'll see! :-) I go to many PostgreSQL developer-oriented conferences to meet developer colleagues and talk about upcoming features and engineering issues. At this conference, I'm hoping to hear more from DBAs and users.

Any final thoughts?

If you want to hear more war stories on Greenplum or PostgreSQL development, or have a weird PostgreSQL issue you want to show, or just want to say "hi!", come speak to me! You'll find me loitering around the halls.

About Pivotal:

Pivotal drives software innovation for many of the world’s most admired brands. With millions of users in communities around the world, Pivotal technology touches billions of users every day.

Pivotal is the maker of Pivotal Greenplum, the world’s first fully-featured, multi-cloud, massively parallel processing (MPP) data analytics platform based on the open source Greenplum Database and Postgres. Pivotal Greenplum provides comprehensive and integrated analytics on multi-structured data. Powered by the world’s most advanced cost-based query optimizer, Pivotal Greenplum delivers unmatched analytical query performance on massive volumes of data.

PostgreSQL is the best open source operational (OLTP) database on the planet, but many PostgreSQL users are forced to work with proprietary analytical databases (e.g. Oracle or Teradata) for their data warehousing and big data workloads. Greenplum Database offers a proven path of migration from expensive and proprietary alternatives to the Postgres ecosystem.

Pivotal at PostgresConf US:

Heikki will be presenting "Greenplum Overview for Postgres Hackers" on Wednesday, April 18, at 10:30 am. Check out all the Greenplum Summit and related content. Stop by and visit the Pivotal team in the Exhibit Hall on Wednesday, April 18, and Thursday, April 19, in the Newport Ballroom, as well as at the Talent Exchange & Career Fair on Friday, April 20, from 10:30 am - 1:30 pm in the Newport Foyer on the 3rd floor.

Check out the full schedule for PostgresConf US 2018, and buy your tickets soon!

As part of the countdown to PostgresConf US 2018, learn more about featured Platinum Sponsor Microsoft, including their commitment to partnering with and contributing to the Postgres community:

 

You are newer in the Postgres community. Tell us how you contribute (or how you plan to).

We are excited to be working with the PostgreSQL community. We would love to partner with the community to bring our experience, from building SQL Server over the years, to PostgreSQL, and to learn in areas where PostgreSQL excels. We have already engaged on the pgsql-hackers mailing list and are working with the community on patches. Moving forward, we will continue to contribute back and partner with the community in the service of our customers. As we look forward, the possibilities of what we can work together on are amazing.

How do you foresee yourself helping the Postgres community?

As mentioned above, we would love to share our learnings from working on SQL Server with the PostgreSQL community. While there are many areas that we can work on together with the community, a couple of areas to highlight would be connectivity for the cloud and making PostgreSQL more robust and compatible in Windows development environments.

What challenges did you face building AzureDB?

A key learning for us while working on Azure Database for PostgreSQL has been that the fundamental needs of the CIO from any database in the cloud are quite similar: cost savings; fundamentals like reliability, performance and scale; and security. In the 9 months between preview and general availability, we heard similar feedback again and again from customers and worked on these key areas. For example, we ensure that there is built-in HA so developers can be confident of their customer experience. Similarly, we ensure that we have worldwide as well as local compliance to serve customer needs across the globe.

What goals do you have for the Postgres community?

Microsoft’s mission is to enable every person and every organization on the planet to achieve more. To make this mission meaningful for our customers, we intend to meet them where they are, helping them to be productive with the technologies and tools of their choice. PostgreSQL has a strong community and is one of the most loved open source databases, bringing industry leading innovations to customers.

What is the number one barrier you see to contributing to the Postgres community?

For us this is the start of an important and enduring journey, and so far we have had great support from the community.

What is the best thing about working with the Postgres community?

PostgreSQL is a global community with talented engineers. So the best thing about working with the community is learning from and sharing experiences with some great minds.

Tell us why you believe people should attend PostgresConf 2018 in April.

Because it is the best place to learn, interact, and network with everyone working on Postgres, whether they build Postgres or use it.

About Microsoft:

Microsoft's mission is to empower every person and every organization on the planet to achieve more. With its Data & AI solutions, Microsoft enables developers to easily build and deliver intelligent apps by offering productive and familiar tools to integrate data and built-in AI. To offer more choice and flexibility to developers, Microsoft has now introduced Azure Database for PostgreSQL, a PaaS offering for PostgreSQL.

Mark Bolz, Principal Program Manager with the Microsoft Azure Data Group, presents "General Data Protection Regulation (GDPR) with Azure Database for PostgreSQL" on Wednesday, April 18 at 4:30 pm. Principal Program Manager Sunil Kamath presents "Combine the Power of Community PostgreSQL and Microsoft Azure to Migrate Existing or Build New Apps" on Friday, April 20 at 12:50 pm (see event listing for location, which is subject to change).

Rohan Kumar, Corporate Vice President of Azure Data at Microsoft, will present the Microsoft keynote on April 19 at 3:40 pm, in the Newport Grand Ballroom. Visit the Microsoft team in the Exhibit Hall in the Newport Grand Ballroom on Wednesday, April 18, and Thursday, April 19.  


As part of the countdown to PostgresConf US 2018, learn more about the engaging content and our Diamond and Platinum sponsors for this year in our Sponsor Spotlight Series.

Brad Nicholson is a database engineer and the PostgreSQL team lead at Compose, an IBM Company, which is one of our Platinum Sponsors for PostgresConf US 2018. Compose runs a PostgreSQL-as-a-service platform, and has long been a supporter of the Postgres community through its contributions. Read what Brad has to say about Compose and Postgres:

Tell us about your commitment and contribution to the Postgres Community.

Postgres is a big part of our business, and one that is rapidly growing. As such, our commitment is pretty self-explanatory: we are committed to Postgres and the PG Community. Basically, the better PG is/becomes, the better the product we can build on top of it. Our biggest contribution to the community is probably Governor. While we have since deprecated that project, Patroni is a fork of it, and uses the HA template we created with Governor.


What particular challenges did you face when building multi-platform deployments with Postgres?

The lack of a management API was one of the biggest challenges. It leads to less-than-desirable patterns like having to run Patroni and Postgres in the same container, effectively tying the lifecycle of the two together. There are also a number of places where log parsing is still required to ensure the validity of an operation (like ensuring that PITR restores actually restored to the point you specified). These sorts of patterns are challenging to handle and often force awkward architectural choices at the platform level.

What growth pattern do you expect for yourself as well as Postgres as a whole?

I've been using Postgres since 2001. To watch its growth over the years has been impressive, especially over the past few years.

Postgres has long established itself as the number one Open Source RDBMS. With the huge shift we have seen towards open source adoption in the past several years, I only see its growth continuing to accelerate.

As an organizer of the Toronto Postgres User Group, I'd personally like to get more involved in the community again. I'm not a C developer, so for me that means advocacy and helping people out via the lists, Slack, etc. where I can. Now that we've deprecated Governor, I'm also looking forward to contributing to the Patroni project more.

What features would you like to see in v11 and v12?

My number one ask is Failover Slots. Without them, it is difficult for us to give our customers access to Logical Replication and Logical Decoding. We use streaming replication for HA, and abstract those details away from our users. An HA failover will break whatever systems are built on these constructs: we lose the replication slot that maintains the place in the decoding stream, most likely requiring a resync of the dependent systems. That is not a great story for people building downstream systems.

The other thing I'd love to see is connection pooling in core. This has been a huge Achilles heel in Postgres for ages. PgBouncer and Pgpool-II are nice products to help work around the limitations, but they are severely limited when it comes to multi-tenant systems. These systems frequently need to spread their connections out across multiple users and/or multiple databases within a given cluster. Because we can't share these connections via external pools, we end up with connection explosions.
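A rough back-of-the-envelope sketch shows why the numbers grow so quickly. It assumes an external pooler that keeps a separate pool for every (user, database) pair, as PgBouncer does, and that every pool fills up in the worst case. The function name and the figures are hypothetical, chosen only for illustration.

```python
def worst_case_server_connections(users: int, databases: int, pool_size: int) -> int:
    """Upper bound on server connections when each (user, database) pair
    has its own pool of pool_size connections and none can be shared."""
    return users * databases * pool_size


# Illustrative numbers only: 50 users, 20 databases, 5 connections per pool.
print(worst_case_server_connections(users=50, databases=20, pool_size=5))
```

With these made-up figures, a modest pool size of 5 already allows up to 5000 server connections, because the pools multiply instead of being shared.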

What is the number one barrier you see to contributing to the Postgres community?

Not a very exciting answer, but time. There just aren't enough hours in the day to fit everything in.

What is the best thing about working with the Postgres community?

How helpful, open and responsive the people in the Community are. When you have a question or problem, getting direct access to people via the mailing lists, Slack, etc. is great. Often you'll be talking with the folks that wrote the code in the first place. People are always really helpful.

Tell us why you believe people should attend PostgresConf 2018 in April.

This conference is an amazing opportunity to learn about all sorts of different areas from the experts. Meeting folks face to face is always another huge benefit.

Visit the Compose team in the Exhibit Hall in the Newport Grand Ballroom on Wednesday, April 18, and Thursday, April 19. IBM Senior Developer Advocate Raj Singh will present "Do data science and machine learning with Postgres on the IBM Cloud" in his keynote on April 19 at 3 pm, which also takes place in the Newport Grand Ballroom.
