Posts tagged “Open Source”

Welcome to "Cultivating DEI", a series in which Postgres community members share their insight and experience about creating a more diverse and inclusive Postgres environment where all are welcome.

Recently I've been thinking a lot about the relationship between the PostgreSQL community and the database research community. To put it bluntly – these two communities do not talk to each other!

There are many reasons why I am concerned about this situation. First, I consider myself a member of both of these communities. Even if right now I am 90% in industry, I can't write off my academic past. Writing a scientific paper with the hope of having it accepted to a real database conference is something that appeals to me.

Secondly, we want to have quality candidates for database positions. Anyone who has tried recently to fill these positions knows that this is not an easy task. If you are looking at recent college grads, there is almost no chance that you will find somebody who has PostgreSQL experience. Here is where we face the other side of the problem.

The problem is not simply that scientists do not speak at PostgreSQL conferences and that PostgreSQL developers do not speak at academic conferences. The larger issue is that for many Computer Science (CS) students, their academic research and practical experience do not intersect. They learn about some incredible algorithms, and as part of their coursework they may suggest some enhancements to existing algorithms. They then practice their SQL skills with MySQL, which from my observations lacks so many basic features that it can hardly be taken seriously as a data platform.

If students practiced using PostgreSQL, they would have a full-scale, enterprise-ready object-relational database -- not a "light" version, but a robust platform that supports a multitude of index and data types, constraints, procedural languages and much more.

I've heard from several professors that MySQL is okay for "learning SQL." I want to ask -- what does "learning SQL" mean? Is it just learning how to write syntactically correct SQL? One contributing factor to the problem is that MySQL comes on each laptop by default, integrated with basic tools that allow building websites. It is integrated with WordPress. There is no reason for PostgreSQL not to have similar support, but it is not in place.

This is particularly frustrating when you recognize the amount of database research that was completed using Postgres, for Postgres, or with the help of Postgres: R-Tree and GiST indexes, for example. Also, the SIGMOD Test of Time Award in 2018 went to the paper "Serializable isolation for snapshot databases," which was implemented in PostgreSQL.

I know the answer to the question "why do they not talk?" Researchers do not want to talk at the PostgreSQL conferences, because those are not scientific conferences, and participation in these conferences will not result in a publication. Postgres developers do not present at the CS conferences, because they do not want to write long papers. Even if they do submit something, their papers are often rejected as "not having any scientific value." I have experienced this on multiple occasions.

I came across another example of "why" when I attended the ACM/SIGMOD conference in Amsterdam. I attended a compelling presentation on the problem of cardinality estimation over multi-join queries that introduced new optimization techniques. The presenter mentioned that he had used Postgres to build the prototype. I was too far back in the room to ask my question, so I reached out via the conference website.

I asked the presenter why he didn't submit a patch. He replied that their approach was hacky, and that it would need more work before they could think about adding it to Postgres. I asked whether he would be interested in working on it with some PostgreSQL community members. His reply? "Not in the next two years, I've just received a post-doc position at Microsoft, so I can't do it for the next two years."

So yes -- I know the answer as to why these two communities historically do not communicate. However, I do not like or accept it. Perhaps we can talk about and resolve this problem together?!

Contributor Bio:

Henrietta Dombrovskaya is a database researcher and developer with over 30 years of academic and industrial experience. She holds a Ph.D. in Computer Science from the University of Saint Petersburg, Russia. She taught Database and Transaction theory at the University of Saint Petersburg (Russia), as well as multiple database tuning classes for both beginners and advanced professionals.

Her professional experience includes consulting for a number of government projects in Chicago and New York, and providing data services in the financial sector, manufacturing, and distribution. She is a co-author, with B. Novikov, of the book “System Tuning”, BHV, S.-Petersburg, Russia. Her research on overcoming object-relational impedance mismatch was published in the Proceedings of EDBT 2014 in Athens and ICDE 2016 in Helsinki. At Braviant Holdings she is happy to have an opportunity to implement the results of her research in practice.

Henrietta Dombrovskaya is a co-organizer of the Chicago PostgreSQL User Group and a member of the Diversity, Equity, and Inclusion Work Group for the Postgres Conference Series. She was recently awarded the 2019 "Technologist of the Year" award by the Illinois Technology Association. This award is "presented to the individual whose talent has championed true innovation, either through new applications of existing technology or the development of technology to achieve a truly unique product or service."

As part of the countdown to PostgresConf US 2018, learn more about the engaging content and our Diamond and Platinum sponsors for this year in our Sponsor Spotlight Series.

Brad Nicholson is a database engineer and the PostgreSQL team lead at Compose, an IBM Company, which is one of our Platinum Sponsors for PostgresConf US 2018. Compose runs a PostgreSQL-as-a-service platform and has long been a supporter of the Postgres community through contributions and support. Read what Brad has to say about Compose and Postgres:

Tell us about your commitment and contribution to the Postgres Community.

Postgres is a big part of our business, and one that is rapidly growing. As such, our commitment is pretty self-explanatory – we are committed to Postgres and the PG Community. Basically, the better PG is/becomes, the better product we can build on top of it. Our biggest contribution to the community is probably Governor. While we have since deprecated that project, Patroni is a fork of it, and uses the HA template we created with Governor.


What particular challenges did you face when building multi-platform deployments with Postgres?

The lack of a management API was one of the biggest challenges. This leads to less than desirable patterns like having to run Patroni and Postgres in the same container, effectively tying the lifecycle of the two together. There are also a number of places where log parsing is still required to ensure the validity of an operation (like ensuring PITR restores actually restored to the point you specified). These sorts of patterns are challenging to handle and often lead to less than desirable architecture at the platform level.

What growth pattern do you expect for yourself as well as Postgres as a whole?

I've been using Postgres since 2001. To watch its growth over the years has been impressive, especially over the past few years.

Postgres has long established itself as the number one Open Source RDBMS. With the huge shift we have seen towards open source adoption in the past several years, I only see its growth continuing to accelerate.

As an organizer of the Toronto Postgres User Group, I'd personally like to get more involved in the community again. I'm not a C developer, so for me that means advocacy and helping people out via the lists, Slack, etc. where I can. Now that we've deprecated Governor, I'm also looking forward to contributing to the Patroni project more.

What features would you like to see in v11 and v12?

My number one ask is Failover Slots. Without them, it is difficult for us to give our customers access to Logical Replication and Logical Decoding. We use streaming replication for HA, and abstract those details away from our users. An HA failover will break whatever systems are built on these constructs: we lose the replication slot that maintains the place in the decoding stream, most likely requiring a resync of the dependent systems. That is not a great story for people building downstream systems.

The other thing I'd love to see is connection pooling in core. This has been a huge Achilles heel in Postgres for ages. PgBouncer and Pgpool are nice products to help work around the limitations, but they are severely limited when it comes to multi-tenant systems. These systems frequently need to spread their connections out across multiple users and/or multiple databases within a given cluster. Because we can't share these connections via external pools, we end up with connection explosions.

What is the number one barrier you see to contributing to the Postgres community?

Not a very exciting answer, but time. There just aren't enough hours in the day to fit everything in.

What is the best thing about working with the Postgres community?

How helpful, open and responsive the people in the Community are. When you have a question or problem, getting direct access to people via the mailing lists, Slack, etc. is great. Often you'll be talking with the folks that wrote the code in the first place. People are always really helpful.

Tell us why you believe people should attend PostgresConf 2018 in April.

This conference is an amazing opportunity to learn about all sorts of different areas from the experts. Meeting folks face to face is always another huge benefit.

Visit the Compose team in the Exhibit Hall in the Newport Grand Ballroom on Wednesday, April 18, and Thursday, April 19. IBM Senior Developer Advocate Raj Singh will present "Do data science and machine learning with Postgres on the IBM Cloud" in his keynote on April 19 at 3 pm, which also takes place in the Newport Grand Ballroom.

Check out the full schedule for PostgresConf US 2018, and buy your tickets soon!
