Wednesday, December 31, 2014

pg_repack redux

I got an idea the other day about how to get over the recent problem I had encountered with pg_repack colliding with a lock request on the table made while it was copying data, but before it tried to rebuild indexes. After some considerable time with the debugger, I found that the problem is that pg_repack tries to get the index definition only at the time it builds the index, and that pg_get_indexdef() requires an Access Share lock on the table. The end result is that the lock request gets blocked waiting for pg_repack, and pg_repack gets blocked waiting for the lock requester, and there they stay. My initial solution was to move the index definition fetching to just after the place where the table definition is fetched. However, that ran into problems with existing strong locks, because it's before the brief time near the start of pg_repack's run where it obtains a strong lock on the table and cancels anything else holding such a lock. So I then moved it inside the section where it already holds a strong lock on the table. With that change, it now works both when there is an existing strong lock on the table and when there is a subsequent strong lock request on the table during the copy data phase. I've sent a pull request for this fix to the repo.

Friday, December 19, 2014

Waiting for extra statement stats

One of the things I was hoping would make it into PostgreSQL 9.4 and didn't was some extra statistics in pg_stat_statements. Specifically, it was proposed to add minimum, maximum and (after I proposed it and pressed the case) the standard deviation for execution times. Keeping these stats is very cheap, so there should be close to zero performance impact. They will give you a much better idea of the spread of execution times than a simple average, which might be significantly dominated by outlier times. Unfortunately, these got left out for 9.4, which is rather disappointing. I'm going to see if I can revive the proposal for 9.5.

Thursday, December 18, 2014

PostgreSQL 9.4 and beyond

PostgreSQL 9.4 is released today, and once again my main input has been in the JSON area. We've introduced a new spiffy JSON type called "jsonb" that makes performing operations on JSON much, much faster, because it's not stored as plain text, but rather decomposed into the structures and scalar values that go to make up JSON. Much of the credit for this goes to Oleg Bartunov and Teodor Sigaev, the original authors of the hstore module, who created a "nested hstore" module that we eventually based jsonb on. Credit also goes to Alexander Korotkov and Peter Geoghegan, especially for their indexing work, which will make many operations super fast. My role was to make sure that all (or almost all) of the operations that are available for the original json type are also available for the jsonb type.

There have been some benchmark tests that suggest that our jsonb is faster in many respects that other database engines that use json as their native storage.

The nice thing about this is that you can now have a highly performing mixture of traditional relational data and semi-structured treeish data with proper transactional semantics, all in one engine. That's a very attractive proposition for a lot of people looking to design new data stores.

My other JSON contribution was to add a number of functions that allow the construction of arbitrarily complex json documents, with data from the database. This will be particularly useful for clients that expect json in a particular, non-regular shape, and is based on the needs of some actual json-using customers.

Already committed for 9.5, to be released some time next year, are jsonb versions of those json functions we didn't provide jsonb versions of in 9.4, and functions to allow the stripping of object fields with 'null' as the value from json and jsonb values (this can result in much more compact storage in some cases, with no significant semantic loss).

I made a few other contributions to 9.4, but they are not worth mentioning in detail.

I'm now looking for new PostgreSQL projects, not involving JSON. I'm not sure what those should be. For the most part my contributions over the last 12 years have tended not to be in  the area that involves the planning and execution of queries, so maybe it's time for me to dive into that pool. But to do what I don't yet know.