Cloud SQL is excited to announce support for PostgreSQL 15. PostgreSQL 15 has many new and valuable enhancements and we thought it would be great to deep dive into these for our users. One can broadly categorize the improvements into five areas:

  • Security
  • Developer experience
  • Performance
  • Tooling
  • Observability

Let’s deep dive.

Security 

Schema `public` is now private by default

While historically PostgreSQL has had good security defaults when it comes to connecting to the database, the default for new databases was to have the schema ‘public’ open to everybody. Also the PostgreSQL superuser who created the cluster was the owner of the schema ‘public’, so if the database owner was not a superuser they were unable to change the schema access grants or drop the schema. 

Starting PostgreSQL 15 this default access is set to “database owner only” and also the schema is owned by the database owner. This is done using the role pg_database_owner which always resolves to the owner of the current database and not through changing the owner of the schema “public” during database creation.

If the database is upgraded from the older version via pg_upgrade or loaded from a pg_dump archive, the access rights stay as they were in the previous version of PostgreSQL.

Also, nothing has changed for the pseudo-role ‘public’ which still means “all users”.

SECURITY INVOKER Views

Another security-related improvement is a new option for a view to *not* act as a security provider where any user who has SELECT rights on the view will automatically have the rights of the owner of the view for any data accessed through this view.

This option is activated using option ‘security_invoker’ when creating the view.

CREATE VIEW … WITH ( security_invoker ) AS <query>;

Conceptually this is similar to SECURITY INVOKER functions, except that the default for functions in PostgreSQL has always been SECURITY INVOKER and SECURITY DEFINER had to be explicitly specified. For views the default is reversed.

Another difference from functions is that in the function call of a SECURITY DEFINER function, the system fully switches to the security context of the definer / owner of the function and anything accessed inside the function will have privileges of the definer.

For the views the behavior is different — even if the main view is not defined as ‘security_invoker’ any sub-views used by that view can still be and so any object access in such views will be checked against callers privileges.

More flexible security in logical replication

Logical replication can now also run as the subscription owner, in which case some extra checks are performed against this user’s privileges. When running this way the replication user must be either a superuser, table owner, or have bypassrls set to be able to replicate into a table with row-level security activated. The user also must have the SELECT privilege for a table in which it tries to replicate UPDATE or DELETE events.

New role to run CHECKPOINT manually

Before PostgreSQL 15, only superusers could run the CHECKPOINT; command. Now any user granted the pre-defined role pg_checkpoint can too.

This is part of an on-going push for more fine-grained control of who can do what.

Other similar predefined roles added over last few versions granting previously superuser-only abilities are pg_read_all_settingspg_read_all_statspg_stat_scan_tables and pg_signal_backend.

Developer experience

MERGE

PostgreSQL now provides the SQL standard command MERGE for delegating to the database the decision of whether to INSERT a new or UPDATE an existing row.

As an example let’s do the famous FizzBuzz exercise using MERGE.

1. Create a Table with all numbers from 0 to 15 which are multiples of 3 with line number in field “i” and string ‘fizz’ as value of field “say”

CREATE TABLE fizzbuzz AS
SELECT i, (CASE WHEN i % 3 = 0 THEN 'fizz' ELSE i::text END) as say 
  FROM generate_series(0,15,3) f(i);
SELECT *  FROM fizzbuzz ORDER BY 1;
i  | say 
----+------
 0 | fizz
 3 | fizz
 6 | fizz
 9 | fizz
12 | fizz
15 | fizz
(6 rows)

2. Next let’s MERGE in new lines with line number and ‘buzz’ in the “say” field

And let’s remove any lines where i <= 0

WITH buzzes AS (
     SELECT i,
            (CASE WHEN i % 5 = 0 THEN 'buzz' ELSE i::text END) as say
       FROM generate_series(0,15) f(i)
 )
 MERGE INTO fizzbuzz f
 USING buzzes b
 ON f.i = b.i
 WHEN NOT MATCHED THEN
   INSERT (i, say)
   VALUES(i, say)
 WHEN MATCHED AND f.i <= 0 THEN
   DELETE
 WHEN MATCHED AND b.say = 'buzz' THEN
   UPDATE SET say = f.say || b.say
 ;
SELECT *  FROM fizzbuzz ORDER BY 1;
 i |   say   
---+----------
 1 | 1
 2 | 2
 3 | fizz
 4 | 4
 5 | buzz
 6 | fizz
 7 | 7
 8 | 8
 9 | fizz
10 | buzz
11 | 11
12 | fizz
13 | 13
14 | 14
15 | fizzbuzz
(15 rows)

PostgreSQL already has a way to do some of this in the form of INSERT … ON CONFLICT DO INSTEAD … but this is a non-standard PostgreSQL-specific extension and it also has some “interesting” handling of transaction isolation levels where the behavior there does not follow exactly any of the SQL-standard isolation levels.

Also it depends on Unique Key violations, and it can not do multiple conditional actions.

So even after adding the PRIMARY KEY to the table, we still can’t remove the row 0 :

CREATE TABLE oldfizzbuzz AS
SELECT i, (CASE WHEN i % 3 = 0 THEN 'fizz' ELSE i::text END) as say FROM generate_series(0,15,3) f(i);
ALTER TABLE oldfizzbuzz ADD PRIMARY KEY (i);
WITH buzzes AS (
   SELECT i,
          (CASE WHEN i % 5 = 0 THEN 'buzz' ELSE i::text END) as say
     FROM generate_series(0,15) f(i)
)
INSERT INTO oldfizzbuzz
SELECT * FROM buzzes
ON CONFLICT (i) DO
UPDATE SET say = 'fizzbuzz'
WHERE EXCLUDED.say = 'buzz';
SELECT *  FROM oldfizzbuzz ORDER BY i;
 i  |   say    
----+----------
  0 | fizzbuzz
  1 | 1
  2 | 2
  3 | fizz
  4 | 4
  5 | buzz
  6 | fizz
  7 | 7
  8 | 8
  9 | fizz
 10 | buzz
 11 | 11
 12 | fizz
 13 | 13
 14 | 14
 15 | fizzbuzz
(16 rows)

Conclusion: MERGE is much more versatile and powerful than the old way!

Multirange improvements

One of the big improvements in PostgreSQL 14 was support for multirange types. Multirange is a set of non-overlapping ranges and they were needed to make support for range types complete. 

For example before multiranges you could not add range(1,3) + range(4,7) as the result was not a single range. With multirange you can. And you can then add range(2,5) to the previous range to get back to a single range(1,7).

When multiranges were added to PostgreSQL 14, somehow the support for the range union aggregator function rang_agg() was left out.

This was fixed in PostgreSQL 15, so now you can:

WITH data(name, rval) AS 
(VALUES ('Bob','[1,3)'::int4range)
      , ('Bob','[4,7)')
      , ('Jim','[4,7)')
      , ('Jim','[1,6)')
      , ('Tom','[1,3)')
      , ('Tom','[3,5)')
      , ('Tom','[5,7)')
)
SELECT name, range_agg(rval)
  FROM data
 GROUP BY 1;
name |   range_agg  
------+---------------
Tom  | {[1,7)}
Bob  | {[1,3),[4,7)}
Jim  | {[1,7)}
(3 rows)

ICU collations can be set as the default for clusters and databases 

Previously, only libc-based collations could be selected at the cluster and database levels. ICU collations could only be used via explicit COLLATE clauses.

Now you can do the following to have new database use a specified ICU locale:

CREATE DATABASE test_icu_collation
    LOCALE_PROVIDER 'icu'
    ICU_LOCALE 'fr-LU-x-icu'
    TEMPLATE template0
;
\l test_icu_collation
List of databases
-[ RECORD 1 ]-----+-------------------
Name              | test_icu_collation
Owner             | postgres
Encoding          | UTF8
Collate           | en_US.UTF8
Ctype             | en_US.UTF8
ICU Locale        | fr-LU-x-icu
Locale Provider   | icu
Access privileges |

New implementation of CREATE DATABASE

The CREATE DATABASE command was rewritten to WAL-log all the writes it does when it makes a new database as a copy of the template database.

It does much more WAL writing than the old version but as it avoids the CHECKPOINT at the start and end of the command, it is in most cases faster and has less impact on concurrent workloads.

This can be slower than the old version in case of a very large template database — for example in a multi-tenant cluster where the template has lot of schemas, tables and initial data — so the old way of doing it is still available and can be selected by specifying STRATEGY = FILE_COPY in the CREATE DATABASE command. The default of STRATEGY = WAL_LOG is the better one to use in most cases.

Performance 

There are more but these are the most interesting new features.

Faster sorting

First, the handling of cases where the sorted data did not fit in work_mem is improved by switching to disk-based sorting with more sort streams.

More cases where sorting can be avoided

Second, improvement for sorting is the ability to allow ordered scans of partitions to avoid sorting in more cases than before so sorting can be replaced by already pre-ordered index scans.

Previously, a partitioned table with a DEFAULT partition or a LIST partition containing multiple values could not be used for ordered partition scans. Now they can be used if such partitions are pruned during planning.

Smarter postgres_fdw

Postgres_fdw is a “foreign data wrapper” which allows exposing tables from other PostgreSQL databases as local tables.

In PostgreSQL 15 there are a few new options:

First, now the query optimizer can send CASE expressions to be executed in the foreign database, lowering the need to fetch more data or even more rows for local processing.

There already was support for pushdown of simpler filters and joins when the wrapper could prove that it was possible to process them fully on the remote side. This, together with the ability to have foreign tables as partitions of local partitioned tables, opens up more ways to use PostgreSQL with distributed data.

Another new feature related to above is the ability to do commits in all foreign servers involved in a transaction in parallel. This will be really helpful in cases of large numbers of foreign tables, which can easily happen in the case of partitioned tables with foreign partitions. This is enabled with the CREATE SERVER option parallel_commit.

Yet another new option, this time not performance related, for foreign tables is postgres_fdw.application_name, which allows setting the application_name used when establishing connections to foreign servers. This lets DBAs and users easily see which connections are opened by postgres_fdw. There are even escape sequences available for customization of the application_name used. Previously the remote session’s application_name could only be set on the remote server or via a postgres_fdw connection specification.

New options in logical replication

Native logical replication has been improved in multiple ways.

First, it now has support for row filtering and column lists.

While row filtering has a set of rules you have to follow for different replication strategies, at a high level, it is specified the same way as one would do for a query:

CREATE PUBLICATION pub2 FOR TABLE table1 WHERE (name like 'TX%');

And just rows who have TX in their name will be replicated.

Column lists work in a similar way, allowing one to specify a subset of table columns that are replicated:

CREATE PUBLICATION pub1 FOR TABLE table1 (id, a, c);

Also new is the option FOR TABLES IN SCHEMA, which publishes all current and future tables in a specified schema. Earlier the ALL option was available only database-wide.

And we now have support for proper two-phase commits. For this the replication slot needs to be created with an option called TWO_PHASE. 

One sample user of this is pg_recvlogical, which has added a –two-phase option to be used during slot creation.

Logical replication also no longer sends empty transactions. When it finds that there are no DML statements in a decoded transaction for a certain slot, it sends nothing and moves directly on to the next transaction.

It also now detects the case of a partially streamed transaction which has crashed on source and sends info about this to the subscriber. Before, this case caused subscriber to keep such transactions open until the subscriber restarted.

There are now functions to monitor the directory contents of logical replication slots:

 pg_ls_logicalsnapdir()pg_ls_logicalmapdir(), and pg_ls_replslotdir()

They can be run by members of the predefined pg_monitor role. 

And although partitioned tables can have foreign tables as partitions, replicating into such a partition isn’t currently supported. The logical replication worker used to crash if it was attempted. Now, an error is thrown.

Comparison with the pglogical extension

While there have been lots of improvements, there are still cases where the pglogical extension is needed.

  • Native replication has no support for filtering by replication origin, meaning that setting up a bi-directional replication will fail, either resulting in an infinite loop for UPDATE, or in case of INSERT into a table with Primary Key, the replication stops with key violation when trying to replicate the same insert back. In case of insert-only publication, replication will keep inserting the new row over and over again, resulting in unlimited table growth.
  • You can’t define primary to have multiple IP addresses (pg_logical has the concept of “interfaces” for this). 

Bi-directional replication support is the most useful of the three. The others are for really rare use cases but perhaps worth mentioning in case you happen to have one of them.

And of course if you need some of the new options when replicating *into* PostgreSQL 15 from an older version, you also still need to use pg_logical, as PostgreSQL core only gets a new feature in latest version.This is different from extensions, where you can often use the latest extension version on many PostgreSQL versions.

Tooling

Improvements to pgbench 

The bundled performance testing tool pgbench can now retry serializability errors, including deadlocks. This is good news if you want to test workloads, which occasionally do deadlock or have other serialization violations that could be fixed by re-running the transaction. 

For example the standard TPC-C tests define that 10% of transactions are aborted. 

Now this should be possible to be tested using pgbench with custom scripts.

Improved psql experience

While psql is already quite amazing, PostgreSQL 15 managed to add even more features for advanced users.

Multi-statement commands

Now psql will return results for all statements in a multi-statement query string.

Pre-15 versions of psql emulated the behavior of sending the whole string to the server and returned only the result of the last statement, even though psql does parse the strings given to it and sends them as separate statements. Now results for each individual statement are returned. To get old behavior, set SHOW_ALL_RESULTS psql variable to off.

(The only way to ask psql to send “select 1; select 2; select 3;” as a single string is to escape the ;, so “select 1\; select 2\; select 3;” will be sent as a single string)

Faster \copy

Now the \copy command in psql uses larger chunks to send data thus improving the speed of the copy.

Easier way to show a set of server variables

A new command \dconfig is added to show server variables.

This can also handle wildcards, so now \dconfig *log* shows all variables with ‘log’ in their names.

Earlier you had to manually run 

SELECT name, settings unit FROM pg_settings WHERE name like '%log%' 

to get this.

Observability 

New statistics collection subsystem

The Cumulative Statistics System was rewritten to use shared memory.

In earlier versions there was a special statistics collector process that got the stats from individual back-ends via UDP packets. And the collected stats became available to back-ends after transferring them via file system.

The new system should:

  • be faster 
  • need less configuration
  • not randomly lose some collected statistics in case of high workloads (UDP is by design lossy), so counts in pg_stats_* views should be more trustworthy

Monitoring and new monitoring roles

A new statistics view pg_stat_subscription_stats is added for monitoring subscriptions.

Also a view pg_stat_recovery_prefetch which tracks pre-fetching in recovery.

Now pg_stat_statements has new fields for temporary file I/O and JIT counters.

And lastly there are two new server variables:

  • shared_memory_size to check the size of allocated shared memory 
  • shared_memory_size_in_huge_pages for the number of huge memory pages required 

Preparing for larger data volumes

As an interesting feature, the functions to pg_size_pretty() and pg_size_bytes() were updated to be able to convert to Petabytes. Before version 15 the largest unit they knew about was Terabytes :

select pg_size_pretty((2^54)::bigint);
pg_size_pretty
----------------
16 PB

Summary

PostgreSQL continues to innovate and deliver much needed features for the most demanding applications. PostgreSQL 15 is no different and we’re very excited to support it in Cloud SQL.