Spanner has emerged as a compelling choice for enterprises seeking to address the limitations of traditional databases. With the general availability and increasing popularity of Spanner PostgreSQL dialect, there is growing interest in migrating from PostgreSQL to Spanner.  Migrating from a PostgreSQL database to Spanner (PostgreSQL dialect) can bring significant benefits, including: 

  • Horizontal scalability
  • Strong consistency
  • 99.999% availability offering
  • Familiar syntax and semantics for PostgreSQL developers 
  • Elimination of maintenance tasks like VACUUMing, tuning shared buffers and managing connection pooling

While the benefits of Spanner are undeniable, migrating a production database can be a daunting task. Organizations fear downtime and disruptions to operations. Fortunately, with careful planning and the Spanner Migration Tool (SMT), it is possible to migrate from PostgreSQL to Spanner with minimal downtime. 

The complete migration process is in the official guide. This blog post demonstrates a minimal downtime migration of a sample application.

1. Configure source (PostgreSQL) and destination (Spanner) resources

Follow this guide to create a Cloud SQL for PostgreSQL database (named example) and this guide to set up a destination Spanner instance.

2. Set up sample application

This demo migration will be performed using Cloud Shell. Launch Cloud Shell and authenticate:

gcloud auth application-default login

The sample app is available in github:

git clone https://github.com/cloudspannerecosystem/spanner-migration-example.git
cd spanner-migration-example

This app works with a schema of three tables: singersalbums and songs. It performs periodic data inserts, to simulate a production like traffic.

Configure the app to connect to PostgreSQL. Gather the instance’s connection name from the Cloud SQL instance overview page. Assign the correct values below and run:

cat > .env <<EOF
CLOUDSQL_INSTANCE_CONNECTION_NAME=<connection-name>
CLOUDSQL_DATABASE=example
CLOUDSQL_USERNAME=postgres
CLOUDSQL_PASSWORD=<password>
EOF

Start the dockerized application:

docker-compose up app-cloudsql

This should insert records into Cloud SQL PostgreSQL. After verification, stop the application:

docker-compose down

3. Configure SMT

Follow this quickstart guide to launch SMT’s web UI and connect to Spanner by entering project-id and instance-id created in step#1.

To allow SMT to connect to the PostgreSQL database, follow this IP allowlisting guide.

To Connect to source Database, enter:

  • Hostname: PostgreSQL instance’s Public or Private IP
  • Port: 5432
  • PostgreSQL Database Name: example
  • Spanner Dialect: PostgreSQL
https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_Test_Connection.max-900x900.png

4. Configure and migrate schema

In the “Configure Schema” view, clicking on each table shows the schema for PostgreSQL and Spanner in a side-by-side comparison.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_-_Configure_schema.max-1000x1000.png

Spanner tables albums and songs can leverage interleaving for better query optimization. This can be done through the “INTERLEAVE” tab.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_Interleave_tab.max-1000x1000.png

Next, plan for all the issues displayed by SMT to ensure a smooth migration.

In Spanner all tables need a primary key. The singers table had no primary key, but it had a UNIQUE constraint on singer_id. SMT cleverly migrated that column to be our primary key.

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_-_Primary_Key.max-900x900.png

Some columns have SERIAL data types (for IDs). Since Spanner does not support this, we can use SEQUENCEs. Spanner SEQUENCEs are not monotonically increasing to prevent hotspotting. We will need to manually create those later.

Spanner only supports 8 byte integrals, so int4/SERIAL columns were migrated to int8s. They will affect the refactoring of the application later. 

We can remove the identified redundant index by selecting it in the songs table and skipping its creation.

https://storage.googleapis.com/gweb-cloudblog-publish/images/5_-_Skip_index.max-1000x1000.png

With a plan to solve all issues at hand, save the work by clicking “Save Session”. Continue by clicking on “Prepare Migration”.

At first, we will only migrate the Schema to validate our application for Spanner. Enter:

  • Migration mode: Schema
  • Spanner database: example

Click on “Migrate”. We will get a link to the created database on completion.

https://storage.googleapis.com/gweb-cloudblog-publish/images/6_-_Migrate_schema.max-700x700.png

Finally, apply the schema updates that weren’t automated by SMT. Run statements from this file on the migrated database. You can do it through Spanner Studio.

5. Update application

We needed to do some minor updates to have our application working with Spanner, including:

  • Configuring PGAdapter for converting the PostgreSQL wire-protocol into its Spanner equivalent.
  • Updating queries, because Spanner supports jsonb in lieu of the json type.

To test the application against Spanner, Google configure it first:

cat >> .env <<EOF
SPANNER_PROJECT=$GOOGLE_CLOUD_PROJECT
SPANNER_INSTANCE=<spanner-instance-id>
SPANNER_DATABASE=example
PGADAPTER_HOST=pgadapter
PGADAPTER_PORT=5432
EOF

We then start it:

docker-compose up app-spanner

After verifying that the application has inserted a few rows successfully, we can stop it and delete the sample data.

docker-compose down
alias execute-sql="gcloud spanner databases execute-sql example --instance=<spanner-instance-id>"
execute-sql --sql="DELETE FROM songs"
execute-sql --sql="DELETE FROM albums"
execute-sql --sql="DELETE FROM singers"

6. Minimal downtime migration

SMT orchestrates the “Minimal Downtime Migration” process by:

  1. Loading initial data from the source database into the destination.
  2. Applying a stream of change data capture (CDC) events.
https://storage.googleapis.com/gweb-cloudblog-publish/images/7_-_SMT_diagram.max-2200x2200.png

SMT will set up:

  1. Cloud Storage bucket to store CDC events while the initial data loading occurs.
  2. Datastream job for bulk loading of CDC data and streaming incremental data to Storage bucket. 
  3. Dataflow job to migrate CDC events into Spanner. 

To allow SMT to deploy this pipeline, make sure you have all the necessary permissions.

Next, follow this guide to set up the source PostgreSQL CDC. You can then restart the application pointing to PostgreSQL to simulate live traffic:

docker-compose up app-cloudsql

Jump back to the SMT, resume the saved session and continue to “Prepare Migration”. Enter:

  • Migration Mode: Data 
  • Migration Type: Minimal downtime Migration.

Input the source PostgreSQL database details (example) and the destination Spanner database (example).

Input the connection profile for the source database (from the PostgreSQL CDC configuration) and follow the IP allowlisting instructions. 

Finally, set up the target connection profile and click on “Migrate” to start the migration.

https://storage.googleapis.com/gweb-cloudblog-publish/images/8_-_Migrate_data.max-700x700.png

SMT generates useful links for monitoring the migration.

7. Using Spanner

Stop the application that is using PostgreSQL:

docker-compose down

Wait for the Spanner to catch up, once Dataflow’s backlog reaches zero, switch to the Spanner application:

docker-compose up app-spanner

With that Google successfully completed our demo app migration using SMT!

Now you can click on “End Migration” and clean up the jobs.

Conclusion

Through careful planning and utilizing SMT, we can minimize downtime during the migration to Spanner, ensuring an efficient transition.