Pub/Sub schema evolution is now GA
Pub/Sub schemas are designed to allow safe, structured communication between publishers and subscribers. In particular, the use of schemas guarantees that any message published adheres to a schema and encoding, which the subscriber can rely on when reading the data.
Schemas tend to evolve over time. For example, suppose a retailer captures web events and sends them to Pub/Sub for downstream analytics with BigQuery, and the schema now needs additional fields that must be propagated through Pub/Sub. Until now, Pub/Sub has not allowed the schema associated with a topic to be altered; instead, customers had to create new topics. That limitation changes today: the Pub/Sub team is excited to introduce schema evolution, designed to allow the safe and convenient update of schemas with zero downtime for publishers or subscribers.
Schema revisions
A new revision of a schema can now be created by updating an existing schema. Most often, schema updates only add or remove optional fields, which is considered a compatible change.
All revisions of the schema are available on the schema details page. You can delete one or more revisions from a schema, but you cannot delete a revision if it is the schema's only one. You can also quickly compare two revisions by using the view diff functionality.
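To make the idea of a compatible change concrete, here is a minimal sketch in Python. It is not how Pub/Sub implements its checks: the `Field` class and both functions are hypothetical, and a revision is modeled as a plain list of fields. It diffs two revisions and treats the change as compatible only when every added or removed field is optional and no existing field was altered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified field description (not a Pub/Sub type)."""
    name: str
    type: str
    optional: bool = True


def diff_revisions(old, new):
    """Return (added, removed, changed) between two lists of Field objects."""
    old_by_name = {f.name: f for f in old}
    new_by_name = {f.name: f for f in new}
    added = [f for n, f in new_by_name.items() if n not in old_by_name]
    removed = [f for n, f in old_by_name.items() if n not in new_by_name]
    # A field present in both revisions counts as changed if anything differs.
    changed = [n for n, f in old_by_name.items()
               if n in new_by_name and new_by_name[n] != f]
    return added, removed, changed


def is_compatible(old, new):
    """Assumed rule: only optional fields may be added or removed."""
    added, removed, changed = diff_revisions(old, new)
    return not changed and all(f.optional for f in added + removed)


rev1 = [Field("user_id", "string", optional=False), Field("page", "string")]
rev2 = rev1 + [Field("referrer", "string")]                     # adds an optional field
rev3 = rev1 + [Field("session_id", "string", optional=False)]   # adds a required field

print(is_compatible(rev1, rev2))  # True
print(is_compatible(rev1, rev3))  # False
```

The actual compatibility rules depend on the schema type (Avro or protocol buffer) and are enforced by Pub/Sub when a revision is committed; this sketch only mirrors the "optional fields" rule of thumb.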
Topic changes
You can already attach an existing schema to a topic, or create a new one for it, so that Pub/Sub validates every message published to the topic against the schema. With schema evolution, you can now update a topic to specify a range of schema revisions against which Pub/Sub will try to validate messages, starting with the last revision and working towards the first. If the first revision is not specified, any revision up to and including the last revision is allowed; if the last revision is not specified, any revision from the first revision onward is allowed.
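The range semantics can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the Pub/Sub implementation: revisions are kept in an oldest-to-newest list, the `Revision` class, `fields_validator`, and `match_revision` are hypothetical names, and validation is reduced to checking field names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Revision:
    """Hypothetical stand-in for a schema revision."""
    revision_id: str
    validate: Callable[[dict], bool]


def fields_validator(required, optional=()):
    """Build a toy validator: all required keys present, no unknown keys."""
    required_set = set(required)
    allowed_set = required_set | set(optional)

    def validate(message: dict) -> bool:
        return required_set <= set(message) <= allowed_set

    return validate


def accepted_revisions(revisions, first_id=None, last_id=None):
    """Slice the oldest-to-newest list down to the topic's accepted range."""
    ids = [r.revision_id for r in revisions]
    start = ids.index(first_id) if first_id is not None else 0
    end = ids.index(last_id) if last_id is not None else len(ids) - 1
    return revisions[start:end + 1]


def match_revision(message, revisions, first_id=None, last_id=None) -> Optional[str]:
    """Try the last accepted revision first, then work back towards the first."""
    for rev in reversed(accepted_revisions(revisions, first_id, last_id)):
        if rev.validate(message):
            return rev.revision_id
    return None  # no accepted revision validates the message


revisions = [
    Revision("rev-1", fields_validator(["user_id"], ["page"])),
    Revision("rev-2", fields_validator(["user_id"], ["page", "referrer"])),
]
new_message = {"user_id": "u1", "referrer": "search"}

print(match_revision(new_message, revisions))                   # rev-2
print(match_revision(new_message, revisions, last_id="rev-1"))  # None
```

Leaving `first_id` or `last_id` as `None` corresponds to an open-ended range on that side, matching the behavior described above.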
Schema evolution example
Let’s take a look at a typical way schema evolution may be used. You have a topic T that has a schema S associated with it. Publishers publish to the topic and subscribers subscribe to a subscription on the topic.
Now you wish to add a new field to the schema and you want publishers to start including that field in messages. As the topic and schema owner, you may not necessarily have control over updates to all of the subscribers nor the schedule on which they get updated. You may also not be able to update all of your publishers simultaneously to publish messages with the new schema. You want to update the schema and allow publishers and subscribers to be updated at their own pace to take advantage of the new field. With schema evolution, you can perform the following steps to ensure a zero-downtime update to add the new field:
1. Create a new schema revision that adds the field.
2. Ensure the new revision is included in the range of revisions accepted by the topic.
3. Update publishers to publish with the new schema revision.
4. Update subscribers to accept messages with the new schema revision.
Steps 3 and 4 can be interchanged since all schema updates ensure backwards and forwards compatibility. Once your migration to the new schema revision is complete, you may choose to update the topic to exclude the original revision, ensuring that publishers only use the new schema.
These steps work for both protocol buffer and Avro schemas. However, some extra care needs to be taken when using Avro schemas. Your subscriber likely has a version of the schema compiled into it (the “reader” schema), but messages must be parsed with the schema that was used to encode them (the “writer” schema). Avro defines the rules for translating from the writer schema to the reader schema. Pub/Sub only allows schema revisions where both the new schema and the old schema could be used as the reader or writer schema. However, you may still need to fetch the writer schema from Pub/Sub using the attributes passed in to identify the schema and then parse using both the reader and writer schema.
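The reader/writer distinction can be sketched as follows. This is a simplified illustration of Avro's record resolution rules in plain Python, not a replacement for an Avro library. Several things here are assumptions: the in-memory `SCHEMA_REGISTRY` stands in for fetching a revision from Pub/Sub, `resolve_record` and `handle_message` are hypothetical helpers, the message is JSON-encoded with only string fields, and the attribute name `googclient_schemarevisionid` should be verified against the Pub/Sub documentation.

```python
import json

# Hypothetical in-memory registry standing in for a lookup of schema
# revisions from Pub/Sub, keyed by revision ID.
SCHEMA_REGISTRY = {
    "rev-1": {"type": "record", "name": "Event", "fields": [
        {"name": "user_id", "type": "string"},
    ]},
    "rev-2": {"type": "record", "name": "Event", "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "referrer", "type": "string", "default": ""},
    ]},
}

# The schema this subscriber was built against (the "reader" schema).
READER_SCHEMA = SCHEMA_REGISTRY["rev-2"]


def resolve_record(datum, writer_schema, reader_schema):
    """Project a record decoded with the writer schema onto the reader schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            resolved[name] = datum[name]        # written by the publisher
        elif "default" in field:
            resolved[name] = field["default"]   # reader default fills the gap
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Writer fields unknown to the reader are ignored.
    return resolved


def handle_message(data: bytes, attributes: dict) -> dict:
    """Look up the writer schema from message attributes, then resolve."""
    # Attribute name assumed; check the Pub/Sub docs before relying on it.
    writer_schema = SCHEMA_REGISTRY[attributes["googclient_schemarevisionid"]]
    datum = json.loads(data)  # plain JSON stands in for Avro decoding here
    return resolve_record(datum, writer_schema, READER_SCHEMA)


print(handle_message(b'{"user_id": "u1"}',
                     {"googclient_schemarevisionid": "rev-1"}))
# {'user_id': 'u1', 'referrer': ''}
```

In a real subscriber, an Avro library would perform the decoding and resolution given both schemas; the point of the sketch is only that the writer schema is chosen per message while the reader schema stays fixed.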
BigQuery subscriptions
Pub/Sub schema evolution is also powerful when combined with BigQuery subscriptions, which allow you to write messages published to Pub/Sub directly to BigQuery. When using the topic schema to write data, Pub/Sub ensures that at least one of the revisions associated with the topic is compatible with the BigQuery table. If you want to update your messages to add a new field that should be written to BigQuery, you should do the following:
1. Add the field to the BigQuery table schema as an optional (NULLABLE) column.
2. Add the field to your Pub/Sub schema.
3. Ensure the new revision is included in the range of revisions accepted by the topic.
4. Start publishing messages with the new schema revision.
With these simple steps, you can evolve the data written to BigQuery as your needs change.
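The reason for updating the table first can be shown with a toy compatibility check. This is a sketch under simplifying assumptions, not the actual validation Pub/Sub performs: `compatibility_problems` is a hypothetical helper, type mapping is reduced to identical type names, and the assumed rules are that every schema field needs a matching column and that any column absent from the schema must be NULLABLE.

```python
def compatibility_problems(schema_fields, table_columns):
    """List reasons a schema revision could not be written to a table.

    schema_fields: {field name: type name}
    table_columns: {column name: {"type": ..., "mode": ...}}
    """
    problems = []
    for name, field_type in schema_fields.items():
        column = table_columns.get(name)
        if column is None:
            problems.append(f"no column for field {name!r}")
        elif column["type"] != field_type:
            problems.append(f"type mismatch for field {name!r}")
    for name, column in table_columns.items():
        # Columns the messages never populate must accept NULL.
        if name not in schema_fields and column["mode"] != "NULLABLE":
            problems.append(f"column {name!r} is {column['mode']} but not in schema")
    return problems


rev1 = {"user_id": "STRING", "page": "STRING"}
rev2 = {**rev1, "referrer": "STRING"}

table_before = {
    "user_id": {"type": "STRING", "mode": "REQUIRED"},
    "page": {"type": "STRING", "mode": "NULLABLE"},
}
# Step 1: add the new column as NULLABLE before touching the Pub/Sub schema.
table_after = {**table_before, "referrer": {"type": "STRING", "mode": "NULLABLE"}}

print(compatibility_problems(rev2, table_before))  # ["no column for field 'referrer'"]
print(compatibility_problems(rev1, table_after))   # []
print(compatibility_problems(rev2, table_after))   # []
```

With the column added first, both the old and the new revision remain writable, so publishers can switch over at their own pace.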
Quotas and limits
The schema evolution feature comes with the following limits:
- A schema can have at most 20 revisions at any time.
- Individual schema revisions do not count against the limit of 10,000 schemas per project.