Cloud Pub/Sub announces General Availability of exactly-once delivery

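Today the Google Cloud Pub/Sub team is excited to announce the GA launch of the exactly-once delivery feature. With this launch, Pub/Sub customers can receive exactly-once delivery within a cloud region: the feature guarantees that no redelivery occurs while a message is outstanding or once it has been successfully acknowledged, and that only the latest acknowledgement ID can be used to acknowledge a message.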

This post covers the basics of exactly-once delivery, how it works, best practices, and the feature's limitations.

Duplicates 

Without exactly-once delivery, customers have to build their own complex, stateful processing logic to remove duplicate deliveries. With the exactly-once delivery feature, Pub/Sub gives a stronger guarantee that a message will not be delivered again before its acknowledgement deadline has passed. The feature also makes the acknowledgement status more observable by the subscriber. As a result, processing messages exactly once becomes much easier. Let’s first understand why and where duplicates can be introduced.

Pub/Sub has the following typical flow of events:

  1. Publishers publish messages to a topic.
  2. A topic can have one or more subscriptions, and each subscription gets all the messages published to the topic.
  3. A subscriber application will connect to Pub/Sub for the subscription to start receiving messages (either through a pull or push delivery mechanism).

In this basic messaging flow, there are multiple places where duplicates could be introduced. 

Publisher

Subscriber

Pub/Sub

It should be noted that there are clear differences between a valid redelivery and a duplicate:

Exactly-once side effects

“Side effect” is a term used when the system modifies the state outside of its local environment. In the context of messaging systems, this is equivalent to a service being run by the client that pulls messages from the messaging system and updates an external system (e.g., transactional database, email notification system). It is important to understand that the feature does not provide any guarantees around exactly-once side effects and side effects are strictly outside the scope of this feature.

For instance, let’s say a retailer wants to send push notifications to its customers only once. This feature ensures that the message is sent to the subscriber only once and that no redelivery occurs either while the message is outstanding or once it has been successfully acknowledged. It is the subscriber’s responsibility to leverage the notification system’s exactly-once capabilities to ensure that the notification is pushed to the customer exactly once. Pub/Sub has neither connectivity to nor control over the system responsible for delivering the side effect, and hence Pub/Sub’s exactly-once delivery guarantee should not be confused with exactly-once side effects.
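One common way for a subscriber to approach this is to record each message ID in a transactional store before triggering the side effect. The following is a minimal sketch of that idea, not something prescribed by Pub/Sub: the `send_once` helper and the `sent_notifications` table are hypothetical names, and SQLite stands in for whatever store the subscriber actually uses.

```python
import sqlite3
from typing import Callable


def send_once(db: sqlite3.Connection, message_id: str, send: Callable[[], None]) -> bool:
    """Run send() only if message_id has not been recorded before.

    Returns True if send() ran, False if the ID was already recorded.
    Hypothetical helper for illustration only.
    """
    try:
        # The connection context manager commits on success and rolls back
        # if the insert or send() raises.
        with db:
            db.execute(
                "INSERT INTO sent_notifications (message_id) VALUES (?)",
                (message_id,),
            )
            send()
    except sqlite3.IntegrityError:
        # The primary key already exists: this message was handled earlier.
        return False
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sent_notifications (message_id TEXT PRIMARY KEY)")

send_once(db, "msg-1", lambda: print("notify customer"))  # runs send()
send_once(db, "msg-1", lambda: print("notify customer"))  # skipped
```

This narrows the window for duplicate side effects but does not close it: if the notification goes out and the commit then fails, the ID is not recorded. Closing that gap depends on the guarantees of the external system.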

How it works

Pub/Sub delivers this capability by taking the delivery state that was previously only maintained in transient memory and moving it to a massively scalable persistence layer. This allows Pub/Sub to provide strong guarantees that no duplicates will be delivered while a delivery is outstanding and no redelivery will occur once the delivery has been acknowledged. Acknowledgement IDs used to acknowledge deliveries are versioned, and only the latest version is allowed to acknowledge the delivery or change its acknowledgement deadline. RPCs with any older version of the acknowledgement ID will fail. Due to the introduction of this internal delivery persistence layer, exactly-once delivery subscriptions have higher publish-to-subscribe latency compared to regular subscriptions.

Let’s understand this through an example. Here we have a single publisher, publishing messages to a topic. The topic has one subscription, for which we have three subscribers.

Now let’s say a message (in blue) is sent to subscriber#1. At this point, the message is outstanding, which means that Pub/Sub has sent the message, but subscriber#1 has not acknowledged it yet. This is very common as the best practice is to process the message first before acknowledging it. Since the message is outstanding, this new feature will ensure that no duplicates are sent to any of the subscribers. 

The persistence layer for exactly-once delivery stores a version number with every delivery of a message, which is also encoded in the delivery’s acknowledgement ID. The existence of an unexpired entry indicates that there is already an outstanding delivery and that Pub/Sub should not deliver the message again (providing the stronger guarantee around the acknowledgement deadline). An attempt to acknowledge a message or modify its acknowledgement deadline with an acknowledgement ID that does not contain the most recent version is rejected, and a useful error message is returned in response to the request.
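To make the versioning idea concrete, here is a toy in-memory model of the behavior described above. It is only an illustration under simplifying assumptions and not how Pub/Sub is implemented: the `DeliveryStore` class, its method names, and the `message_id:version` acknowledgement ID format are all hypothetical, and acknowledgements arriving after the deadline are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class DeliveryStore:
    """Toy stand-in for the delivery persistence layer (hypothetical)."""

    # message_id -> (latest delivery version, ack deadline as a timestamp)
    outstanding: Dict[str, Tuple[int, float]] = field(default_factory=dict)
    acked: Set[str] = field(default_factory=set)

    def try_deliver(self, message_id: str, ack_deadline_s: float, now: float) -> Optional[str]:
        """Return a new ack ID, or None if the message must not be delivered."""
        if message_id in self.acked:
            return None  # already acknowledged: no redelivery
        version, deadline = self.outstanding.get(message_id, (0, 0.0))
        if version > 0 and now < deadline:
            return None  # unexpired entry: a delivery is still outstanding
        version += 1
        self.outstanding[message_id] = (version, now + ack_deadline_s)
        return f"{message_id}:{version}"  # the version is encoded in the ack ID

    def acknowledge(self, ack_id: str) -> bool:
        """Accept the ack only if it carries the most recent delivery version."""
        message_id, _, version = ack_id.rpartition(":")
        latest = self.outstanding.get(message_id)
        if latest is None or latest[0] != int(version):
            return False  # stale or unknown ack ID is rejected
        self.acked.add(message_id)
        del self.outstanding[message_id]
        return True


store = DeliveryStore()
first = store.try_deliver("M", ack_deadline_s=10.0, now=0.0)    # delivery#1
blocked = store.try_deliver("M", ack_deadline_s=10.0, now=5.0)  # None: still outstanding
second = store.try_deliver("M", ack_deadline_s=10.0, now=11.0)  # deadline expired: delivery#2
print(store.acknowledge(first), store.acknowledge(second))      # stale ack fails, latest succeeds
```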

Coming back to the example, a delivery version for the delivery of message M (in blue) to subscriber#1 will be stored internally within Pub/Sub (let’s call it delivery#1). This tracks that a delivery of message M is outstanding. Subscriber#1 successfully processes the message and sends back an acknowledgement (ACK#1). The message is then eventually removed from Pub/Sub (subject to the topic’s retention policy).

Now let’s consider a scenario that could potentially generate duplicates and how Pub/Sub’s exactly-once delivery feature guards against such failures.

An example

In this scenario, subscriber#1 gets the message and processes it by locking a row on the database. The message is outstanding at this point and an acknowledgement has not been sent to Pub/Sub. Pub/Sub knows through its delivery versioning mechanism that a delivery (delivery#1) is outstanding with subscriber#1.

Without the stronger guarantees provided by this feature, a message could be redelivered to the same or a different subscriber (subscriber#2) while it is still outstanding. Subscriber#2 would then try to get a lock on the same database row for the update, resulting in multiple subscribers contending for the same lock and causing processing delays.

Exactly-once delivery eliminates this situation. Because of the delivery persistence layer, Pub/Sub knows that delivery#1 is outstanding and unexpired, so it does not deliver the same message again to this subscriber (or any other subscriber).

Using exactly-once delivery

Simplicity is a key pillar of Pub/Sub, and we have ensured that the feature is easy to use. You can create a subscription with exactly-once delivery using the Google Cloud console, the Google Cloud CLI, a client library, or the Pub/Sub API. Please note that only the pull subscription type supports exactly-once delivery, including subscribers that use the StreamingPull API. The Pub/Sub documentation provides more details on creating a pull subscription with exactly-once delivery.
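As a rough sketch of the client library route in Python: the subscription resource has an `enable_exactly_once_delivery` field that can be set when the subscription is created. The helper names and the project, topic, and subscription IDs below are placeholders, and the sketch assumes the `google-cloud-pubsub` package is installed and credentials are configured.

```python
def build_subscription_request(project_id: str, topic_id: str, subscription_id: str) -> dict:
    """Build a create-subscription request with exactly-once delivery enabled."""
    return {
        "name": f"projects/{project_id}/subscriptions/{subscription_id}",
        "topic": f"projects/{project_id}/topics/{topic_id}",
        "enable_exactly_once_delivery": True,
    }


def create_exactly_once_subscription(project_id: str, topic_id: str, subscription_id: str):
    """Create a pull subscription with exactly-once delivery (hypothetical helper)."""
    # Imported here so the request builder above has no third-party dependency.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    with subscriber:
        return subscriber.create_subscription(
            request=build_subscription_request(project_id, topic_id, subscription_id)
        )


# Example call with placeholder IDs (requires real credentials to run):
# create_exactly_once_subscription("my-project", "my-topic", "my-subscription")
```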

Using the feature effectively

  1. Consider using our latest client libraries to get the best feature experience.
  2. You should also use the new interfaces in the client libraries that allow you to check the response for acknowledgements. A successful response guarantees no redelivery.
  3. To reduce network-related ack expirations, use the minimum lease extension setting, available in the Python, Node.js, and Go (MinExtensionPeriod) client libraries.
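The second and third recommendations might look roughly like this with the Python client library. This is a sketch under assumptions rather than a reference implementation: `process_then_ack` and `subscribe` are hypothetical helpers, the 60-second value is an arbitrary example, and the sketch catches a broad `Exception` to stay dependency-free where real code would more likely catch the client library's `AcknowledgeError`.

```python
import logging
from typing import Callable


def process_then_ack(message, process: Callable[[bytes], None], timeout_s: float = 30.0) -> bool:
    """Process a message, then acknowledge it and wait for the ack result.

    Returns True only when Pub/Sub confirmed the acknowledgement.
    """
    process(message.data)
    # ack_with_response() returns a future instead of acking fire-and-forget.
    ack_future = message.ack_with_response()
    try:
        ack_future.result(timeout=timeout_s)
    except Exception as exc:  # broad on purpose; see the lead-in above
        logging.warning("Ack failed, message may be redelivered: %s", exc)
        return False
    return True


def subscribe(subscription_path: str, process: Callable[[bytes], None]):
    """Start a streaming pull with a minimum lease extension (hypothetical helper)."""
    # Imported here so process_then_ack can be exercised without the library.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    # Assumed example value: extend leases by at least 60 seconds at a time.
    flow_control = pubsub_v1.types.FlowControl(min_duration_per_lease_extension=60)
    return subscriber.subscribe(
        subscription_path,
        callback=lambda message: process_then_ack(message, process),
        flow_control=flow_control,
    )
```

If `process_then_ack` returns False, the subscriber cannot assume the acknowledgement took effect and should be prepared to see the message again.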

Limitations

  1. Exactly-once delivery is a regional feature. That is, the guarantees provided only apply for subscribers running in the same region. If a subscription with exactly-once delivery enabled has subscribers in multiple regions, they might see duplicates.
  2. For other subscription types (push and BigQuery), Pub/Sub initiates the delivery of messages and uses the response from the delivery as an acknowledgement; the message receiver has no way to know if the acknowledgement was actually processed. In contrast, pull subscriber clients initiate acknowledgement requests to Pub/Sub, which responds with whether or not the acknowledgement was successful. This difference in delivery behavior means that exactly-once semantics do not align well with non-pull subscriptions.
