So long data silos: Announcing BigQuery Omni cross-cloud joins

Organizations worldwide are dealing with growing volumes of data and distributed data lakes across different cloud platforms. Google launched BigQuery Omni to help customers break down these data silos, with the help of several cross-cloud analytics capabilities including cross-cloud transfer and cross-cloud transform (create table as a select). Already, in the past six months, BigQuery Omni saw over 120% growth in data processed by customers while querying across AWS, and Azure environments.

As customers use Omni more and more, they tell us they want to stitch data between different cloud platforms to solve complex use cases. Here are a few examples:

Today, Google introducing a new cross-cloud join feature in BigQuery Omni that allows users to query data across clouds in a single SQL statement. Until today, customers were unable to direct JOIN or perform other DML operations between their BigQuery table and data lakes in AWS S3 or Azure Blob Store; instead, they had to copy data from these different cloud platforms to Google Cloud. The new cross-cloud joins capability makes it simple and easy to join and analyze data across clouds.

Cross-cloud joins benefits customers in many ways:

Early users of this new feature are unlocking new use cases across different industries. Some notable ones include:

Cross–cloud joins in action

It’s never been easier to join data across clouds and unlock the power of cross-cloud analytics. Let’s take an example of how simple and powerful cross-cloud joins can be:

Consider a retailer who wants to join customer data that’s available in Google Cloud with the orders data available in AWS, so they can get the statistics on customer orders within a given time period. With cross-cloud joins, getting that data is like any other simple SQL join statement, even though the tables are in different clouds.

SELECT
       c.name AS customer_name,
       o.priority AS order_priority,
       COUNT(o.id) AS num_orders
FROM us_dataset.customers c
       INNER JOIN aws_dataset.orders o ON o.customer_id = c.id
WHERE
       c.country = 'US'  AND 
       o.date BETWEEN DATE '2020-01-01' AND DATE '2020-02-01'
GROUP BY c.name, o.priority
ORDER BY 3 DESC;

What if we want to further break down the data by shipment mode? And the line items table that contains this data is in Azure? It’s still a simple SQL join statement:

SELECT
       c.name AS customer_name,
       o.priority AS order_priority,
       i.shipment_mode,
       COUNT(i.id) AS num_line_items
FROM us_dataset.customers c
       INNER JOIN aws_dataset.orders o ON o.customer_id = c.id
       INNER JOIN azure_dataset.lineitems i ON i.order_id = o.id
WHERE
       c.country = 'US'  AND 
       o.date BETWEEN DATE '2020-01-01' AND DATE '2020-02-01'
GROUP BY c.name, o.priority, i.shipment_mode
ORDER BY 4 DESC;

To learn how to get started with the BigQuery Omni cross-cloud joins feature, watch this demo.

Related posts

Announcing PSP’s cryptographic hardware offload at scale is now open source

by Cloud Ace Indonesia
2 years ago

Transform your unstructured data with AI using BigQuery object tables, now GA

by Cloud Ace Indonesia
11 months ago

Migrate databases to Google Cloud VMware Engine (GCVE)

by Kartika Triyanti
2 years ago