Organizations worldwide are dealing with growing volumes of data and distributed data lakes across different cloud platforms. Google launched BigQuery Omni to help customers break down these data silos, with the help of several cross-cloud analytics capabilities including cross-cloud transfer and cross-cloud transform (create table as a select). Already, in the past six months, BigQuery Omni saw over 120% growth in data processed by customers while querying across AWS, and Azure environments.
As customers use Omni more and more, they tell us they want to stitch data between different cloud platforms to solve complex use cases. Here are a few examples:
- As a marketing analyst in a large telecom provider, I want to join my Google Cloud digital assets data with my Azure customer info to optimize my loyalty programs and promotions.
- As a data analyst in a growing SaaS organization, my business reporting requires joining data from AWS with my line-of-business data in BigQuery. If I can do this without copying data via an ETL to BigQuery, which is a long process that involves multiple steps and is complex, AND, if I am able to do this with ease of consumption, lower cost and increased data freshness, that would be amazing.
- As a marketing user in a consumer digital native company, I am interested in joining my AdsWhiz data in AWS with audience data in Google Cloud to increase audience reach. I want to do this with ease and avoid data duplication.
- As an analyst in a financial services company, I am interested in a unified view of transaction data in AWS and data assets in Google Cloud. Data movement across clouds is very tricky, and security and trust boundaries are very critical.
Today, Google introducing a new cross-cloud join feature in BigQuery Omni that allows users to query data across clouds in a single SQL statement. Until today, customers were unable to direct JOIN or perform other DML operations between their BigQuery table and data lakes in AWS S3 or Azure Blob Store; instead, they had to copy data from these different cloud platforms to Google Cloud. The new cross-cloud joins capability makes it simple and easy to join and analyze data across clouds.
Cross-cloud joins benefits customers in many ways:
- Ability to do a direct join across clouds without having to explicitly create or materialize tables in BigQuery and then perform a join operation
- Simplicity, ease of use, and reduced costs of analyzing data distributed across cloud platforms
- Eliminates complexity of building and running pipelines for copying data from AWS or Azure data lakes via an ETL to BigQuery
Early users of this new feature are unlocking new use cases across different industries. Some notable ones include:
- Marketing analytics: A large retailer has media data on Azure and consumer data on Google Cloud. To get in-flight campaign metrics, they had to use traditional data exchange patterns such as file transfer, data ingestion and moving data from Azure to Google Cloud. With BigQuery Omni cross-cloud join functionality they are able to join with Azure data, while leveraging the petabyte-scale performance of BigQuery.
- Unified Data Platform: A large healthcare provider acquired another provider that had a lot of data on AWS, and to perform analytics, they were dealing with data silos and a split data-platform architecture between the Google Cloud and AWS architectures. Now, the ability to perform cross-cloud joins is drastically simplifying their architecture, allowing them to build a unified data platform with a single pane of glass.
- Cross-cloud analytics: A large gaming platform in Asia has two analytics platforms, each using BigQuery and AWS. Because both analytics platforms have substantial amounts of data, it is difficult for them to consolidate into one platform. To analyze gamers’ data and optimize gaming titles, their analysts leverage cross-cloud joins ad hoc whenever they need to build critical business insights.
Cross–cloud joins in action
It’s never been easier to join data across clouds and unlock the power of cross-cloud analytics. Let’s take an example of how simple and powerful cross-cloud joins can be:
Consider a retailer who wants to join customer data that’s available in Google Cloud with the orders data available in AWS, so they can get the statistics on customer orders within a given time period. With cross-cloud joins, getting that data is like any other simple SQL join statement, even though the tables are in different clouds.
SELECT
c.name AS customer_name,
o.priority AS order_priority,
COUNT(o.id) AS num_orders
FROM us_dataset.customers c
INNER JOIN aws_dataset.orders o ON o.customer_id = c.id
WHERE
c.country = 'US' AND
o.date BETWEEN DATE '2020-01-01' AND DATE '2020-02-01'
GROUP BY c.name, o.priority
ORDER BY 3 DESC;
What if we want to further break down the data by shipment mode? And the line items table that contains this data is in Azure? It’s still a simple SQL join statement:
SELECT
c.name AS customer_name,
o.priority AS order_priority,
i.shipment_mode,
COUNT(i.id) AS num_line_items
FROM us_dataset.customers c
INNER JOIN aws_dataset.orders o ON o.customer_id = c.id
INNER JOIN azure_dataset.lineitems i ON i.order_id = o.id
WHERE
c.country = 'US' AND
o.date BETWEEN DATE '2020-01-01' AND DATE '2020-02-01'
GROUP BY c.name, o.priority, i.shipment_mode
ORDER BY 4 DESC;
To learn how to get started with the BigQuery Omni cross-cloud joins feature, watch this demo.