To help customers break down data silos, Google launched BigQuery Omni in 2021. Organizations around the globe are using BigQuery Omni to analyze data across cloud environments. Now, Google is excited to launch the next big evolution for multi-cloud analytics: cross-cloud analytics. Cross-cloud analytics tools help analysts and data scientists easily, securely, and cost-effectively move data between clouds so they can use the analytics tools they need. In April 2022, Google previewed a SQL LOAD statement that allows AWS and Azure blob storage data to be brought into BigQuery as a managed table for advanced analysis. Google learned a lot during this preview period. A few learnings stand out:
- Cross-cloud operations need to meet analysts where they are. For analysts to work with distributed data, workspaces should not be siloed. As soon as analysts are asked to leave their SQL workspace to copy data or set up and grant permissions, workflows break down and insights are lost. The same SQL can be used to copy data periodically with BigQuery scheduled queries (see the sketch after this list). The more of the workflow that can be managed with SQL, the better.
- Networking is an implementation detail; latency should be too. The longer an analyst has to wait for an operation to complete, the less likely the workflow is to be finished end-to-end. BigQuery users expect high performance from a single operation, even when that operation spans multiple data centers.
- Democratizing data shouldn’t come at the cost of security. For data admins to empower data analysts and engineers, they need to be assured that doing so doesn’t introduce additional risk. Data admins and security teams are increasingly looking for solutions that, by default, don’t persist user credentials across cloud boundaries.
- Cost control comes with cost transparency. Data transfer can get costly, and we frequently hear that it is the number one concern for multi-cloud data organizations. Providing transparency into individual operations and consolidating invoices is critical to the success of cross-cloud operations. Allowing administrators to cap costs for budgeting is a must.
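To make that “stay in SQL” point concrete, here is a minimal sketch of the kind of LOAD statement an analyst could run, or save as a BigQuery scheduled query, to periodically append new files from AWS S3 into a managed BigQuery table. The dataset, table, bucket, and connection names are hypothetical placeholders, and the exact clauses should be checked against the LOAD DATA reference for your region.

```sql
-- Append new Parquet files from an AWS S3 bucket into a managed BigQuery table.
-- All resource names below are illustrative placeholders.
LOAD DATA INTO my_dataset.sales_events
FROM FILES (
  format = 'PARQUET',
  uris = ['s3://my-aws-bucket/sales/2023/*.parquet']
)
-- The connection authorizes BigQuery to read from the S3 bucket without
-- persisting user credentials across cloud boundaries.
WITH CONNECTION `aws-us-east-1.my_aws_connection`;
```

Because this is plain SQL, the same statement can be scheduled to keep the BigQuery copy fresh without the analyst ever leaving the SQL workspace.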
This feedback is why Google spent much of this year improving the cross-cloud transfer product, optimizing releases around these core tenets:
- Usability: The LOAD SQL experience allows data to be filtered and loaded across clouds within the same editor. LOAD SQL supports data formats such as JSON, CSV, Avro, ORC, and Parquet. With semantics for both appending to and truncating tables, LOAD supports both periodic syncs and full-table refreshes (see the sketch after this list). We’ve also added SQL support for data lake standards like Hive partitioning and the JSON data type.
- Security: With a federated identity model, users don’t have to share or store credentials between cloud providers to access and copy their data. We also now support customer-managed encryption keys (CMEK) for the destination table to help secure data as it’s written in BigQuery, and VPC Service Controls (VPC-SC) boundaries to mitigate data exfiltration risks.
- Latency: With data movement managed by the BigQuery Write API, users can effortlessly move just the relevant data without waiting on complex pipelines. We’ve improved job latency significantly for the most common load jobs and continue to see performance improvements.
- Cost auditability: From a single invoice, you can see all of your compute and transfer costs for LOAD jobs across clouds. Each job also comes with statistics that help admins manage budgets (see the query sketch after this list).
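As a sketch of the usability and security points above, the statement below does a full refresh of a table from Hive-partitioned Parquet files in S3; swapping LOAD DATA OVERWRITE for LOAD DATA INTO switches from truncate to append semantics. The bucket, connection, and key names are hypothetical, and the CMEK table option shown is our assumption of how the destination table’s encryption can be configured, so verify it against the LOAD DATA documentation.

```sql
-- Truncate and reload the destination table from Hive-partitioned Parquet
-- files in S3; use LOAD DATA INTO instead for append-only periodic syncs.
-- All resource names are illustrative placeholders.
LOAD DATA OVERWRITE my_dataset.billing_export
OPTIONS (
  -- Assumed table option for a customer-managed encryption key (CMEK).
  kms_key_name = 'projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key'
)
FROM FILES (
  format = 'PARQUET',
  uris = ['s3://my-aws-bucket/billing/*'],
  hive_partition_uri_prefix = 's3://my-aws-bucket/billing'
)
-- Infer the Hive partition columns from the URI layout.
WITH PARTITION COLUMNS
WITH CONNECTION `aws-us-east-1.my_aws_connection`;
```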
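For the cost-auditability point, an administrator could review recent cross-cloud load jobs directly from INFORMATION_SCHEMA, as sketched below. The transferred_bytes column is our assumption of where cross-cloud transfer volume is surfaced in job statistics; confirm the column name and the region qualifier against the INFORMATION_SCHEMA.JOBS documentation.

```sql
-- List recent LOAD DATA jobs in this project with their data volumes.
-- transferred_bytes is assumed to report cross-cloud transfer volume.
SELECT
  job_id,
  creation_time,
  total_bytes_processed,
  transferred_bytes
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND query LIKE 'LOAD DATA%'
ORDER BY
  creation_time DESC;
```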
During our preview period, Google saw strong proof points of how cross-cloud transfer can accelerate time to insight and deliver value to data teams.
Getting started with a cross-cloud architecture can be daunting, but cross-cloud transfer has helped customers jumpstart proofs of concept because it enables the migration of subsets of data without committing to a full migration. Kargo used cross-cloud transfer to accelerate a performance test of BigQuery. “We tested Cross-Cloud Transfer to assist with a proof of concept on BigQuery earlier this year. We found the usability and performance useful during the POC,” said Dinesh Anchan, Manager of Engineering at Kargo.
Other customers, like ActionIQ, see Cross-Cloud Transfer’s potential to complement the differentiated offerings in their customer-centric platforms. “ActionIQ is a composable CDP technology that helps enterprises tap into customer data to deliver personalized customer experiences. We’re investing heavily in allowing customers to query data wherever it is, and we’re excited Cross Cloud Transfer is embracing that philosophy,” said Justin DeBrabant, Senior Vice President of Product at ActionIQ. “Democratizing data should not come at the cost of security, cost, or latency of data, and we’re excited to provide joint solutions with BigQuery that extend that philosophy.”
Google also saw this product being used to combine key datasets across clouds. A common challenge for customers is managing cross-cloud billing data, and cross-cloud transfer (CCT) is being used to tie together billing files whose schemas evolve as they are delivered to blob storage. “We liked the experience of using Cross-Cloud transfer to help consolidate our billing files across GCP, AWS, and Azure. CCT was a nice solution because we could use SQL statements to load our billing files into BigQuery,” said the engineering lead of a large research institution.