Migrating data to the cloud can be a daunting task. Especially moving data from warehouses and legacy environments requires a systematic approach. These migrations usually need manual effort and can be error-prone. They are complex and involve several steps such as planning, system setup, query translation, schema analysis, data movement, validation, and performance optimization. To mitigate the risks, migrations necessitate a structured approach with a set of consistent tools to help make the outcomes more predictable.
Google Cloud simplifies this with the BigQuery Migration Service – a suite of managed tools that allow users to reliably plan and execute migrations, making outcomes more predictable. It is free to use and generates consistent results with a high degree of accuracy.
Major brands like PayPal, HSBC, Vodafone and Major League Baseball use BigQuery Migration Service to accelerate time to unlock the power of BigQuery, deploy new use cases, break down data silos, and harness the full potential of their data. It’s incredibly easy to use, open and customizable. So, customers can migrate on their own or choose from our wide range of specialized migration partners.
BigQuery Migration Service automates most of the migration journey for you. It divides the end-to-end migration journey into four components: assessment, SQL translation, data transfer, and validation. Users can accelerate migrations through each of these phases often just with the push of a few buttons. In this blog, we’ll dive deeper into each of these phases and learn how to reduce the risk and costs of your data warehouse migrations.
Step 1: Assessment
BigQuery Migration Service generates a detailed plan with a view of dependencies, risks, and the optimized migrated state on BigQuery by profiling the source workload logs and metadata.
During the assessment phase, BigQuery Migration Service guides you through a set of steps using an intuitive interface and automatically generates a Google Data Studio report with rich insights and actionable steps. Assessment capabilities are currently available for Teradata and Redshift, and will soon be expanded for additional sources.
Step 2: SQL Translation
This phase is often the most difficult part of any migration. BigQuery Migration Service provides fast, semantically correct, human readable translations from most SQL flavors to BigQuery. It can intelligently translate SQL statements in high-throughput batch and Google-translate-like interactive modes from Amazon Redshift SQL, Apache HiveQL, Apache Spark SQL, Azure Synapse T-SQL, IBM Netezza SQL/NZPLSQL, MySQL, Oracle SQL/PL/SQL/Exadata, Presto SQL, PostgreSQL, Snowflake SQL, SQL Server T-SQL, Teradata SQL/SPL/BTEQ and Vertica SQL.
Unlike most existing offerings which parse Regular Expressions, BigQuery’s SQL translation is true compiler based, with advanced customizable capabilities to handle macro substitutions, user defined functions, output name mapping and other source-context-aware nuances. The output is detailed and prescriptive with clear “next-actions”. Data engineers and data analysts save countless hours leveraging our industry leading automated SQL translation service.
Step 3: Data Transfer
BigQuery offers data transfer service from source systems into BigQuery using a simple guided wizard. Users create a transfer configuration and choose a data source from the drop down list.
Destination settings walk the user through connection options to the data sources and securely connect to the source and target systems.
A critical feature of BigQuery’s data transfer is the ability to schedule jobs. Large data transfers can impose additional burdens on operational systems and impact the data sources. BigQuery Migration Service provides the flexibility to schedule transfer jobs to execute at user-specified times to avoid any adverse impact on production environments
Step 4: Validation
This phase ensures that data at the legacy source and BigQuery are consistent after the migration is completed. Validation allows highly configurable, and orchestrate-able rules to perform a granular per-row, per-column, or per-table left-to-right comparison between the source system and BigQuery. Labeling, aggregating, group-by, and filtering enable deep validations.