Meet Google’s unified data and AI offering
Without AI, you’re not getting the most out of your data.
Without data, you risk stale, out-of-date, suboptimal models.
But most companies still struggle to keep these highly interdependent technologies in sync and to operationalize AI so they can take meaningful action on their data.
Google has learned from its years of experience in AI development how to make data-to-AI workflows as cohesive as possible, and as a result its data cloud is the most complete and unified data and AI solution in the market. By bridging data and AI, data analysts can take advantage of user-friendly, accessible ML tools, and data scientists can get the most out of their organization's data. All of this comes together with built-in MLOps to ensure all AI work, across teams, is ready for production use.
In this blog we’ll show you how all of this works, including exciting announcements from the Data Cloud Summit:
- Vertex AI Workbench is now generally available, bringing together Google Cloud's data and ML systems into a single interface so that teams have a common toolset across data analytics, data science, and machine learning. With native integrations across BigQuery, Spark, Dataproc, and Dataplex, data scientists can build, train, and deploy ML models 5X faster than with traditional notebooks.
- Introducing Vertex AI Model Registry, a central repository to manage and govern the lifecycle of your ML models. Designed to work with any type of model and deployment target, including BigQuery ML, Vertex AI Model Registry makes it easy to manage and deploy models.
Use ML to get the most out of your data, no matter the format
Analyzing structured data in a data warehouse, like using SQL in BigQuery, is the bread and butter for many data analysts. Once you have data in a database, you can see trends, generate reports, and get a better sense of your business. Unfortunately, a lot of useful business data isn't in the tidy tabular format of rows and columns. It's often spread across multiple locations and formats, frequently as so-called "unstructured data" (images, videos, audio transcripts, PDFs) that can be cumbersome and difficult to work with.
Here, AI can help. ML models can be used to transcribe audio and videos, analyze language, and extract text from images—that is, to translate elements of unstructured data into a form that can be stored and queried in a database like BigQuery. Google Cloud’s Document AI platform, for example, uses ML to understand documents like forms and contracts. Below, you can see how this platform is able to intelligently extract structured text data from an unstructured document like a resume. Once this data is extracted, it can be stored in a data warehouse like BigQuery.
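As a rough illustration of that flow, the sketch below uses the Document AI and BigQuery Python clients to process a PDF and load the extracted text into a table. The project, processor ID, table name, and schema are hypothetical placeholders, not values from this announcement.

```python
from google.cloud import bigquery
from google.cloud import documentai_v1 as documentai

# Hypothetical identifiers -- replace with your own project, processor, and table.
PROJECT_ID = "my-project"
LOCATION = "us"
PROCESSOR_ID = "my-form-parser-processor"
TABLE_ID = "my-project.hr_data.parsed_resumes"

# 1. Send the raw PDF to a Document AI processor.
docai_client = documentai.DocumentProcessorServiceClient()
processor_name = docai_client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID)

with open("resume.pdf", "rb") as f:
    raw_document = documentai.RawDocument(content=f.read(), mime_type="application/pdf")

result = docai_client.process_document(
    request=documentai.ProcessRequest(name=processor_name, raw_document=raw_document)
)
document = result.document  # structured text, entities, and form fields

# 2. Store the extracted text in BigQuery so it can be queried like any other data.
bq_client = bigquery.Client(project=PROJECT_ID)
rows = [{"source_file": "resume.pdf", "extracted_text": document.text}]
errors = bq_client.insert_rows_json(TABLE_ID, rows)
print("BigQuery insert errors:", errors)
```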
Bring machine learning to data analysts via familiar tools
Today, one of the biggest barriers to ML is that the tools and frameworks needed to do ML are new and unfamiliar. But this doesn't have to be the case. BigQuery ML, for example, allows you to train sophisticated ML models at scale using SQL code, directly from within BigQuery. Bringing ML to your data warehouse alleviates the complexities of setting up additional infrastructure and writing model code. Anyone who can write SQL code can train an ML model quickly and easily.
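For instance, a classification model can be trained with a single CREATE MODEL statement. The sketch below runs one through the BigQuery Python client; the dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

# Hypothetical dataset/table/column names -- substitute your own.
train_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS(
  model_type = 'logistic_reg',
  input_label_cols = ['churned']
) AS
SELECT
  tenure_months,
  monthly_charges,
  contract_type,
  churned
FROM `my_dataset.customer_history`
"""

# Training runs entirely inside BigQuery; no extra ML infrastructure to manage.
client.query(train_model_sql).result()

# Evaluate the trained model with ML.EVALUATE, still in SQL.
eval_df = client.query(
    "SELECT * FROM ML.EVALUATE(MODEL `my_dataset.churn_model`)"
).to_dataframe()
print(eval_df)
```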
Easily access data with a unified notebook interface
One of the most popular ML interfaces today is the notebook: an interactive environment that allows you to write code, visualize and pre-process data, train models, and a whole lot more. Data scientists often spend most of their day building models within notebook environments. It's crucial, then, that notebook environments have access to all of the data that makes your organization run, along with tools that make that data easy to work with.
Vertex AI Workbench, now generally available, is the single development environment for the entire data science workflow. Integrations across Google Cloud’s data portfolio allow you to natively analyze your data without switching between services:
- Cloud Storage: access unstructured data
- BigQuery: access data with SQL, take advantage of models trained with BigQuery ML
- Dataproc: execute your notebook on a Dataproc cluster when you need more control over the compute environment
- Spark: transform and prepare data with autoscaling serverless Spark
Below, you’ll see how you can easily run a SQL query on BigQuery data with Vertex AI Workbench.
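The snippet below is a minimal sketch of what that looks like in a Workbench notebook cell: the BigQuery Python client pulls query results straight into a pandas DataFrame. The dataset and table names are placeholders.

```python
from google.cloud import bigquery

# Inside a Vertex AI Workbench notebook, project and credentials are picked up automatically.
client = bigquery.Client()

query = """
SELECT
  contract_type,
  COUNT(*) AS customers,
  AVG(monthly_charges) AS avg_monthly_charges
FROM `my_dataset.customer_history`   -- hypothetical table
GROUP BY contract_type
ORDER BY customers DESC
"""

# Results come back as a pandas DataFrame, ready for plotting or feature engineering.
df = client.query(query).to_dataframe()
df.head()
```

If the BigQuery IPython extension is loaded, the same query can also be run with the %%bigquery cell magic, with no client code at all.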
But what happens after you’ve trained the model? How can both data analysts and data scientists make sure their models can be utilized by application developers and maintained over time?
Go from prototyping to production with MLOps
While training accurate models is important, getting those models to be scalable, resilient, and accurate in production is its own art, known as MLOps. MLOps allows you to:
- Know what data your models are trained on
- Monitor models in production
- Make the training process repeatable
- Serve and scale model predictions
- A whole lot more! (See the “Practitioners Guide to MLOps” whitepaper for a full and detailed overview of MLOps)
Built-in MLOps tools within Vertex AI's unified platform remove the complexity of model maintenance. Practical tools help with everything from training and hosting ML models to managing model metadata, governance, and model monitoring to running pipelines, all critical aspects of running ML in production and at scale.
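As one small example of that built-in tooling, the Vertex AI SDK can log parameters and metrics against a managed experiment so training runs stay traceable. This is a minimal sketch; the project, region, experiment name, and values are hypothetical.

```python
from google.cloud import aiplatform

# Hypothetical project, region, and experiment names.
aiplatform.init(
    project="my-project",
    location="us-central1",
    experiment="churn-model-experiments",
)

# Each training attempt becomes a tracked run with its parameters and metrics.
aiplatform.start_run("run-2022-04-06")
aiplatform.log_params({"model_type": "logistic_reg", "l2_reg": 0.1})
aiplatform.log_metrics({"auc_roc": 0.91, "accuracy": 0.87})
aiplatform.end_run()
```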
And now, Google is extending these capabilities to make MLOps accessible to anyone working with ML in your organization.
Easy handoff to MLOps with Vertex AI Model Registry
Today, Google is announcing Vertex AI Model Registry, a central repository that allows you to register, organize, track, and version trained ML models. It is designed to work with any type of model and deployment target, whether that's BigQuery ML, Vertex AI, AutoML, custom deployments on Google Cloud, or even outside the cloud.
Vertex AI Model Registry is particularly beneficial for BigQuery ML. While BigQuery ML brings the powerful scalability of BigQuery to batch predictions, using a data warehouse engine for real-time predictions just isn't practical. Furthermore, you might start to wonder how to orchestrate ML workflows built in BigQuery. You can now discover and manage BigQuery ML models in one place and easily deploy those models to Vertex AI for real-time predictions and access to MLOps tools.
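Here is a minimal sketch of that handoff, assuming a BigQuery ML model that has already been registered to Vertex AI Model Registry under the display name shown; the names and instance payload are hypothetical and the prediction fields depend on your model's input schema.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # hypothetical project

# Look up the registered model by display name (hypothetical name).
model = aiplatform.Model.list(filter='display_name="churn_model"')[0]

# Deploy it to a managed endpoint for low-latency online predictions.
endpoint = model.deploy(machine_type="n1-standard-4")

# Online prediction request against the deployed model.
prediction = endpoint.predict(instances=[
    {"tenure_months": 12, "monthly_charges": 70.5, "contract_type": "month-to-month"}
])
print(prediction.predictions)
```

On the BigQuery ML side, models can also be registered at creation time by adding the model_registry = 'vertex_ai' option to the CREATE MODEL statement, so newly trained models show up in the registry automatically.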
End-to-End MLOps with pipelines
One of the most popular approaches to MLOps is the ML pipeline: each distinct step in your ML workflow, from data preparation to model training and deployment, is automated so that it can be shared and reliably reproduced.
Vertex AI Pipelines is a serverless tool for orchestrating ML tasks using pre-built components or your own custom code. Now, you can easily process data and train models with BigQuery, BigQuery ML, and Dataproc directly within a pipeline. With this capability, you can combine familiar ML development within BigQuery and Dataproc into reproducible, resilient pipelines and orchestrate your ML workflows faster than ever.
See an example of how this works with the new BigQuery and BigQuery ML components.
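Below is a minimal sketch of what such a pipeline could look like, assuming the BigQuery operators shipped in the google_cloud_pipeline_components library and the KFP v2 SDK (import paths can vary between library versions); the project, dataset, queries, and bucket are placeholders.

```python
from kfp.v2 import compiler, dsl
from google_cloud_pipeline_components.v1.bigquery import (
    BigqueryCreateModelJobOp,
    BigqueryQueryJobOp,
)
from google.cloud import aiplatform

PROJECT_ID = "my-project"  # hypothetical
BQ_LOCATION = "US"

@dsl.pipeline(name="bqml-churn-pipeline")
def pipeline():
    # Step 1: prepare a training table with a BigQuery query.
    prep = BigqueryQueryJobOp(
        project=PROJECT_ID,
        location=BQ_LOCATION,
        query="""
        CREATE OR REPLACE TABLE `my_dataset.churn_training` AS
        SELECT tenure_months, monthly_charges, contract_type, churned
        FROM `my_dataset.customer_history`
        """,
    )

    # Step 2: train a BigQuery ML model on the prepared table.
    train = BigqueryCreateModelJobOp(
        project=PROJECT_ID,
        location=BQ_LOCATION,
        query="""
        CREATE OR REPLACE MODEL `my_dataset.churn_model`
        OPTIONS(model_type = 'logistic_reg', input_label_cols = ['churned']) AS
        SELECT * FROM `my_dataset.churn_training`
        """,
    ).after(prep)

# Compile the pipeline and run it on Vertex AI Pipelines.
compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")

aiplatform.init(project=PROJECT_ID, location="us-central1")
aiplatform.PipelineJob(
    display_name="bqml-churn-pipeline",
    template_path="pipeline.json",
    pipeline_root="gs://my-bucket/pipeline-root",  # hypothetical bucket
).run()
```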
Learn more and get started
Google is excited to share more about its unified data and AI offering today at the Data Cloud Summit. Please join the "AI/ML strategy and product roadmap" spotlight session or the "AI/ML notebooks how-to" session.
And if you’re ready to get hands on with Vertex AI, check out these resources:
- Codelab: Training an AutoML model in Vertex AI
- Codelab: Intro to Vertex AI Workbench
- Video Series: AI Simplified: Vertex AI
- GitHub: Example Notebooks
- Training: Vertex AI: Qwik Start