Data analytics in the age of AI: How we’ve enhanced our data platforms this year
AI is already having a profound impact on how organizations operate. The power of AI allows you to reimagine what you do, how you do it and who you do it for. For many companies it feels like they’re just one step away from using AI to start solving real business problems — they just need to activate their data.
Google Cloud has a robust portfolio of platforms and tools for storing, transforming, and gaining insights from your data, and that can be activated for AI. In this blog, we will summarize the key innovations to our Data and AI Cloud in 2023 across three strategic areas.
- Interconnecting all your data – structured and unstructured, in any format, across all locations.
- Bringing AI to your data – securely and quickly build AI models with all your data.
- Boosting productivity – helping all data teams to analyze data, generate code and optimize data workloads.
Register for the upcoming webcast on Nov 13th to get a snapshot of our plans and investments in BigQuery, Streaming Analytics, Data Lakes, Data Integration, and GenAI.
Interconnecting all your data
Data is spread over tens and sometimes hundreds of data silos. Your data workloads are increasing, with new formats, mostly unstructured, across clouds and on-premises systems. There are just too many tools to learn and move between. With all these challenges, AI projects eventually become data projects in disguise.
Google’s Data and AI Cloud lets you interconnect your data at multiple levels.
Interconnect structured and unstructured data – To unlock 360-degree insights into your business, you need to combine and analyze unstructured data, such as images, voice, and documents, with your structured data.
Google launched the general availability of BigLake Object Tables to help data users easily access, traverse, process and query unstructured data using SQL. We also launched support for the Hudi and Delta file formats in BigLake, now generally available. Taking BigLake one step further, we launched the preview of fully managed Iceberg tables in BigLake. With these tables you can use high-throughput streaming ingestion for your data in Cloud Storage, get a fully managed experience with automatic storage optimizations for your lakehouse, and perform DML transactions to enable consistent modifications and improved data security, all while retaining full compatibility with Iceberg readers.
BigLake has seen hyper-growth, with a 27x increase in BigLake usage since the beginning of the year.
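To make the object table idea concrete, here is a minimal Python sketch that builds the SQL for creating and querying one. The project, dataset, connection, and bucket names are hypothetical placeholders, and the sketch assumes a Cloud resource connection with access to the bucket already exists.

```python
def object_table_ddl(table: str, connection: str, uri: str) -> str:
    """Build DDL for a BigLake object table over files in Cloud Storage."""
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  object_metadata = 'SIMPLE',\n"  # marks the table as an object table
        f"  uris = ['{uri}']\n"
        ")"
    )


# All identifiers below are hypothetical placeholders.
ddl = object_table_ddl(
    table="my-project.media.product_images",
    connection="my-project.us.gcs-connection",
    uri="gs://my-bucket/images/*",
)

# Each row of an object table describes one file, so ordinary SQL filters apply.
query = (
    "SELECT uri, content_type, size\n"
    "FROM `my-project.media.product_images`\n"
    "WHERE content_type = 'image/jpeg'"
)

print(ddl)
print(query)
```

Either statement could then be submitted from the BigQuery console, or from Python with `google.cloud.bigquery.Client().query(ddl).result()`.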
Interconnect data across clouds – Many customers manage and analyze their data on Google Cloud, AWS or Azure with BigQuery Omni, which provides a single pane of glass across clouds. Taking BigQuery Omni one step further, Google added support for cross-cloud materialized views and cross-cloud joins. Google also extended analytics to on-prem data by bringing Dataproc Spark to Google Distributed Cloud. This allows you to run Spark on sensitive data in your data centers to support compliance or data sovereignty requirements and connect it with your BigQuery data in Google Cloud.
Interconnect data management and governance – Google added intelligent data profiling and data quality capabilities to help you understand the completeness, accuracy and validity of your data. We also launched extended data management and governance capabilities in Dataplex. You get a single pane of glass for all your data and AI assets, including Vertex AI models and datasets, operational databases, and analytical data on Google Cloud and Omni.
Data sharing – In a given week, thousands of organizations share hundreds of petabytes of data across organizational boundaries using BigQuery. To further support interconnection of data, Google launched BigQuery data clean rooms to share and match datasets across companies and collaborate on analysis with trusted partners, all while respecting user privacy.
Cost optimization – Interconnecting all your data shouldn’t be expensive and unpredictable. So, Google introduced BigQuery pricing editions along with innovations for slots autoscaling and a new compressed storage billing model. BigQuery editions provide more choice and flexibility for you to select the right feature set for various workload requirements. You can mix and match among Standard, Enterprise, and Enterprise Plus editions to achieve the preferred price-performance by workload. BigQuery editions include the ability for single- or multi-year commitments at lower prices for predictable workloads, and new autoscaling capabilities that support unpredictable workloads by providing the option to pay only for the compute capacity you use.
Bringing AI to your data
AI provides numerous opportunities to activate your data. So, we made AI easily accessible to all your data teams and also made it easy to use your data to train AI models.
Customers already run hundreds of millions of predictions and training runs in BigQuery. In just the last six months, ML operations in BigQuery have grown more than 250% compared to last year.
Here are a few of the ways Google has enhanced BigQuery to support AI.
Access to foundational models – Google enabled users to access Vertex AI’s foundation models directly from BigQuery. With just a single statement you can connect a BigQuery table to a large language model (LLM) and tune prompts with your BigQuery data. This allows you to use generative AI capabilities such as text analysis on your data or generate new attributes to enrich your data model. With a few clicks, you can use the Vertex Doc AI workbench to deploy a personalized LLM extractor, which can then be directly accessed from BigQuery to extract specific knowledge from your text data.
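As a rough illustration of the "single statement" pattern, the Python sketch below builds a query around BigQuery ML's `ML.GENERATE_TEXT` function. It assumes a remote model pointing at a Vertex AI LLM has already been created. The model, table, and column names are hypothetical, and the temperature and token limit are arbitrary example settings.

```python
def generate_text_sql(model: str, table: str, text_column: str, instruction: str) -> str:
    """Build a query that sends one prompt per table row to a remote LLM."""
    # Escape backslashes and single quotes so the instruction stays a valid
    # SQL string literal.
    safe = instruction.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_TEXT(\n"
        f"  MODEL `{model}`,\n"
        f"  (SELECT CONCAT('{safe} ', {text_column}) AS prompt FROM `{table}`),\n"
        "  STRUCT(0.2 AS temperature, 256 AS max_output_tokens)\n"
        ")"
    )


# All identifiers below are hypothetical placeholders.
sql = generate_text_sql(
    model="my-project.reviews.llm",
    table="my-project.reviews.comments",
    text_column="comment_text",
    instruction="Classify the sentiment of this review as positive or negative:",
)
print(sql)
```

The subquery is where prompt tuning happens: changing the instruction text or concatenating more columns changes what the model sees for each row, without moving data out of BigQuery.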
Expand the range of AI models – Google also launched BigQuery ML inference engine, which allows you to access an ecosystem of pretrained models and open ML frameworks. Run predictions on Google vision, natural language and translation models in BigQuery, import models in additional formats like TensorFlow Lite, ONNX and XGBoost, and use models hosted in Vertex AI directly.
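A sketch of what importing a model and running predictions might look like, again expressed as Python helpers that build the SQL. The model name, Cloud Storage path, and input table are hypothetical, and the input table's columns are assumed to match the inputs the ONNX model expects.

```python
def import_onnx_model_sql(model: str, gcs_path: str) -> str:
    """Build a statement that registers an ONNX file as a BigQuery ML model."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (MODEL_TYPE = 'ONNX', MODEL_PATH = '{gcs_path}')"
    )


def predict_sql(model: str, table: str) -> str:
    """Build a query that scores every row of `table` with the imported model."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, TABLE `{table}`)"
    )


# All identifiers below are hypothetical placeholders.
create_stmt = import_onnx_model_sql(
    "my-project.ml.churn_onnx", "gs://my-bucket/models/churn.onnx"
)
score_stmt = predict_sql("my-project.ml.churn_onnx", "my-project.ml.customers")

print(create_stmt)
print(score_stmt)
```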
Features and vector embeddings – BigQuery is now the place to store all your ML features and vector embeddings with the launch of BigQuery feature tables and vector embeddings in preview. By loading feature and vector embedding data into BigQuery, you can build powerful semantic searches and run recommendation queries at the scale of your BigQuery data, in real time. And you can manage your features the same way you manage your other data. Plus, we automatically synchronize the data into Vertex AI Feature Store to enable low-latency serving for your web applications without having to move any data.
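For readers new to embeddings, the idea behind semantic search can be shown in a few lines of plain Python: items whose vectors point in a similar direction to the query vector are treated as similar in meaning. This is only a conceptual sketch with made-up three-dimensional vectors, not how BigQuery implements it. In practice the embeddings come from a model and the comparison runs inside BigQuery.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, items, k=2):
    """Return the names of the k items most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]


# Toy embeddings invented for illustration only.
catalog = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.3, 0.1],
    "coffee maker": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.2, 0.0], catalog))  # ['running shoes', 'trail sneakers']
```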
Unified workspace for data and AI people – To bring AI and data together into one shared environment, we launched the preview of BigQuery Studio, which brings data engineering, analytics, and ML workloads together, so you can edit SQL, Python, Spark and other languages and easily run analytics, at petabyte scale and without additional infrastructure management overhead. BigQuery Studio gives you direct access to Colab Enterprise, a new offering that brings Google Cloud’s enterprise-level security and compliance to Colab.
Google also launched the preview of BigQuery DataFrames API, which provides a simple way to run Python for data science directly in BigQuery, using familiar APIs modeled on pandas and scikit-learn. With the ability to write Python in BigQuery, you get an awesome notebook experience.
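A minimal sketch of the pandas-style workflow, assuming the `bigframes` package is installed and a Google Cloud project is configured. It reads the public penguins sample table and computes a grouped mean. The function name and the injectable `bpd` parameter are illustrative choices that let the code be exercised without a cloud project; they are not part of the library.

```python
PENGUINS_TABLE = "bigquery-public-data.ml_datasets.penguins"


def mean_body_mass_by_species(bpd=None):
    """Average body mass per species, computed by BigQuery rather than locally.

    `bpd` defaults to the real `bigframes.pandas` module; passing a stand-in
    with the same interface is an illustrative convenience for testing.
    """
    if bpd is None:
        import bigframes.pandas as bpd  # requires: pip install bigframes
    df = bpd.read_gbq(PENGUINS_TABLE)
    # Same syntax as pandas, but the work is pushed down to BigQuery.
    return df["body_mass_g"].groupby(by=df["species"]).mean()
```

Calling `mean_body_mass_by_species().to_pandas()` would run the query and download only the small aggregated result into a local pandas object.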
Boosting productivity with AI
This year, Google put its decades of investment and research in AI into action to help you boost productivity.
AI for data analytics – Google launched Duet AI in BigQuery to simplify data analysis, generate code and optimize your data workloads. It can:
- Assist with writing SQL queries and Python code, allowing you to focus more on logic and outcomes
- Auto-suggest code in real time and generate full functions and code blocks
- Help you through your data work with a chat experience
We also brought Duet AI to our data migration service to help you modernize legacy applications through automatic SQL translations.
AI for data governance – Google also brought Duet AI to Dataplex. Duet AI in Dataplex can be used for metadata insights to solve the cold-start problem: how do I know which questions I can ask my data? Leveraging Duet AI, Google helps you jumpstart your analytics with a generated list of questions that you can ask of your data, based on metadata and usage patterns, with one-click access to SQL queries that you can run in BigQuery Studio.
AI for business intelligence – More than 10 million users access Looker each month and they can gain deeper insights with access to more than 1,000 data sources and 800+ community connectors.
Further, Google launched Duet AI in Looker. It allows you to:
- Do conversational data analysis in natural language
- Automatically create dashboards and reports by telling Looker the goal of your analysis
- Generate Google Slides presentations with intelligent summaries from your Looker dashboards
- Use natural language to quickly create calculations and visuals with our Duet Formula and Visual assistants
- Rapidly create LookML code and specify the intent of your data model in natural language