Next-gen search and RAG with Vertex AI

Generative AI has fundamentally transformed how the world interacts with information, and the search industry is no exception. The search landscape is changing rapidly, driven by the rise of large language models (LLMs). Whether they’re interacting with their company’s internal data or browsing public websites, users increasingly expect more intelligent, intuitive, and comprehensive search experiences. At Google Cloud, we have been steadily evolving our capabilities to help developers build retrieval-augmented generation (RAG) solutions and to meet them where they are in their journey.

Gen AI makes search better. LLMs excel at understanding the nuances of language and context-dependent meanings. Their ability to transform lengthy multimodal input (text, images) into numerical representations (embeddings) while preserving semantic relationships across large, disparate data has opened up new possibilities for search. Embeddings enable more accurate interpretation of search queries than traditional semantic and keyword-based search systems.
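To make the idea of embeddings preserving semantic relationships concrete, here is a minimal sketch that ranks phrases by cosine similarity. The three-dimensional vectors are invented for illustration; in practice they would come from an embedding model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (made up for this example, not model output).
toy_embeddings = {
    "net zero plans": [0.9, 0.1, 0.2],
    "carbon neutrality": [0.8, 0.2, 0.3],
    "quarterly ad revenue": [0.1, 0.9, 0.4],
}

def rank_by_similarity(query, embeddings):
    """Rank every other phrase by similarity to the query phrase."""
    query_vec = embeddings[query]
    scored = [
        (text, cosine_similarity(query_vec, vec))
        for text, vec in embeddings.items()
        if text != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_by_similarity("net zero plans", toy_embeddings)
for text, score in ranking:
    print(f"{score:.3f}  {text}")
```

With these toy vectors, "carbon neutrality" ranks above "quarterly ad revenue" even though it shares no words with the query, which is the behavior keyword matching alone cannot provide.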

Search, in turn, makes LLMs better. To unlock the full potential of gen AI, organizations need to ground model responses in what we call “enterprise truth”: fresh, real-time data and enterprise systems. The resulting RAG solution is powered by search, so the quality of the grounding depends on the quality of the search behind it.

By leveraging LLMs, developers can build more sophisticated gen AI applications that use grounded generation and are powered by search solutions. LLM-powered summaries direct users to the most relevant search snippets for reference, boosting users’ productivity.

In this blog post, we delve into how the search landscape has shifted over the past year and how Vertex AI is poised to help organizations build search applications for their varied requirements. The post has three parts. First, we introduce emerging search patterns. Next, we explore the capabilities in Vertex AI Search that enable search applications. Finally, we discuss which option to use when, to bring various search applications to life with Vertex AI.

Part 1: Emerging patterns in search

Working with a diverse range of customers across various sectors, we’ve identified three prominent search patterns:

1. Enhanced semantic search

Organizations across domains such as legal, finance, research, and human resources often hold large amounts of unstructured documents: PDFs, docs, websites, internal Confluence sites, and so on. With the advent of generative AI, they are increasingly exploring RAG-powered solutions to find the right information faster and drive their employees’ productivity. One common scenario is in financial services, where RAG solutions help underwriters skim through extensive policy documents to find the information and clauses relevant to a claim.

Semantic search understands the intent and context of a user query beyond keywords to find the most relevant results. An example of a semantic query is a user searching for “Google’s Net Zero plans” or “Google’s Carbon Neutrality” against a repository containing Alphabet’s financial data. An LLM-powered semantic search solution understands the query and how the relevant entities relate to one another (“Net Zero” and “Carbon Neutrality plans”; “Google” and “Alphabet”), and can provide relevant search results, often accompanied by a guided summary. In contrast, traditional keyword- or token-based solutions are likely to struggle to find relevant matches unless the documents directly use those keywords.

2. Hybrid search solutions

In certain industries, notably retail, users often want to search for a particular product or feature either by keyword or by describing the product, for example, “LED” or “decorative small lights for my balcony”.

For such cases, customers are exploring so-called hybrid search solutions that can deal with a combination of keyword and semantic search as a key functional requirement. Hybrid search allows users to find relevant entities while still benefiting from the flexibility of natural language queries, and caters to diverse user preferences and search intentions.

In the above example of Alphabet’s financial data, users can run keyword- or token-based searches, e.g., “Carbon” or “Carbon negative” with auto-complete support, as well as more complex queries like “What are Google’s carbon negative plans”. The search system should be able to identify relevant results for user queries beyond the exact keyword match.

Depending on their requirements, customers may start by building semantic search applications and, if keyword- or token-based search is also needed, explore search solutions that support both.
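One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The sketch below is an illustration of that general technique, not a description of how Vertex AI Search merges results; the product IDs and the two input rankings are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. k=60 is a conventional default for RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for a query about balcony lighting.
keyword_ranking = ["led-strip", "led-bulb", "fairy-lights"]
semantic_ranking = ["fairy-lights", "led-strip", "lantern"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

Items that appear high in both lists rise to the top, while items found by only one retriever are still kept, which matches the goal of serving both keyword-style and descriptive queries.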

3. Analytical queries (beyond search)

Semantic and hybrid search have been around for quite some time, but with the advent of LLMs the quality of search results and the user experience have improved significantly. An emerging pattern is that the boundaries between search and analytics have started to blur for users. Developers are exploring search solutions that allow users to ask complex, analytical questions of their data in natural language. For example, the query “Total Carbon Emissions removed for 2024 and 2022?” returns aggregated answers that go beyond simply finding documents that contain those words.

Developing analytical search with natural language is a very complex problem. Understanding disparate complex data sources and associated schema, especially in the case of structured relational data, and being able to produce aggregated results to consistently serve user queries requires a more complex “agentic” workflow. LLM-powered RAG can be one of the enabling components or tools for such agents. 
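To illustrate why an analytical query needs aggregation rather than document lookup, here is a small sketch using Python's built-in sqlite3 module. The table name, columns, and figures are all invented for the example, and the year extraction is a simple regular expression standing in for the much richer query understanding an agent would need.

```python
import re
import sqlite3

def total_removed_for_years(conn, question):
    """Pull four-digit years out of the question and sum removals per year."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", question)]
    if not years:
        return {}
    # Parameterized query: only the number of placeholders is interpolated.
    placeholders = ",".join("?" for _ in years)
    rows = conn.execute(
        "SELECT year, SUM(tonnes_removed) FROM emissions "
        f"WHERE year IN ({placeholders}) GROUP BY year ORDER BY year",
        years,
    ).fetchall()
    return dict(rows)

# Hypothetical schema and data, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emissions (year INTEGER, project TEXT, tonnes_removed REAL)"
)
conn.executemany(
    "INSERT INTO emissions VALUES (?, ?, ?)",
    [
        (2022, "project-a", 10.0),
        (2022, "project-b", 5.0),
        (2023, "project-a", 7.0),
        (2024, "project-a", 12.0),
        (2024, "project-b", 8.0),
    ],
)

answer = total_removed_for_years(
    conn, "Total Carbon Emissions removed for 2024 and 2022?"
)
print(answer)
```

The answer is computed from rows rather than retrieved from a passage, which is the step that pushes this pattern toward agentic workflows with structured-data tools.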

Part 2: Vertex AI: Powering the future of search

Vertex AI offers a comprehensive suite of tools and services to address the above search patterns.

Building and managing RAG systems can be complex and nuanced. Developers need to develop and maintain several RAG building blocks: data connectors, data processing, chunking, annotating, vectorization with embeddings, indexing, query rewriting, retrieval, and reranking, along with LLM-powered summarization. Designing, building, and maintaining this pipeline can be time- and resource-intensive. Scaling each component to handle bursty search traffic and coping with a large corpus of varied, frequently updated data can also be challenging. Speaking of scale, as queries per second ramp up, many vector databases see both recall and latency degrade.
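A few of these building blocks (chunking, vectorization, and retrieval) can be sketched in plain Python to show how they fit together. This is a toy under stated assumptions: term counts stand in for real embeddings, the chunk sizes are arbitrary, and the policy text is invented. Connectors, indexing, query rewriting, reranking, and summarization are left out.

```python
import math
import re
from collections import Counter

def chunk_words(text, size=8, overlap=2):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap  # assumes size > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks

def vectorize(text):
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, chunks, top_k=2):
    """Return up to top_k chunks with a nonzero score, most similar first."""
    query_vec = vectorize(query)
    scored = [(cosine(query_vec, vectorize(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]

# Invented policy text for illustration.
policy_text = (
    "Flood damage is covered up to the policy limit. "
    "Claims must be filed within thirty days of the incident. "
    "Jewelry is excluded unless separately scheduled."
)
chunks = chunk_words(policy_text)
top = retrieve("When must claims be filed?", chunks, top_k=1)
print(top)
```

Even in this toy, chunk boundaries cut across sentences and the best chunk holds only part of the answer, which hints at why chunking strategy and reranking need real engineering effort in a production pipeline.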

Vertex AI Search leverages decades of Google’s expertise in information retrieval and brings together the power of deep information retrieval, state-of-the-art natural language processing, and the latest in LLM processing to understand user intent and return the most relevant results for the user. No matter where you are in the development journey, Vertex AI Search provides several options to bring your enterprise truth to life, from out-of-the-box to DIY RAG.

Why Vertex AI Search for out-of-the-box RAG

The out-of-the-box solution based on Vertex AI Search brings Google-quality search to building end-to-end, state-of-the-art semantic and hybrid search applications, with features such as: 

Explore this notebook (Part I) to see the Vertex AI Agent Builder SDK in action.

For greater customization 

The Vertex AI Search SDK further allows developers to integrate it with open-source LLMs or other custom components, tailoring the search pipeline to their specific needs. As mentioned above, building end-to-end RAG solutions can be complex; as such, developers might want to rely on Vertex AI Search as a grounding source for search results retrieval and ranking, and leverage custom LLMs for the guided summary. Vertex AI Search also provides grounding in Google Search. 

Find an example notebook for grounding responses for Gemini in Part II here.

Developers might already be building their LLM application with frameworks like LangChain or LlamaIndex. Vertex AI Search has native integration with LangChain and other frameworks, allowing developers to retrieve search results, grounded generation, or both. It can also be linked as an available tool in the Vertex AI Gemini SDK. Likewise, Vertex AI Search can be a retrieval source for the new Grounded Generation API powered by Gemini “high-fidelity mode,” which is fine-tuned for grounded generation.

A notebook example for leveraging Vertex AI Search from LangChain is in Part III here.

Vertex AI DIY Builder APIs for end-to-end RAG 

Vertex AI provides the essential building blocks for developers who want to construct their own end-to-end RAG solutions. These include APIs for document parsing, chunking, LLM text and multimodal vector embeddings, versatile vector database options (Vertex AI Vector Search, AlloyDB, BigQuery Vector DB), reranking APIs, and grounding checks.

It’s also worth noting that Gemini 1.5 Pro, available on Vertex AI, supports a 2M-token input context window (around 2,000 pages’ worth of context) while maintaining state-of-the-art reasoning capabilities. Recently, Google DeepMind and the University of Michigan published comprehensive research on choosing between RAG and long-context windows. Gemini 1.5 Pro adheres well to the given system instructions and contextual information to guide users with their queries. With its long context window, multimodal reasoning, and caching ability, developers can quickly start with Gemini 1.5 Pro to test and prototype their semantic information retrieval use case. Then, once they scale up their source data or usage profiles, they can use RAG, possibly in concert with a long context window.
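The figures above imply roughly 1,000 tokens per page, which allows a quick back-of-the-envelope check on whether a corpus could fit in a single prompt. The sketch below is only a rough planning aid: the per-page ratio is derived from the numbers in this post, real token counts vary by content, and the reserved budget for instructions and output is a hypothetical value.

```python
CONTEXT_WINDOW_TOKENS = 2_000_000  # window size quoted in this post
PAGES_PER_WINDOW = 2_000           # page estimate quoted in this post
TOKENS_PER_PAGE = CONTEXT_WINDOW_TOKENS // PAGES_PER_WINDOW  # ~1,000

def fits_in_context(num_pages, reserved_tokens=50_000):
    """Estimate whether num_pages fit alongside a reserved token budget.

    reserved_tokens is a hypothetical allowance for system instructions,
    the user query, and the model's answer.
    """
    estimated_tokens = num_pages * TOKENS_PER_PAGE
    return estimated_tokens + reserved_tokens <= CONTEXT_WINDOW_TOKENS

def suggest_approach(num_pages):
    """Very coarse heuristic: prototype with long context, else use RAG."""
    if fits_in_context(num_pages):
        return "long-context prototype"
    return "RAG (optionally combined with long context)"

for pages in (500, 1_950, 2_000, 50_000):
    print(pages, "->", suggest_approach(pages))
```

Fitting in the window is only one consideration; cost, latency, and how often the data changes also influence when to move from a long-context prototype to RAG.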

Based on where developers are in their journey and their orchestration framework of choice, they can select Vertex AI Search out-of-the-box plus DIY APIs to build end-to-end RAG applications. Understanding your organization’s appetite towards building, maintaining and scaling RAG applications can also help guide a particular solution path. 

Let’s explore how the above options can be leveraged to build the different search scenarios we discussed.

Part 3: Search patterns with Vertex AI

Enabling semantic and hybrid search

Tackling analytical queries 

Building solutions to tackle analytical queries is also complex and challenging, owing to factors such as understanding complex data schemas and types, handling time ranges and relevant entities for filtering, and the frequent requirement for deterministic outputs. Agents that use RAG and natural language to SQL could be a viable solution in this scenario. At Google Cloud, we’re constantly innovating in this space and provide components to help build analytical search-like systems. Check out the OpenDataQnA Analytic Agent on GitHub to see one such solution. The following Google Cloud capabilities can help developers build an analytical query agent workflow:

Unlock your data with RAG, grounding and search in Vertex AI

To summarize, based on the search and business requirements and their development framework preference, developers can navigate through the various options that Vertex AI provides, from the out-of-the-box solution, to DIY, to meet them where they are in their RAG journey. For these use cases, Vertex AI offers:
