Build your own generative AI chatbot directly from BigQuery

Organizations today have access to a wealth of data, including customer information, financial records, operational logs, and more, which they aim to leverage for building new generative AI solutions. However, they face a number of challenges: 

  1. Building and training LLMs requires significant technical expertise and computational resources. Teams usually need to use many different frameworks and application stacks to manage their environments. 
  2. There are challenges around data governance and data quality. Data often coming from non-managed or non-curated datastores makes it difficult to productionize these applications. At the end of the day, LLMs are only as good as the data they are trained on. 
  3. Understanding how LLMs arrive at their answers can be challenging, which can raise concerns about a solution’s trust and accountability. When you understand the data that you feed into the model, you have better assurances and better understanding of the model’s output. Given this, imagine doing this through your enterprise data lake and data warehouse — data that you have a good understanding of. Therefore, applying similar rules in your AI application helps in terms of managing the data quality. 
  4. AI sometimes struggles with the nuances of human language, for example humor or idiomatic expressions, which can lead to misunderstandings. For a global enterprise, any internal chatbot needs to be able to handle multiple languages and cultural contexts, which can be complex to implement.

How BigQuery and Gemini models can help

In this article, Google will demonstrate how to build a sample application called DataSageGen, an innovative chatbot designed to be a personal guide to access and process information from a vast array of sources, including:

Simply ask DataSageGen a question, and it will intelligently search and retrieve relevant information, providing you with concise and understandable answers.

But DataSageGen is more than just a search engine. It leverages advanced techniques like retrieval augmented generation (RAG) and BigQuery ML to understand the context of your query and deliver the most relevant and insightful responses. 

Here’s how it works in the nutshell:

Solutions like these offer valuable insights for anyone considering building a chatbot to serve a technical knowledge base. Therefore, Google are providing the detailed steps here for you so that you can leverage these capabilities to build your own knowledge base, without your enterprise data leaving your enterprise data warehouse (e.g. BigQuery) or your unstructured data (in Cloud Storage). 

Building the DataSageGen data architecture

Step 1: Collecting user input

The flow initiates with capturing the user’s input through the DataSageGen chatbot interface. The user’s query or command, referred to as the “User Prompt,” is extracted from the request payload using Flask’s request handling. This step is crucial, as it determines the input for the entire processing pipeline.

Class/Method explanation:

Intelligent contextual search and retrieval:

Step 2: Intelligent contextual search and retrieval 

In this step, Google use BigQuery ML to predict top results based on data trends, plus advanced text analysis (RAG) to tap into release notes, whitepapers, community knowledge, and announcements. To pinpoint the best answers, we then convert the questions into semantic “embeddings”, capturing their deeper meaning. This helps ensure your technical data analytics queries get the most relevant, up-to-date information. Here is a more detailed description of Step 2:

This step involves generating a semantic representation of the user’s query using the `generate_text_embeddings` function. The function transforms the textual input into a dense vector (embedding), capturing the semantic nuances of the input. This vector representation is then used for contextual search and retrieval operations.

query_embeddings = generate_text_embeddings(question)
def generate_text_embeddings(sentences):
   """Generate text embeddings for given sentences."""
   model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
   embeddings = model.get_embeddings([sentences])   
   vectors = [embedding.values for embedding in embeddings]
   return vectors

Class/Method explanation:

WITH predictions AS (
  SELECT
    *,
    ML.PREDICT(MODEL `project.dataset.your_model`, (
      SELECT
        *
      FROM
        `project.dataset.new_data`
    )) AS predicted_label
  FROM
    `project.dataset.new_data`
)
SELECT
  *
FROM
  predictions
ORDER BY
  predicted_label DESC
LIMIT 20; — Top 20 results

Step 3: Prompt augmentation 

The prompt passed to DataSageGen chatbot by the user is augmented by the retrieval of RAG. In this step we fine-tune responses received for a better user experience. We take your question and add instructions on tone, plus augment it with a list of off-limit topics (such as hate speech, violence, or sensitive personal information). This helps ensure the chatbot’s answers are helpful and stay on track.

banned_phrases = os.environ.get('BANNED_QUESTIONS', '')  
# Instructions for friendly tone and to avoid banned phrases
instructions = ("Please provide a friendly response. The following topics are out of context: "
               + ", ".join(banned_phrases) + ".”)

The user prompt is augmented with structured instructions and a list of banned phrases to guide the chatbot’s response generation. This augmentation involves appending additional context that instructs the model on how to format its responses and topics to avoid, ensuring the output is aligned with user expectations and content guidelines.

Class/Method explanation:

Step 4: Model inference 

The augmented prompt is passed as input to the Gemini Pro model in Vertex AI for inference and tuned answer retrieval. In this step we get to the heart of your query! We add guidance to your question, then tap into the Gemini Pro model on Vertex AI. To enrich its answers, we search for a vector index for the most relevant background information. This process means you get more precise, informative responses to complex data analytics questions.

aiplatform.init(project=PROJECT_ID, location=LOCATION)
vertexai.init()
model = GenerativeModel("gemini-pro")
bqrelease_index_ep = aiplatform.MatchingEngineIndexEndpoint(index_endpoint_name=INDEX_ENDPOINT_NAME)
response = bqrelease_index_ep.find_neighbors(
   deployed_index_id="bqrelease_index",
   queries=[qry_emb[0]],
   num_neighbors=10
)
matching_ids = [neighbor.id for sublist in response for neighbor in sublist]
context = generate_context(matching_ids, data)
original_prompt = f"Based on the context delimited in backticks, answer the query, ```{context}``` {question}"
# Combine the instructions with the original prompt
full_prompt = f"{instructions} {original_prompt}"

This phase utilizes the augmented prompt as input to the Gemini Pro model hosted on Vertex AI for inference. Additionally, it involves querying Vertex AI vector search index for contextually relevant documents based on the query embeddings. This enriched context, combined with the model’s inference capabilities, allows for generating nuanced and informed responses.

Class/Method explanation:

Step 5: Response generation 

After processing the information by the Gemini Pro model in Vertex AI, the DataSageGen chatbot generates a response that is delivered back to the user.

model = GenerativeModel("gemini-pro")
return jsonify({'response': chat_response.text})

In the final step, the Gemini Pro model processes the augmented prompt, including the contextual information retrieved from the Matching Engine index, to generate a tailored response. This response is then formatted and delivered back to the user, completing the interaction loop.

Class/Method explanation:

Making DataSageGen an application

In the final step, we transform the RAG into a robust web application. Identity-Aware Proxy (IAP) secures access, ensuring only authorized users can interact with the chatbot. HTTPS Cloud Load Balancing ensures optimal performance and reliability by distributing traffic efficiently, especially during peak usage or maintenance windows. And Cloud Run hosts the chatbot, automatically scaling resources to meet demand while optimizing costs. change, load balancer allows managing such unforeseen challenges.

This image outlines the application infrastructure architecture of the DataSageGen chatbot system, focusing on security and scalability. The various components are:

1. User Access: Users start by accessing the DataSageGen chatbot, through a web interface.

2. Identity Aware Proxy (IAP): As users attempt to access the DataSageGen chatbot, the IAP acts as a gatekeeper.

HTTPS Cloud Load Balancing: Once past the IAP, the architecture employs HTTPS Cloud Load Balancing to manage incoming traffic. This provides global scalability and fast, 6 sec failover, meaning it can handle a high volume of requests and quickly reroute traffic if a part of the system fails.

3. Cloud Run: This is the final step, where the DataSageGen chatbot application is hosted. Cloud Run allows for application scalability and manages costs effectively, automatically scaling based on incoming requests and charging only for the compute resources used during request processing.

Overall, the DataSageGen chatbot application architecture emphasizes secure access control with IAP, robust traffic management with HTTPS Cloud Load Balancing, and efficient resource use and scalability with Cloud Run.

Conclusion

Knowledge-based chatbots are different. They don’t just spit out pre-programmed responses. Instead, they use large language models (LLMs) to analyze your entire knowledge base – FAQs, help articles, product catalogs – and generate natural, human-like responses that are contextually relevant. The benefits are immense:

But remember, to take advantage of generative AI to build a next-generation chatbot, you need to:

With these considerations, knowledge-based chatbots can revolutionize customer support, offering enhanced experiences, increased efficiency, and a future-proof solution for your business. 

The field of generative AI is rapidly evolving, requiring ongoing adaptation and refinement of chatbot solutions.The DataSageGen architecture helps ensure that you get the most accurate and relevant information possible, saving you time and effort, while also being built with security and scalability in mind. It uses Identity Aware Proxy (IAP) to control access, HTTPS Cloud Load Balancing for efficient traffic management, and Cloud Run for cost-effective scalability. Whether you’re a data engineer, product manager, or simply curious about data and AI, DataSageGen is an invaluable tool for anyone looking to deepen their understanding and navigate this complex field with ease.

To learn more about BigQuery’s new RAG and vector search features, check out the documentation. Use this tutorial to apply Google’s best-in-class AI models to your data, deploy models and operationalize ML workflows without moving data from BigQuery. Check out this github repository to see how you can deploy such an application with your own corpus. You can also watch a demonstration on how to build an end-to-end data analytics and AI application directly from BigQuery while harnessing the potential of advanced models like Gemini.

Related posts

Best practices for SAP performance benchmarking

by Cloud Ace Indonesia
5 months ago

What is Microservices Architecture?

by Cloud Ace Indonesia
3 years ago

Want your cloud to be more secure? Stop using service account keys

by Cloud Ace Indonesia
8 months ago