Sentiment analysis is a powerful tool that uses natural language processing (NLP) to uncover the underlying emotions (positive, negative, neutral) within text such as customer reviews. This analysis can offer valuable insight into how customers perceive your products, services, and brand overall. Furthermore, by utilizing techniques like topic modeling and keyword extraction, you can identify recurring themes. These themes could highlight specific product features, aspects of customer service, or general pain points — providing a roadmap for improvement and a way to address customer needs more effectively.

In BigQuery you can use ML.GENERATE_TEXT function that lets you directly utilize powerful large language models (LLMs) from Google’s Vertex AI within your SQL queries to analyze text in a BigQuery table. This means you can perform sophisticated text generation and analysis tasks on data stored in your BigQuery tables without needing to move data or write complex code outside the BigQuery environment. ML.GENERATE_TEXT function can also be used to generate text that describes visual content using a remote model based on a gemini-pro-vision multimodal model. Some key benefits are:

  1. Ease of use: BigQuery’s SQL integration lets you tap into advanced language model capabilities without requiring separate machine learning pipelines or specialized coding expertise.
  2. Scalability: BigQuery’s strengths in handling massive datasets pair nicely with LLMs to process customer reviews or other text sources at scale.
  3. Insight generation: ML.GENERATE_TEXT assists in tasks such as:
  • Sentiment analysis: Determine the overall emotional tone (positive, negative, neutral) in customer feedback.
  • Theme extraction: Identify common topics and trends within reviews
  • Summarization: Condense lengthy reviews into key points.
  • Text completion & generation: Get help with responses, ad copy, or creative writing based on existing reviews.
https://storage.googleapis.com/gweb-cloudblog-publish/images/for_blog_1.max-1900x1900.png

ML.GENERATE_TEXT in action

Now, let’s take a look at how to use ML.GENERATE_TEXT, taking a hypothetical rideshare company as an example.

Setup instructions:

  1. Before starting, choose your GCP project, link a billing account, and enable the necessary API; full instructions here.
  2. Create a cloud resource connection and get the connection’s service account; full guide here.
  3. Grant access to the service account by following the steps here
  4. Load data. To oad from public storage account, using the following command:
    1. Please replace ‘[PROJECT_ID.DATASET_ID]’  with your project_id, and enter a name for your dataset
    2. The command will create a table named ‘customer_review’ in your dataset
CREATE SCHEMA IF NOT EXISTS `[PROJECT_ID.DATASET_ID]` OPTIONS (location='us');
LOAD DATA OVERWRITE `[PROJECT_ID.DATASET_ID].customer_review`
FROM FILES ( format = 'PARQUET',
uris = ['gs://data-analytics-golden-demo/rideshare-lakehouse-raw-bucket/rideshare_llm_export/v1/raw_zone/customer_review/000000000000.parquet']);

Sentiment analysis

Let’s walk through an example of performing sentiment analysis.

1. Create a model

Create a remote model in BigQuery that utilizes a Vertex AI foundation model.

Syntax:

CREATE OR REPLACE MODEL
`PROJECT_ID.DATASET_ID.MODEL_NAME`
REMOTE WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS (ENDPOINT = 'ENDPOINT');

Code example:

  • Please replace '[PROJECT_ID.DATASET_ID.MODEL_NAME]'  with your project_id, dataset_id and model name
  • Please replace '[PROJECT_ID.REGION.CONNECTION_ID]'  with your project_id, region and connection_id
CREATE OR REPLACE MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`
REMOTE WITH CONNECTION `[PROJECT_ID.REGION.CONNECTION_ID]`
OPTIONS (ENDPOINT = 'gemini-pro');

2. Generate text

With just a few lines of SQL, you can analyze text or visual content in your BigQuery table using that model and the ML.GENERATE_TEXT function.

ML.GENERATE_TEXT syntax differs depending on the Vertex AI model that your remote model targets. Read the documentation to understand all the parameters of the ML.GENERATE_TEXT function.

Syntax:

[max_output_tokens AS max_output_tokens]

[, top_k AS top_k] [, top_p AS top_p] [, temperature AS temperature] [, flatten_json_output AS flatten_json_output] [, stop_sequences AS stop_sequences]) )” style=”-webkit-tap-highlight-color: transparent; margin: 0px; padding: 20px 24px; box-sizing: border-box; background: rgb(248, 249, 250); border: 1px solid rgb(241, 243, 244); border-radius: 16px; position: relative;”>

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_k AS top_k]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences])
)

Code example:

  • Please replace '[PROJECT_ID.DATASET_ID]'  with your project_id and dataset_id
  • Please replace '[PROJECT_ID.REGION.CONNECTION_ID]'  with your project_id, region and connection_id
CREATE OR REPLACE TABLE `[PROJECT_ID.DATASET_ID].review_sentiment_analysis` AS
WITH PROMPT AS (
 SELECT CONCAT ('For the given review classify the sentiment as Positive, Neutral or Negative.',
'\n input: The driver was able to make some small talk, but he didn\'t go overboard. I liked that he was friendly and chatty, but he also knew when to leave me alone. The trunk fit my belongings, and the car was clean and comfortable. Overall, it was a good ride.',
'\n output: \n Positive - Trunk fit my belongings, friendly, chatty',
'\n input: I took a rideshare last night and it was an okay experience. The car was adequately clean, but it was a bit warm for my liking. The driver was able to make some small talk, but I wasn\'t really in the mood to talk. Overall, it was a fine ride.',
'\n output: Neutral - Clean, A bit warm, fun ride',
 '\n input: ', customer_review_text,
'\n output: '
) AS prompt, customer_id, 
customer_review_text
 FROM `[PROJECT_ID.DATASET_ID].customer_review`
 LIMIT 20
),
REVIEW_RESPONSE_GENERATION AS (
 SELECT *
 FROM
 ML.GENERATE_TEXT(
    MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`,
   (SELECT * FROM PROMPT),
   STRUCT(
     200 AS max_output_tokens,
     0.5 AS temperature,
     40 AS top_k,
     1.0 AS top_p,
     TRUE AS flatten_json_output))
)
SELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM REVIEW_RESPONSE_GENERATION;
SELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status  FROM `[PROJECT_ID.DATASET_ID].review_sentiment_analysis`;

3. Result:

In the prompt, Google provided the model with context and two examples, clearly demonstrating our desired output format. You can validate that the outputs generated are aligned to the examples we provided to the model as part of a few-shot prompting approach.

In few-shot prompting, including a few examples of reviews with their corresponding sentiment labels is crucial for guiding the model’s behavior. To help ensure the model’s effectiveness in various situations, it’s essential to offer a sufficient number of well-structured examples covering diverse review scenarios.

Then, by performing sentiment analysis on the customer reviews, we can have insights into their preferences and pain points regarding our products. By identifying key themes in the reviews, we can effectively communicate valuable feedback to the product team, enabling them to make informed, data-driven decisions and improvements.

https://storage.googleapis.com/gweb-cloudblog-publish/images/for_blog_2.max-1700x1700.png

In the table above, you can see the results of ML.GENERATE_TEXT including the input table along with the following columns:

  1. ml_generate_text_result: This is the JSON response and the generated text is in the text element.
  2. ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
  3. ml_generate_text_rai_result: a STRING value that contains the safety attributes. This column is returned when flatten_json_output is TRUE.
  4. ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.

Extracting themes

Now, let’s extract the theme from the reviews using the model we created above:

  • Please replace '[PROJECT_ID.DATASET_ID]'  with your project_id and dataset_id
CREATE OR REPLACE TABLE `[PROJECT_ID.DATASET_ID].extract_themes` AS
WITH PROMPT AS (
 SELECT CONCAT(
"""
Classify the text as one or more of the following categories and return in the below json format.
- "trunk space small"
- "trunk space large"
- "driving too fast"
- "driving too slow"
- "clean car"
- "dirty car"
- "car too hot"
- "car too cold"
- "driver likes conversation"
- "driver likes no conversation"
- "driver likes music"
- "driver likes no music"
- "distracted driver"
JSON format: [ "value" ]
Sample JSON Response: [ "dirty car", "car too cold" ]
Text:
""", customer_review_text) AS prompt, customer_id, customer_review_text
         FROM `[PROJECT_ID.DATASET_ID].customer_review`
         LIMIT 10
),
EXTRACT_THEMES AS (
 SELECT *
 FROM
 ML.GENERATE_TEXT(
    MODEL `[PROJECT_ID.DATASET_ID.MODEL_NAME]`,
   (SELECT * FROM PROMPT),
   STRUCT(
     1024 AS max_output_tokens,
     0 AS temperature,
     1 AS top_k,
     0 AS top_p,
     TRUE AS flatten_json_output))
)
SELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status FROM EXTRACT_THEMES;
SELECT ml_generate_text_llm_result, customer_id, customer_review_text, prompt, ml_generate_text_status  FROM `[PROJECT_ID.DATASET_ID].extract_themes`;

Result:

Using ML.GENERATE_TEXT function and SQL from the BigQuery console, we can efficiently identify key themes within our customer reviews. This gives us deeper insight into customer perceptions and can provide actionable data to improve our products.

https://storage.googleapis.com/gweb-cloudblog-publish/images/for_blog_3.max-1700x1700.png

ML.GENERATE_TEXT is tightly integrated with the Gemini model, which is designed for higher input/output scale and better result quality across a wide range of tasks, like text processing including classification and summarization, sentiment analysis, and code generation. 

Analyzing the themes

Now that Google identified the themes in our reviews, let’s dive deeper with data canvas in BigQuery, an AI-centric experience to reimagine data analytics that Google introduced at Next ‘24. BigQuery data canvas lets you discover, transform, query, and visualize data using natural language. It also provides a graphical interface that lets you work with data sources, queries, and visualizations in a directed acyclic graph (DAG), giving you a view of your analysis workflow that maps to your mental model.

Given that our themes are stored in the  ‘extract_themes’ table, let’s create a data canvas to analyze them further. Click the down arrow next to the ‘+’ icon and choose ‘Create Data canvas’

https://storage.googleapis.com/gweb-cloudblog-publish/images/for_blog_4.max-1100x1100.png

You will be brought to a screen where you can search for the ‘extract themes’ table and get started.

https://storage.googleapis.com/gweb-cloudblog-publish/images/for_blog_5.max-1300x1300.png

Once you’ve selected a table, you’ll see it on a canvas where you can query it directly or join it with other tables.

To create a bar chart of themes, click on the ‘Query’ button and type ‘bar chart for the most common theme and remove the null values and limit the result to top 10 values’. The AI understands your request and automatically generates the correct query, even though the ‘themes’ aren’t in a dedicated column — the AI recognizes that the themes are found within the ‘ml_genertae_text_llm_result’ column. Finally, click ‘Run’ to see the query result.

https://storage.googleapis.com/gweb-cloudblog-publish/images/for_blog_6.max-2000x2000.png
https://storage.googleapis.com/gweb-cloudblog-publish/images/for_blog_7.max-1700x1700.png

Your theme data is ready! Click ‘Visualize’ to instantly see your bar chart. 

https://storage.googleapis.com/gweb-cloudblog-publish/images/for_blog_8.max-1700x1700.png

You now have a bar chart of the extracted themes from your customer reviews, plus automatically generated helpful insights based on the data and explanations of your findings. 

In short, BigQuery data canvas lets you analyze your data from start to finish with simple natural language commands: Discover relevant data, merge it with customer information, find key insights, collaborate with your team members, and create reports — all in one place. Plus, you can save these results or combine them with other data for further analysis or extract it into a notebook.