Businesses across all industries are turning to AI for a clear, real-time view of their operations. Whether it’s a busy factory floor, a crowded retail space, or a bustling restaurant kitchen, the ability to monitor the work environment helps businesses be more proactive and, ultimately, more efficient.

Gemini 1.5 Pro’s multimodal and long context window capabilities can improve operational efficiency for businesses by automating tasks from inventory management to safety assessments. One powerful use case that’s emerged for developers is AI-powered kitchen analysis for busy restaurants. This kind of analysis can benefit everyone: it can help a restaurant’s bottom line, train employees more efficiently, and support safety assessments that create a safer work environment.

In this post, we will show you how this works and how you can apply it to your business.

Understanding multimodal AI & long context window:

Before we step into the kitchen, let’s break down what “multimodal” and “long context window” mean in the world of AI: 

Multimodal AI can process and understand multiple types of data. Think of it as an AI system that can see, hear, read, and understand all at once. In a restaurant kitchen, that data can take the following forms:

  • Text: Recipes, orders, and inventory lists
  • Images: Food presentation and kitchen layouts
  • Audio: Kitchen commands and customer feedback
  • Video: Real-time cooking processes and staff movements

Added together, these data representations can reach gigabytes in size, which is where Gemini’s long context window comes into play. A long context window can take in millions of tokens (the small units of text, image, audio, and video data that the model processes) in a single request. This makes it possible to input all the data mentioned above, from text to video, and generate cohesive outputs without losing any of your context.
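To get a feel for how much of a context window a video uses, here is a rough back-of-the-envelope sketch. The per-second token rates and the 1,000,000-token window below are assumptions chosen for illustration, not figures from this post; check the current Gemini documentation for the rates that apply to your model.

```python
# Rough estimate of how many tokens a video clip might consume.
# All three rates are ASSUMPTIONS for illustration; verify them against
# the current Gemini documentation before relying on the numbers.
TOKENS_PER_VIDEO_FRAME = 258    # assumed tokens per sampled frame
FRAMES_PER_SECOND = 1           # assumed sampling rate
AUDIO_TOKENS_PER_SECOND = 32    # assumed tokens per second of audio


def estimate_video_tokens(duration_seconds: int) -> int:
    """Estimate tokens for a video with an audio track."""
    per_second = TOKENS_PER_VIDEO_FRAME * FRAMES_PER_SECOND + AUDIO_TOKENS_PER_SECOND
    return duration_seconds * per_second


def fits_in_context(token_count: int, context_window: int = 1_000_000) -> bool:
    """Check the estimate against an assumed context window size."""
    return token_count <= context_window


five_minute_clip = estimate_video_tokens(5 * 60)
print(five_minute_clip, fits_in_context(five_minute_clip))
```

Under these assumed rates, a five-minute clip uses only a small fraction of the window, which leaves room for prompts, recipes, and inventory lists in the same request.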

With a projected market size of over $13 billion by 2032 and a staggering CAGR of around 30% from 2024 to 2032, multimodal plus long context window capabilities are the secret ingredients for success.

Let’s look at a real-world example

When it comes to running a restaurant, AI can step in as your inventory manager and safety inspector all rolled into one. In the following test, we fed Gemini a five-minute video of a chef preparing meals during peak operating hours.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_e79MC4t.gif

We used a simple prompt to ask Gemini to analyze the video and return several values that would help us measure the efficiency of the meal preparation. First, we asked Gemini for the start and end timestamps of each part of the process:

  1. Preparation
  2. Cooking
  3. Plating
  4. Serving

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1stPrompt_optimized.gif

Prompt:

Watch the following video of food being prepared in a kitchen. For each food item being prepared, I want you to analyze the timestamps and provide the start and end times for each of these general cooking stages:

  1. Preparation: This includes any actions done before the food is cooked. Examples: Gathering ingredients, chopping vegetables, mixing sauces, preheating.
  2. Cooking: This involves applying heat to the food using any method. Examples: Frying, baking, grilling, microwaving. It also includes any actions done while the food cooks on the heat source, like flipping or stirring.
  3. Plating: This involves any actions taken after the food is cooked. Examples: Transferring food to a serving dish, adding garnishes, drizzling sauces
  4. Serving: when the cook hands the food to the customer

Output the data in chronological order as a JSON array with the following format: {"steps": [{"step": "Preparation", "start": "xx:xx", "end": "xx:xx"}, {"step": "Cooking", "start": "xx:xx", "end": "xx:xx"}]}
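Once the model returns JSON in the shape this prompt requests, the timestamps can be turned into durations with a few lines of code. The following is a minimal sketch, not the code used for this post: the helper names and the sample response are hypothetical, and it assumes the response is plain JSON with "mm:ss" timestamps.

```python
import json
from collections import defaultdict


def to_seconds(timestamp: str) -> int:
    """Convert an "mm:ss" timestamp into a number of seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def stage_durations(response_text: str) -> dict[str, int]:
    """Sum the seconds spent in each stage across all reported steps."""
    totals: dict[str, int] = defaultdict(int)
    for step in json.loads(response_text)["steps"]:
        duration = to_seconds(step["end"]) - to_seconds(step["start"])
        if duration < 0:
            raise ValueError(f"end is before start in step: {step}")
        totals[step["step"]] += duration
    return dict(totals)


# Hypothetical response, shaped like the format the prompt requests.
sample_response = """
{"steps": [
  {"step": "Preparation", "start": "00:05", "end": "01:10"},
  {"step": "Cooking", "start": "01:10", "end": "03:40"},
  {"step": "Plating", "start": "03:40", "end": "04:15"},
  {"step": "Serving", "start": "04:15", "end": "04:30"}
]}
"""

print(stage_durations(sample_response))
```

Summing per stage, rather than keeping one entry per step, means a video with several dishes still yields a single total for each of the four stages.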

Next, to find bottlenecks and optimize workflows, we asked Gemini to identify the following:

  • Positive moments 
  • Potential safety issues 
  • Inventory counts
  • Suggestions for improvement

We then put these values in a graph that broke down the efficiency of each task and highlighted opportunities for improvement. We also asked Gemini to translate the results into several languages for a diverse kitchen staff.
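One simple way to build such a breakdown is to compute each stage’s share of the total time. This sketch is only an illustration of the idea: the function names and the measured values are hypothetical, and a text bar chart stands in for whatever charting tool you prefer.

```python
def time_shares(seconds_by_stage: dict[str, int]) -> dict[str, float]:
    """Return each stage's fraction of the total recorded time."""
    total = sum(seconds_by_stage.values())
    if total == 0:
        raise ValueError("no time recorded")
    return {stage: secs / total for stage, secs in seconds_by_stage.items()}


def render_bars(shares: dict[str, float], width: int = 40) -> str:
    """Render the shares as a plain-text horizontal bar chart."""
    lines = []
    for stage, share in shares.items():
        bar = "#" * round(share * width)
        lines.append(f"{stage:<12} {bar} {share:.0%}")
    return "\n".join(lines)


# Hypothetical per-stage totals in seconds, for illustration only.
measured = {"Preparation": 60, "Cooking": 120, "Plating": 40, "Serving": 20}
print(render_bars(time_shares(measured)))
```

A stage with an unusually large share is a natural first place to look for a bottleneck.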

The final result: Here’s how Gemini analyzed the kitchen

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2ndPrompt_optimized.gif

Prompt:

You are a restaurant manager. Provide an analysis of this video; be verbose with your reasoning around the following areas:

-Inventory: count of inventory of different machines you see in the kitchen. Just output an integer as an approximation of what you see.

-Safety information: positive safety moments you see and potential safety hazards.

-Issues: Create a 2-value list of any issues / errors in the process of making food.

following the following json format: {"inventory":[{"name":"ingredient here","qty":x}],"safety":[{"moment":"describe the moment here","type":"positive/negative"}],"issue":[{"issue":"describe the issue here"}],"languages":[{"english":"english json","japanese":"japanese json","spanish":"spanish json"}]}
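A response in this format can be checked and summarized before it reaches a dashboard or a shift report. The sketch below is an assumption about how you might do that, not the code behind this post: the function name and the sample response are hypothetical, and the optional "languages" field is ignored for brevity.

```python
import json

# Top-level keys the prompt asks for ("languages" is left out of this sketch).
REQUIRED_KEYS = ("inventory", "safety", "issue")


def summarize_analysis(response_text: str) -> dict:
    """Validate the response shape and split safety moments by type."""
    data = json.loads(response_text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return {
        "inventory": {item["name"]: int(item["qty"]) for item in data["inventory"]},
        "hazards": [s["moment"] for s in data["safety"] if s["type"] == "negative"],
        "positives": [s["moment"] for s in data["safety"] if s["type"] == "positive"],
        "issues": [entry["issue"] for entry in data["issue"]],
    }


# Hypothetical response, shaped like the format the prompt requests.
sample_analysis = """
{"inventory": [{"name": "fryer", "qty": 2}, {"name": "grill", "qty": 1}],
 "safety": [{"moment": "Cook wears gloves while plating", "type": "positive"},
            {"moment": "Pan left unattended on a lit burner", "type": "negative"}],
 "issue": [{"issue": "Order tickets pile up near the fryer"}]}
"""

summary = summarize_analysis(sample_analysis)
print(summary["inventory"], summary["hazards"])
```

Failing loudly on a missing key is a deliberate choice here: model output does not always match the requested format, and it is safer to retry the request than to report an incomplete safety assessment.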

1. Real-time meal preparation and object tracking:

Gemini’s object detection capabilities identified ingredients and followed each cooking process as it unfolded in the video. By extracting the start and end timestamps for each stage of meal preparation, you can measure meal prep times precisely.

2. Inventory management:

Say goodbye to the “Oops, we’re out of that” moment. By tracking equipment and ingredient counts, Gemini can help prevent stock-outs and enable proactive inventory replenishment.

3. Safety assessments:

From detecting a slippery floor to noticing an unattended flame, Gemini picked up on those details that are easy to miss. It’s not about replacing human vigilance—it’s about enhancing it, creating a safer environment for both staff and diners.

4. Multilingual capabilities:

In a global culinary landscape, language barriers can be troublesome. Gemini broke down these barriers, ensuring that whether your chef speaks Mandarin or your server speaks Spanish, everyone’s on the same page. 

Gemini’s analysis of a five-minute video could help restaurants optimize operations, reduce costs, and enhance the customer experience. With mundane tasks automated and optimized, staff can focus on what matters: creating culinary masterpieces and delivering exceptional service. It also helps businesses grow by cutting costs, since optimized inventory and resource management translate directly to the bottom line.

And proactive hazard detection means fewer accidents and a safer work environment. It’s not just about avoiding lawsuits. It’s about creating a culture of care.

The future is served

Gemini’s models are pioneers in the market, unlocking use cases that are made possible with Google’s research and advancements. But Gemini’s impact extends far beyond the restaurant industry – its long context window allows businesses to analyze vast amounts of data, unlocking insights that were previously too costly to attain.