Do you ever wish you could sit down and talk physics with Albert Einstein or playwriting with William Shakespeare? With Character.AI, the possibilities are as infinite as the imagination.

At Character.AI, we are on a mission to deliver lifelike interactions using artificial intelligence (AI). Our service offers a groundbreaking platform where users can engage in lifelike conversations with their favorite Characters. These can be inspired by historical figures, like Abraham Lincoln; helpers, like Stella, a sassy personal assistant; or fictional personas, like NYC Cat or Librarian Linda. These are just a few of the millions of Characters that can be found on the site. Users can also immerse themselves in dynamic Group Chats, where multiple Characters interact seamlessly with multiple users and one another, creating an interactive dialogue. Character.AI makes this possible by harnessing the power of neural language models to analyze extensive text data and generate intelligent responses based on that knowledge.

Soaring growth requires a scalable solution

We are a highly technical business end-to-end, from training our distinctive models on supercomputers to delivering our service as a web-based chatbot. This, along with the growing momentum of the generative AI market, requires the ability to scale and modernize without missing a beat.

Our database layer stores data critical to the platform’s functionality. Since we released our first beta model in September 2022, Character.AI’s popularity has soared, causing the database load to grow exponentially. Consequently, we needed to scale our operations, and fast. The database we had initially been using was restricted by the maximum scaling capacity of each instance and by the number of smaller instances we could patch together to distribute the workload effectively. We didn’t have the resources to transform or refactor the database to a more scalable engine within our given timeframe, and we needed a solution that could offer immediate scale and performance benefits without requiring extensive code changes.

Figure 1: Previous architecture

Enhancing data performance with Google Cloud and AlloyDB

As an AI company, efficiently processing large amounts of data is paramount for our time-to-market and our ability to build differentiated algorithms. Therefore, we were initially drawn to Google Cloud’s distinctive Tensor Processing Units (TPUs) and graphics processing units (GPUs), such as NVIDIA’s L4 GPUs. Then, as we prototyped our service and started building our consumer application, Google Cloud’s managed database solutions became critical in helping us scale our applications with a small team.

When we found AlloyDB for PostgreSQL, we were stuck between a rock and a hard place. Usage of our service had scaled exponentially, putting unique stresses on various parts of our infrastructure, especially our databases. Initially, we were able to meet the increased demand by scaling up to larger machines, but over the course of a few weeks, we found that even the largest machines weren’t able to service our customer demand reliably, and we were running out of headroom. Given the time pressure, we needed to find a solution that could be deployed in days. Major refactoring to a sharded architecture or rebasing on top of a proprietary database engine, for example, was out of the question. AlloyDB promised better performance and higher scalability with a fully PostgreSQL-compatible interface.

Achieving 150% growth with AlloyDB’s increased scalability

To facilitate the migration process, we opted for a replication strategy. We ran two replication sets from the source database to the destination AlloyDB database, operating in change data capture (CDC) mode for 10 days. This allowed us to prepare our environment for the cutover. As a precaution, we provisioned a fallback instance in the source database in case we needed to roll back the migration. The migration process was smooth, requiring no changes to the application code thanks to AlloyDB’s full compatibility with PostgreSQL.
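As a rough illustration of the kind of check that can precede a cutover like the one described above, the following Python sketch compares per-table row counts between source and destination and verifies that replication lag is below a threshold. The function name, the row-count comparison, and the five-second threshold are assumptions made for illustration; the post does not describe the checks that were actually run.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CutoverReport:
    ready: bool
    reasons: list[str]  # human-readable explanations when not ready


def check_cutover_readiness(
    source_counts: dict[str, int],
    dest_counts: dict[str, int],
    replication_lag_s: float,
    max_lag_s: float = 5.0,  # hypothetical threshold, not from the post
) -> CutoverReport:
    """Return whether the destination looks caught up with the source."""
    reasons: list[str] = []

    # The destination must be close enough to the source in time.
    if replication_lag_s > max_lag_s:
        reasons.append(
            f"replication lag {replication_lag_s:.1f}s exceeds {max_lag_s:.1f}s"
        )

    # Every source table must exist on the destination with the same row count.
    for table, expected in sorted(source_counts.items()):
        actual = dest_counts.get(table)
        if actual is None:
            reasons.append(f"table {table} missing on destination")
        elif actual != expected:
            reasons.append(
                f"table {table}: source={expected}, destination={actual}"
            )

    return CutoverReport(ready=not reasons, reasons=reasons)
```

Row counts are a coarse signal; a real migration would likely add checksums or sampled row comparisons before switching traffic.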

Since migrating to AlloyDB, we have been able to confidently segment our read traffic into read pools so that user activity can grow unabated. Because AlloyDB’s replication lag is consistently under 100 milliseconds, we can scale reads to 20 times the capacity we had previously. This improvement has allowed us to handle a surge in demand and process a larger volume of queries, leading to a substantial 150 percent increase in queries processed per second.
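Because the monolith is a Django application, one way to segment read traffic is a Django database router. The sketch below is a minimal example, assuming two hypothetical connection aliases (`primary` and `read_pool`) and a hypothetical set of models whose reads must never be stale; the post does not say how traffic is actually routed.

```python
class ReadPoolRouter:
    """Send writes to the primary instance and most reads to the read pool."""

    PRIMARY = "primary"      # hypothetical alias defined in DATABASES
    READ_POOL = "read_pool"  # hypothetical alias defined in DATABASES

    # Hypothetical model labels whose reads cannot tolerate even a
    # sub-100 ms replication lag and are therefore pinned to the primary.
    PRIMARY_ONLY_LABELS = frozenset({"accounts.UserProfile"})

    def db_for_read(self, model, **hints):
        if model._meta.label in self.PRIMARY_ONLY_LABELS:
            return self.PRIMARY
        return self.READ_POOL

    def db_for_write(self, model, **hints):
        # Replicas are read-only, so every write goes to the primary.
        return self.PRIMARY

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases expose the same data, so relations are always valid.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Apply schema changes only on the primary; replicas receive them
        # through replication.
        return db == self.PRIMARY
```

In Django, a router like this is enabled by listing its dotted path in the `DATABASE_ROUTERS` setting. Pinning a small set of models to the primary is one way to handle reads that must immediately reflect a preceding write.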

As a direct result, we have seen remarkable improvements in our service, providing an exceptional user experience with outstanding uptime. With AlloyDB’s full PostgreSQL compatibility and low-lag read pools, we had a robust foundation to continue scaling. But we didn’t stop there.

Scaling our infrastructure with AlloyDB and Spanner

We knew that at our rate of growth, we would eventually run into scaling issues with our original monolith architecture.

To address this, we identified the fastest-growing part of our Django monolith and refactored it into its own standalone microservice. This allowed us to isolate the growth of this particular part of the system and manage it independently of the rest of the monolith, which had already migrated to AlloyDB.

AlloyDB now plays a crucial role in powering the system of engagement, particularly the frontend chat, where real-time performance is vital for a responsive user interface and the data needs the highest levels of consistency and availability while the user interacts with the chatbot. However, that interactivity is mostly ephemeral, and profile and reference data are relatively small and well scoped in terms of our business model (e.g., one user profile per user). With this in mind, we refactored the second piece, the chat stack, into a microservice written in Golang and backed it with Spanner because of its industry-leading high-availability (HA) story and virtually unlimited scale. This allowed us to significantly improve the scalability and performance of our product. Spanner allows us to ingest terabytes of data per day without concern.

We have future-proofed our chat application and are confident that it can handle periodic spikes in user activity and significant growth in our user base. We have also reduced our operational costs by moving to a managed database service. Right now, our biggest cost is our opportunity cost.

We are continuing to evolve our architecture as we grow and are looking for ways to improve scalability, performance, and reliability. We believe that by adopting an architecture that leverages the strengths of both AlloyDB and Spanner, we can build a system that meets the needs of our users and supports our growth aspirations!

Figure 2: Current architecture

Fueling Character.AI’s journey with Google Cloud

As Google Cloud users, we use a variety of products and services, from infrastructure products like VMs, K8s, TPUs and GPUs, to various managed databases and analytics, including Cloud SQL, AlloyDB, Spanner, GCS, Datastream and BigQuery. Google Cloud’s infrastructure and managed services collectively form a coherent set of tools designed to work together efficiently and provide unmatched scale and resilience.

But we’re not done: we have ambitious growth plans for Character.AI. In the short term, AlloyDB and Spanner will enable us to meet current demand without the outages so often caused by the lock-wait spikes of traditional databases. In the long term, we aim to scale our platform to become one of the largest consumer products in the world.

Figure 3: Aspirational Architecture

For more, discover how Google Cloud’s AlloyDB, Spanner and other managed services help us deliver an exceptional and reliable experience to millions of users across countless use cases, from gaming and entertainment to coaching and more.