PayPal's historically large data migration is the foundation for its gen AI innovation
The dawn of the gen AI era presents businesses with unprecedented opportunities for transformative products, and seizing them demands a strategic shift in technology infrastructure. A few years ago, PayPal, a digital-native company serving hundreds of millions of customers, faced a significant challenge. After 25 years of success in expanding services and capabilities, we’d created complexity in our data analytics infrastructure. Some 400 petabytes of data were spread across a dozen siloed systems, the result of scale limitations and of acquisitions of companies like Venmo, Braintree, and others.
Our very success in growth and innovation had created complexity that threatened our next evolution.
To continue leading the next wave of innovation in financial services, we knew we had to modernize our data foundation. Today, we’re proud to share how PayPal completed what’s arguably one of the largest data migrations in history, culminating with the move of our analytics to BigQuery, Google Cloud’s enterprise data warehouse. This effort marks a significant leap in creating the robust data framework we’ll need to advance our business priorities and meet the ever-evolving financial needs of our customers. The migration was essential, but the scale was daunting: by some measures, such as the size of our now-sunset Teradata system, we believe it ranks among the biggest ever undertaken. Given that scale, we want to offer some insights into how we tackled it and what others might consider when undertaking a significant migration of their own.
Untapped potential of data
As one of the original digital payment pioneers, PayPal processes billions of transactions and houses decades of valuable customer insights. We have a mountain of data, really a mountain range, that had developed over decades without being fully leveraged in the service of our customers and merchants.
Each acquisition and new service added valuable capabilities but also introduced new data challenges. For example, a small business owner might use PayPal for online sales and Venmo for local transactions. However, providing a unified view of their business required complex processes that were costly and slow.
Data fragmentation limited our ability to offer consumers personalized experiences that help them get the most value from their money, and it kept us from gaining deeper insights from our data.
As the gen AI era dawned, our digital fragmentation was becoming more than just a technical inconvenience. With AI becoming a transformative force in financial services with huge potential ROI, we knew fragmented data would severely limit our ability to create the intelligent experiences customers have come to expect. These range from further strengthening our industry-leading fraud detection models to providing a best-in-class commerce platform that helps merchants succeed in the competitive global economy.
To get there, we first had to get our disparate data platforms in order.
Legacy systems, modern ambitions
The scope was massive. We needed to consolidate multiple data platforms, including what’s believed to be the world’s largest Teradata deployment, along with Hadoop clusters, Redshift, Snowflake, and various other systems processing petabytes of transaction data. This migration also had to be executed while maintaining the uninterrupted security and reliability our customers depend on.
As a technology company, PayPal has considerable internal resources, so we first had to decide whether to tackle this challenge ourselves. We weighed the costs and benefits and concluded that unifying and scaling our on-premises infrastructure to meet our future needs would be prohibitive in both cost and time to complete. Plus, AI innovation was happening at a rapid pace in the cloud. To truly leverage the power of our data, we needed to be where that innovation was happening.
We assessed various data warehousing solutions and chose BigQuery for its numerous advantages. It is a fully managed, cloud-native platform with disaggregated compute and storage that scale independently. It offers powerful capabilities at the scale and performance we needed, and its familiar SQL interface meant a gentler learning curve for our developer community.
Most importantly, BigQuery’s native integrations with AI enable seamless and efficient data analytics.
The journey to unified data
After choosing Google Cloud as our data partner, we embarked on our historic data migration. This may sound hyperbolic, but when you consider the scale of PayPal’s business, the geographies across which we operate, the regulations within each, and the sensitive and quite literally valuable nature of this data, the scope of the challenge becomes clear.
With the help of partners and experts from Google Cloud Consulting, PayPal migrated more than 300 petabytes of data and streamlined operations, decommissioning around 25% of workloads. We managed all of this with zero downtime for our business operations and no impact on customers. Here are some key factors that contributed to our success.
Alignment: The first hurdle in achieving transformations at scale is aligning stakeholders on a shared goal. So we made the migration an enterprise-wide priority.
Discovery and analysis: Detailed inventories of data, workloads, and inbound/outbound data streams are crucial for defining scope and effort and for forecasting budget. Establishing lineage allowed us to trace the origins and relationships of the various components, giving us a clear and comprehensive view of their dependency graphs.
Strategy: It is crucial to establish fundamental principles for the migration up front, such as choosing between lift-and-shift and modernization, defining security principles, setting governance guardrails, and determining how consumption will be tracked.
Execution: We automated every task we could and built live dashboards to continuously monitor migration progress. FinOps was integrated throughout the migration, with clear visibility into consumption and performance.
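The discovery and execution factors above both hinge on knowing which workloads depend on which. As an illustration only, and not a description of PayPal’s actual tooling, here is a minimal sketch of how a lineage inventory could be turned into ordered migration waves using Kahn’s topological sort. It assumes a hypothetical inventory format (a dict mapping each workload to the set of upstream workloads it reads from); the function name and workload names are likewise hypothetical.

```python
def migration_waves(upstreams: dict[str, set[str]]) -> list[list[str]]:
    """Group workloads into waves so each one migrates after all its upstreams.

    `upstreams` is a hypothetical lineage inventory: workload -> set of
    workloads it reads from. Raises ValueError if the lineage has a cycle,
    which would have to be broken before an ordered migration is possible.
    """
    # Include workloads that appear only as upstreams (e.g. raw sources).
    nodes = set(upstreams)
    for deps in upstreams.values():
        nodes |= deps

    # In-degree = number of upstreams that have not been migrated yet.
    indegree = {n: len(upstreams.get(n, set())) for n in nodes}
    downstreams: dict[str, set[str]] = {n: set() for n in nodes}
    for node, deps in upstreams.items():
        for dep in deps:
            downstreams[dep].add(node)

    waves: list[list[str]] = []
    ready = sorted(n for n, d in indegree.items() if d == 0)
    migrated = 0
    while ready:
        waves.append(ready)  # everything in one wave can move in parallel
        migrated += len(ready)
        next_ready = []
        for node in ready:
            for child in downstreams[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if migrated != len(nodes):
        raise ValueError("dependency cycle detected among workloads")
    return waves


# Hypothetical inventory for illustration.
inventory = {
    "fraud_features": {"raw_transactions"},
    "merchant_dashboard": {"fraud_features", "merchant_accounts"},
    "raw_transactions": set(),
}

if __name__ == "__main__":
    for i, wave in enumerate(migration_waves(inventory), start=1):
        print(f"wave {i}: {wave}")
```

Grouping into waves rather than producing a single linear order lets independent workloads migrate in parallel, and a cycle in the lineage surfaces as an explicit error instead of a silently wrong ordering. The wave count and per-wave status are also natural inputs to a progress dashboard like the one described under Execution.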
Benefits from BigQuery and beyond
We achieved faster insights. Queries are 2.5x to 10x faster, including complex queries used by data scientists. This unlocks real-time insights, enabling PayPal to personalize product recommendations, offers, and customer support.
We built new AI foundations. Data accessible for model training is 16x fresher. Feature engineering, a crucial step in AI development, is improved by instant access to clean, governed data. This accelerates the development of personalized financial guidance and predictive analytics for both consumers and businesses.
We optimized operations. By migrating to BigQuery, we reduced our data infrastructure vendors from four to one, streamlining operations and reducing complexity. Data duplication between platforms was entirely eliminated.
Our new unified data platform in BigQuery has become the source for PayPal's next wave of innovation, enabling us to create more intuitive, personalized experiences across our entire ecosystem and to leverage the power of gen AI.
AI-powered innovation unleashed
Looking ahead, we’re exploring how this unified data platform will enable us to deliver AI-powered experiences that weren't possible before, including:
- Predictive fraud prevention that spots potential issues before they affect our customers.
- Personalized financial insights that help merchants optimize their businesses.
- Seamless payment experiences that adapt to each customer's preferences and patterns.
- More intelligent risk assessment that could help expand financial access to underserved communities.
- Agentic commerce and other possibilities we are only now able to imagine.
Lessons for the AI era
While PayPal’s migration may be extraordinary in its scale, we are not alone in our needs or ambitions. There are ample lessons for companies within and well beyond financial services that may be rethinking their own data foundations.
First off, do not underestimate how underutilized, and how disorganized, your data may be.
Making sure your data is centralized, accurate, and consistent paves the way for AI experimentation and deployment. Organizations that spend time cleaning up their data fabric will be able to bring machine learning and generative AI applications to market more quickly, and do so at scale.
Second, ensuring data is accessible to everyone within your organization, with the proper controls, unlocks enormous potential. Data orchestration and enterprise search, coupled with generative AI, have the potential to break down longstanding organizational silos and speed up decision-making across your organization. It’s one of the most promising applications of AI.
The financial world will continue to evolve, driven by new technologies and changing customer expectations. PayPal’s data transformation shows how even established companies can reinvent themselves to stay ahead of this change — provided they're willing to tackle the fundamental challenges that stand in their way.
In doing so, we have not only preserved our position as a digital payments pioneer but also set ourselves up to continue leading the next wave of innovation in digital commerce.