Early in 2018, Google Cloud worked with the community to democratize blockchain data via our BigQuery public datasets; in 2019, we expanded with six more datasets. Today, we're adding eleven more of the most in-demand blockchains to the BigQuery public datasets, in preview. We're also making improvements to existing datasets in the program.

We're doing this because blockchain foundations, Web3 analytics firms, partners, developers, and customers tell us they want a more comprehensive view across the crypto landscape and the ability to query more chains. They want to answer complex questions and verify subjective claims, such as “How many NFTs were minted today across three specific chains?”, “How do transaction fees compare across chains?”, and “How many active wallets are on the top EVM chains?”

Having a more robust list of chains accessible via BigQuery and new ways to access data will help the Web3 community better answer these questions and others, without the overhead of operating nodes or maintaining an indexer. Customers can now query full on-chain transaction history off-chain to understand the flow of assets from one wallet to another, which tokens are most popular, and how users are interacting with smart contracts. 

Chain expansion

Here are the 11 in-demand chains we’re adding into the BigQuery public datasets:

  1. Avalanche
  2. Arbitrum
  3. Cronos
  4. Ethereum (Görli)
  5. Fantom (Opera) 
  6. Near 
  7. Optimism
  8. Polkadot
  9. Polygon Mainnet 
  10. Polygon Mumbai 
  11. Tron

We're also improving the current Bitcoin BigQuery dataset by adding Satoshis (sats) / Ordinals to the open-source blockchain-ETL datasets for developers to query. Ordinals, in their simplest state, are a numbering scheme for sats.
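To make "a numbering scheme for sats" concrete, here is a minimal Python sketch. It assumes the convention from ordinal theory that sats are numbered in the order they are mined, and it uses Bitcoin's well-known subsidy schedule (50 BTC per block, halving every 210,000 blocks). The function name is hypothetical and is not part of the blockchain-ETL datasets.

```python
HALVING_INTERVAL = 210_000            # blocks between subsidy halvings
INITIAL_SUBSIDY = 50 * 100_000_000    # 50 BTC expressed in sats


def first_ordinal(height: int) -> int:
    """Return the number assigned to the first sat mined in the block at `height`.

    Assumes sats are numbered from 0 in mining order, so the answer is the
    total subsidy of all earlier blocks.
    """
    full_epochs, blocks_into_epoch = divmod(height, HALVING_INTERVAL)
    # Sats issued in every completed halving epoch.
    total = sum(HALVING_INTERVAL * (INITIAL_SUBSIDY >> e) for e in range(full_epochs))
    # Sats issued so far in the current epoch.
    return total + blocks_into_epoch * (INITIAL_SUBSIDY >> full_epochs)


print(first_ordinal(0))        # 0
print(first_ordinal(1))        # 5000000000
print(first_ordinal(210_000))  # 1050000000000000
```

Under this assumption, every sat in a block falls in the range starting at `first_ordinal(height)`, which is what lets an individual sat be tracked across transactions.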

Google Cloud managed datasets 

We want to provide users with a range of data options. In addition to community-managed datasets on BigQuery, we are creating first-party Google Cloud managed datasets that offer additional capabilities. For example, in addition to the existing Ethereum community dataset (crypto_ethereum), we created a Google Cloud managed Ethereum dataset (goog_blockchain_ethereum_mainnet_us), which offers a full representation of the data model native to Ethereum with curated tables for events. Customers looking for richer analysis on Ethereum will be able to access derived data to easily query wallet balances, transactions related to specific tokens (ERC20, ERC721, ERC1155), or interactions with smart contracts.

We want to provide fast and reliable enterprise-grade results for our customers and the Web3 community. Here’s an example that runs the same query against the goog_blockchain_ethereum_mainnet_us dataset and against the community crypto_ethereum dataset.

Let’s say we want to know “How many ETH transactions are executed daily (last 7 days)?”

-- Query 1: Google Cloud managed dataset
SELECT
  DATE(block_timestamp) AS date,
  COUNT(*) AS txns
FROM `bigquery-public-data.goog_blockchain_ethereum_mainnet_us.transactions`
WHERE DATE(block_timestamp) > DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY 1
ORDER BY 1 DESC;

-- Query 2: community-managed dataset, for comparison
SELECT
  DATE(block_timestamp) AS date,
  COUNT(*) AS txns
FROM `bigquery-public-data.crypto_ethereum.transactions`
WHERE DATE(block_timestamp) > DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY 1
ORDER BY 1 DESC;

In our results for these two queries, the one against the goog_ dataset ran faster and consumed less slot time, while remaining competitive in terms of bytes processed.

More precise data 

We gathered feedback from customers and developers to understand pain points from the community, and heard loud and clear that features such as numerical precision are important for more accurately calculating the pricing of certain coins. We are improving the precision of the blockchain datasets by launching UDFs for better UINT256 integration and BIGNUMERIC support. This will give customers access to more decimal digits for their blockchain data and reduce rounding errors in computation.
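To illustrate why precision matters, the following Python sketch converts a wei amount to ETH in two ways: with 64-bit floating point, and with arbitrary-precision decimals. It is only an illustration of the rounding problem, not the implementation of the UDFs mentioned above; the function names and the sample balance are hypothetical.

```python
from decimal import Decimal, localcontext

WEI_PER_ETH = 10**18        # 1 ETH = 10^18 wei
UINT256_MAX = 2**256 - 1    # largest value an EVM uint256 can hold (78 decimal digits)


def wei_to_eth_float(wei: int) -> float:
    """Lossy conversion: a float64 keeps only about 15-17 significant digits."""
    return wei / WEI_PER_ETH


def wei_to_eth_exact(wei: int) -> Decimal:
    """Exact conversion using arbitrary-precision decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = 100  # comfortably more than the 78 digits of UINT256_MAX
        return Decimal(wei) / Decimal(WEI_PER_ETH)


balance_wei = 123456789012345678901234567  # hypothetical balance, 27 digits

print(wei_to_eth_float(balance_wei))   # low-order digits are rounded away
print(wei_to_eth_exact(balance_wei))   # 123456789.012345678901234567
```

The float result silently drops the low-order digits, which is the kind of rounding error that compounds when amounts are summed or multiplied by prices.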

Making on-chain data more accessible off-chain

Today, customers interested in blockchain data must first get access to the right nodes, then develop and maintain an indexer that transforms the data into a queryable data model. They then repeat this process for every protocol they’re interested in. 

By leveraging our deep expertise in scalable data processing, we’re making on-chain data accessible off-chain for easier consumption and composability, enabling developers to access blockchain data without nodes. This means that customers can access blockchain data as easily as they would their own data. By joining chain data with application data, customers can get a complete picture of their users and their business.
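As a rough sketch of what joining chain data with application data can look like, the Python example below uses an in-memory SQLite database as a stand-in for a data warehouse. The table names, column names, and rows are all hypothetical and do not reflect the schema of any BigQuery public dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical on-chain data (in practice, a public blockchain dataset).
    CREATE TABLE chain_transactions (tx_hash TEXT, from_address TEXT, value_wei TEXT);
    -- Hypothetical application data that maps users to their wallets.
    CREATE TABLE app_users (user_id INTEGER, wallet_address TEXT);
""")
conn.executemany(
    "INSERT INTO chain_transactions VALUES (?, ?, ?)",
    [
        ("0xaa", "0x111", "1000"),
        ("0xbb", "0x111", "2500"),
        ("0xcc", "0x222", "700"),
        ("0xdd", "0x999", "50"),   # wallet not known to the application
    ],
)
conn.executemany(
    "INSERT INTO app_users VALUES (?, ?)",
    [(1, "0x111"), (2, "0x222"), (3, "0x333")],
)

# Count on-chain transactions per application user, keeping users with none.
rows = conn.execute("""
    SELECT u.user_id, COUNT(t.tx_hash) AS txns
    FROM app_users AS u
    LEFT JOIN chain_transactions AS t ON t.from_address = u.wallet_address
    GROUP BY u.user_id
    ORDER BY u.user_id
""").fetchall()

print(rows)  # [(1, 2), (2, 1), (3, 0)]
```

The LEFT JOIN keeps users who have no on-chain activity, which is often the more useful view when the goal is a complete picture of an application's user base.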

Lastly, we have seen this data used in other end-user applications such as Looker and Google Sheets.

Building together 

For the past five years, we have supported the community through our public blockchain dataset offering, and we will continue to build on these efforts with a range of data options and user choice, from community-owned datasets to Google-managed, high-quality alternatives and real-time data. We’re excited to work with partners who want to distribute public data for developers or monetize datasets for curated feeds and insights. We're also here to partner with startups and data providers who want to build cloud-native distribution and syndication channels unique to Web3.