Automatic data risk management for BigQuery using DLP

Protecting sensitive data and preventing unintended data exposure is critical for businesses. However, many organizations lack the tools to stay on top of where sensitive data resides across their enterprise. It’s particularly concerning when sensitive data shows up in unexpected places: in logs that services generate, in messages that customers inadvertently send to a support chat, or in unstructured data flowing through analytical workloads. This is where Automatic Data Loss Prevention (DLP) for BigQuery can help.

Data discovery and classification is often implemented as a manual, on-demand process, and as a result happens less frequently than many organizations would like. With a large amount of data being created on the fly, a more modern, proactive approach is to build discovery and classification into existing data analytics tools. By making it automatic, you can ensure that a key way to surface risk happens continuously – an example of Google Cloud’s invisible security strategy. Automatic DLP is a fully-managed service that continuously scans data across your entire organization to give you general awareness of what data you have, and specific visibility into where sensitive data is stored and processed. This awareness is a critical first step in protecting and governing your data and acts as a key control to help improve your security, privacy, and compliance posture.

In October of last year, Google announced the public preview of Automatic DLP for BigQuery. Since the announcement, customers have already scanned and processed both structured and unstructured BigQuery data at multi-petabyte scale to identify where sensitive data resides and gain visibility into their data risk. That’s why Google is happy to announce that Automatic DLP is now Generally Available. As part of the release, Google has also added several new features to make it even easier to understand your data and to make use of the insights in more Cloud workflows. These features include:

Easy-to-understand dashboards that give a quick overview of your data in BigQuery
Granular settings for how often data is scanned
Deep native integration with Chronicle that helps speed up detection and response

Managing data risk with data classification

Examples of sensitive data elements that typically need special attention are credit card numbers, medical information, Social Security numbers, government-issued IDs, addresses, full names, and account credentials. Automatic DLP leverages machine learning and provides more than 150 predefined detectors to help discover, classify, and govern this sensitive data, allowing you to make sure the right protections are in place.
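For a concrete sense of how those predefined detectors classify data, here is a minimal sketch that uses the Cloud DLP Python client library (google-cloud-dlp) to inspect a snippet of text. The project ID, sample text, and the handful of infoTypes chosen here are illustrative placeholders, not part of Automatic DLP's managed scanning itself.

```python
# pip install google-cloud-dlp
from google.cloud import dlp_v2

# Hypothetical project ID, used only for illustration.
PROJECT_ID = "my-project"

dlp_client = dlp_v2.DlpServiceClient()
parent = f"projects/{PROJECT_ID}"

# Sample text containing values that predefined detectors can classify.
item = {"value": "Call me at 555-253-0000 or email jane.doe@example.com"}

# A few of the 150+ predefined infoType detectors.
inspect_config = {
    "info_types": [
        {"name": "PHONE_NUMBER"},
        {"name": "EMAIL_ADDRESS"},
        {"name": "CREDIT_CARD_NUMBER"},
    ],
    "include_quote": True,
}

response = dlp_client.inspect_content(
    request={"parent": parent, "inspect_config": inspect_config, "item": item}
)

# Print each finding's infoType and the matched text.
for finding in response.result.findings:
    print(finding.info_type.name, finding.quote)
```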

Once you have visibility into your sensitive data, there are many options to help remediate issues or reduce your overall data risk. For example, you can use IAM to restrict access to datasets or tables, or leverage BigQuery Policy Tags to set fine-grained access policies at the column level. Google’s Cloud DLP platform also provides a set of tools to run deep, exhaustive inspections of data on demand, or to help you obfuscate, mask, or tokenize data to reduce overall data risk. This capability is particularly important if you’re using data for analytics and machine learning, since that sensitive data must be handled appropriately to ensure your users’ privacy and compliance with privacy regulations.
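As a hedged sketch of that last point, the snippet below uses the same Python client library to mask detected values with Cloud DLP's de-identification API before data is shared downstream. The project ID and sample record are assumptions made for illustration.

```python
from google.cloud import dlp_v2

# Hypothetical project ID, used only for illustration.
PROJECT_ID = "my-project"

dlp_client = dlp_v2.DlpServiceClient()
parent = f"projects/{PROJECT_ID}"

# A record that inadvertently contains an email address.
item = {"value": "Support ticket opened by jane.doe@example.com"}

inspect_config = {"info_types": [{"name": "EMAIL_ADDRESS"}]}

# Replace every character of each detected email address with '#'.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "primitive_transformation": {
                    "character_mask_config": {
                        "masking_character": "#",
                        "number_to_mask": 0,  # 0 masks all characters
                    }
                }
            }
        ]
    }
}

response = dlp_client.deidentify_content(
    request={
        "parent": parent,
        "deidentify_config": deidentify_config,
        "inspect_config": inspect_config,
        "item": item,
    }
)

# The email address in the output is replaced by '#' characters.
print(response.item.value)
```

The same deidentify_config structure also accepts crypto-based transformations in place of character masking, which is one way to tokenize values when downstream analytics still needs a consistent, joinable identifier.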

How to get started

Automatic DLP can be turned on for your entire organization, selected organization folders, or individual projects. To learn more about these new capabilities or to get started today, open the Cloud DLP page in the Cloud Console and check out the documentation.
