Today, Google is announcing the Dataplex business glossary, now available in public preview. Dataplex is an intelligent data fabric that provides a way to manage, monitor, and govern your distributed data at scale. Dataplex business glossary offers users a cloud-native way to maintain and manage their business terms and definitions, establishing consistent business language, improving trust in data, and enabling self-serve use of data.

In enterprises, whether small, medium, or large, there are many different teams. Each team, over time, develops its own language. For example, for a corporate team, “customer” could mean “legal entity,” whereas for the central platform team, it could mean individuals, legal entities, government entities, and so on. This dissonance can lead to collaboration challenges and, worse, to misinterpretation of data that affects insights and decisions. It also prevents users unfamiliar with an area from having a self-service path, leaving them dependent on tribal knowledge in the organization. Relying on that tribal knowledge introduces manual overhead and makes it hard to stay up to date with changes.

With Dataplex business glossary, users can now:

  • capture their business terminology within glossaries and terms
  • enrich cataloged data entries with this business terminology by attaching defined terms to data entry columns
  • describe semantic relationships between terms by establishing cross-term associations.
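
The three capabilities above can be pictured with a small in-memory model. This is only an illustrative sketch: the class names (Term, Glossary, Entry), their fields, and the sample table are hypothetical and are not the Dataplex API. It assumes cross-term associations are symmetric, which the product may or may not enforce.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A business term with a curated definition (hypothetical model)."""
    name: str
    definition: str
    related: set[str] = field(default_factory=set)  # names of related terms


@dataclass
class Glossary:
    """A named collection of terms."""
    name: str
    terms: dict[str, Term] = field(default_factory=dict)

    def add_term(self, name: str, definition: str) -> Term:
        term = Term(name, definition)
        self.terms[name] = term
        return term

    def relate(self, a: str, b: str) -> None:
        """Record a cross-term association (assumed symmetric here)."""
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)


@dataclass
class Entry:
    """A cataloged data entry, such as a table, with its columns."""
    name: str
    columns: list[str]
    column_terms: dict[str, set[str]] = field(default_factory=dict)

    def attach(self, column: str, term: Term) -> None:
        """Attach a defined term to one column of this entry."""
        if column not in self.columns:
            raise KeyError(f"unknown column: {column}")
        self.column_terms.setdefault(column, set()).add(term.name)


# 1. Capture terminology within a glossary.
glossary = Glossary("Retail")
customer = glossary.add_term("customer", "A person or legal entity that buys goods.")
glossary.add_term("retail transaction", "A sale of goods to a customer.")

# 2. Describe a semantic relationship between two terms.
glossary.relate("customer", "retail transaction")

# 3. Enrich a cataloged entry by attaching a term to a column.
orders = Entry("sales.orders", ["order_id", "buyer_id"])
orders.attach("buyer_id", customer)
```

Keeping terms separate from entries, and linking them by name at the column level, mirrors the idea that a single curated definition can be reused across many data assets.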

Dataplex business glossary supports data practitioners in several ways. Firstly, it promotes semantic consistency in defining and interpreting data across teams, which helps to minimize redundancy and reduce the possibility of confusion and misinterpretation when consuming data. For example, with a centrally curated definition of the term ‘retail transaction,’ when two teams produce two different data assets capturing details of retail transactions, they would structure these data assets consistently according to the defined terminology.

Semantic consistency, in turn, reinforces understanding of and trust in data. When attached to data assets, glossary terms provide an additional layer of centrally curated and consistent business context that allows users to confidently establish the degree to which the data assets are fit for their purpose. In the above example of customer data, an analyst searching for “show me all customer tables” does not have to worry about varying interpretations of the identified data assets, i.e., whether they refer to personal customers, legal entities, and so on. With business glossary, the correct interpretation is established via the associated glossary terms, which provide the required context for these data assets and allow the analyst to judge the relevance of discovered data more reliably.

All of the above then unlocks self-serve use of data. Users can leverage glossary content to discover data assets, for example through search queries like “Show me all entries with attached glossary terms referencing ‘retail transaction’ anywhere in their definitions.” Note how search can address varying term metadata, including descriptions and associated Data Stewards, when identifying data assets. Users can then understand the semantics of these data assets and, consequently, identify applicable usage scenarios for them.
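
To make the example query concrete, here is a minimal sketch of that kind of lookup over plain Python dictionaries. It is not Dataplex Search or its query syntax: the sample terms, entries, and the function name entries_referencing are all hypothetical, and matching is a simple case-insensitive substring test on term definitions.

```python
# Hypothetical glossary: term name -> definition text.
term_definitions = {
    "customer": "A person or legal entity that completes a retail transaction.",
    "store": "A physical location where goods are sold.",
    "refund": "Reversal of a retail transaction.",
}

# Hypothetical catalog: entry name -> {column name -> attached term names}.
entry_terms = {
    "sales.orders": {"buyer_id": ["customer"]},
    "ops.locations": {"store_id": ["store"]},
    "finance.refunds": {"refund_id": ["refund"]},
}


def entries_referencing(phrase: str) -> list[str]:
    """Return entries with an attached term whose definition mentions phrase."""
    needle = phrase.lower()
    # First find the terms whose definitions contain the phrase.
    matching_terms = {
        name
        for name, definition in term_definitions.items()
        if needle in definition.lower()
    }
    # Then find entries with at least one of those terms attached to a column.
    return sorted(
        entry
        for entry, columns in entry_terms.items()
        if any(t in matching_terms for terms in columns.values() for t in terms)
    )


print(entries_referencing("retail transaction"))
# prints ['finance.refunds', 'sales.orders']
```

The lookup runs in two steps, from phrase to terms and then from terms to entries, which is why an entry can be found even when its own name and column names never mention the phrase.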

Additionally, Dataplex business glossary can support data governance: data governance teams can use glossary context to inform policy configuration decisions. For example, these teams can consider data assets associated with glossary terms referencing “customer” for additional access control policies related to customer data handling.

In summary, you can leverage the Dataplex business glossary alongside the broad set of Dataplex data governance capabilities to establish a common and consistent business language, strengthen trust in data, promote self-serve use, and get value from your data.

How do I get started?

To get started with Dataplex business glossary, visit the Glossaries tab in Dataplex. You can capture business terminology by defining glossaries, terms, and cross-term relationships. 

You can then associate cataloged entries with defined terms as you browse data entries in Dataplex Search.

Once glossary content is defined and associated with data entries, you can leverage glossary content in discovery and search.