Data Governance Vocabulary
Data catalog, data lineage, data stewardship, data quality dimensions, governance policy language, GDPR data management vocabulary, and compliance communication.
- Data Governance /ˈdeɪtə ˈɡʌvərnəns/
The policies, processes, roles, and technologies used to ensure data is accurate, available, consistent, secure, and used in compliance with regulations — the framework for treating data as a managed organisational asset.
"Our data governance framework defines ownership (who is accountable for each data domain), stewardship (who maintains quality), access policies (who can read/write, under what conditions), and retention (how long data is kept and when it is deleted to meet GDPR requirements)."
- Data Catalog /ˈdeɪtə ˈkætəlɒɡ/
A centralised inventory of an organisation's data assets — datasets, tables, columns, their business definitions, ownership, data quality status, and usage — making data discoverable and enabling self-service analytics.
"The data catalog in Alation describes all 1,400 tables across our data warehouse: business glossary definition, data owner, last refresh time, data quality score, and sample queries. Data scientists spend 60% less time hunting for datasets since adoption."
- Data Lineage /ˈdeɪtə ˈlɪniɪdʒ/
The documented and/or automatically tracked chain showing where data originated, how it was transformed, and where it was consumed — enabling root-cause analysis of data quality issues and demonstrating compliance traceability.
"Data lineage in Apache Atlas shows the customer_revenue field flows from the raw orders table → order enrichment pipeline → revenue aggregation model → finance report. When the VP Finance found discrepancies, we traced the issue in 10 minutes to a currency conversion bug in the enrichment pipeline."
- Data Steward /ˈdeɪtə ˈstjuːərd/
A business-side role responsible for ensuring data quality, managing definitions in the data catalog, approving access requests, and implementing governance policies for a specific data domain — the operational owner of data governance.
"The Customer Data domain steward is the CRM manager. She approves all new access requests to PII tables, maintains the business glossary definitions for customer attributes, and runs the quarterly data quality review for the customer 360 dataset."
- Data Quality Dimensions /ˈdeɪtə ˈkwɒlɪti dɪˈmenʃənz/
The standard attributes used to measure and communicate data quality: accuracy (values match reality), completeness (no missing values), consistency (values match across systems), timeliness (data is current), and uniqueness (no duplicates).
"Our quarterly data quality report covers 5 dimensions for each data domain. Customer data: accuracy 97% (verified against CRM source), completeness 92% (8% of email fields null), timeliness 99% (SLA: refreshed within 4 hours). The consistency dimension is Red — 3% of customer IDs don't match between CRM and billing."
- Data Masking / Anonymisation /ˈdeɪtə ˈmɑːskɪŋ ənˌɒnɪmaɪˈzeɪʃən/
Techniques for protecting sensitive data by replacing real values with realistic but fictitious values (masking) or irreversibly removing identifying information (anonymisation), enabling data to be used for testing or analytics while protecting privacy.
"Production data is masked before loading to the development and staging environments. Name and email are replaced with synthetic values using Faker. Card numbers are masked to last-4 format. After anonymisation, a re-identification risk assessment is required before the dataset can be shared externally under GDPR recital 26."
- Data Owner /ˈdeɪtə ˈoʊnər/
The senior business executive accountable for a data domain: responsible for data classification decisions, access governance sign-off, and ensuring the domain meets quality and regulatory requirements. The Data Owner is distinct from the Data Steward, who performs the day-to-day governance tasks.
"The Data Owner for the HR domain is the Chief People Officer. She approves all exceptions to the standard access policy for employee data and is accountable to the DPO for GDPR compliance in the HR domain. Day-to-day governance is delegated to the HR data steward."
- GDPR Data Subject Rights /ˌdʒiː diː piː ˈɑːr/
Rights the GDPR grants to individuals in the EU (data subjects) to control their personal data: right of access (Art. 15), rectification (Art. 16), erasure (Art. 17, the 'right to be forgotten'), restriction of processing (Art. 18), portability (Art. 20), and objection (Art. 21).
"When a user submits a right-to-erasure request, our process is: (1) identity verification within 24h, (2) identify all data across 14 systems using the data catalog's lineage map, (3) delete or anonymise within 30 days as required by Art. 17(1), (4) notify third-party processors within 72h, (5) log completion for DPO audit."
- Data Classification /ˈdeɪtə ˌklæsɪfɪˈkeɪʃən/
A taxonomy for categorising data by sensitivity and regulatory exposure: Public, Internal, Confidential, Restricted (or similar tiers). Classification drives access controls, encryption requirements, retention policies, and handling procedures.
"Our data classification scheme: Public (shareable externally, no controls), Internal (default for business data), Confidential (encrypted at rest and in transit, access logged, NDA required for contractors), Restricted (PII, PHI, PCI-DSS data — field-level encryption, role-based access, quarterly access review required)."
- Consent Management /kənˈsent ˈmænɪdʒmənt/
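The capability to record, track, and honour individuals' consent decisions for specific types of data processing. It is needed wherever consent is the lawful basis under GDPR Art. 6(1)(a), because Art. 7(1) requires the controller to be able to demonstrate that consent was given, and for cookie consent under the ePrivacy Directive.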
"Consent management platform records three consent signals per user: (1) analytics cookies, (2) personalisation processing, (3) third-party marketing. When a user withdraws analytics consent, an event is published to the data pipeline that suppresses that user's events from the analytics warehouse within 48 hours."
- Data Residency /ˈdeɪtə ˈrezɪdənsi/
The requirement that specific data types remain within a defined geographic boundary (country, region, or cloud region) for regulatory or contractual reasons. Typical drivers are national data localisation laws, the GDPR's restrictions on transfers of personal data outside the EEA (Chapter V), and government contracts.
"Our German enterprise customer's contract requires EU data residency — all their tenant data must remain in AWS eu-central-1 and eu-west-1. The silo-model deployment ensures their data never leaves the EU region, satisfying the data residency clause in the Data Processing Agreement."