Data Governance Essentials in the EU AI Act

Understanding Data and Data Governance in the EU AI Act

The European Union’s Artificial Intelligence Act (EU AI Act) establishes a framework to regulate AI, with the strictest obligations falling on “high-risk” systems: those that could affect health, safety, or fundamental rights. A crucial element of this framework is Article 10, which covers data and data governance. It sets quality criteria for the datasets used in training, validating, and testing high-risk AI systems, with the aim of preventing bias, errors, and discrimination.

Understanding Article 10 is vital for AI providers and for anyone following how the Act regulates data. This post walks through the data and data governance requirements set out in the Act: what data governance means, its key elements, and why it matters for compliance.

What is Data Governance in the Context of AI?

Data governance refers to the set of practices, policies, and processes that ensure data is handled accurately, ethically, and in line with legal standards. For high-risk AI systems, poor data practices can amplify biases or produce unreliable outcomes, which is why the AI Act uses governance requirements to mitigate risks and ensure systems perform as intended.

Think of data governance as a conceptual framework:

  • It covers everything from how data is collected and prepared to how biases are detected and corrected.
  • The goal is to make AI systems not just functional, but also fair and compliant with regulations such as the General Data Protection Regulation (GDPR).
  • In Article 10, this governance applies specifically to training, validation, and testing datasets, ensuring they’re suitable for the AI’s purpose and free from flaws that could harm users.

The Five Pillars of Data Governance

Article 10 has six paragraphs. The first sets the general rule that training, validation, and testing datasets must meet the quality criteria that follow; the remaining five, treated here as pillars, spell out those criteria and their scope. They apply to datasets for high-risk AI systems, with a narrower scope for systems that do not involve model training. Let’s explore each one.

1. Data Governance and Management Practices (Article 10(2))

Datasets must undergo appropriate governance and management practices tailored to the AI system’s intended purpose. It’s not a one-size-fits-all approach; practices should reflect the system’s design and real-world application. Key elements include:

  • Design Choices: Strategic decisions during development align the AI with its goals.
  • Data Collection Processes: Document the origins of data and how it was gathered to build trust.
  • Data Preparation Operations: Maintain high quality through tasks like annotation, cleaning, and updating.
  • Formulation of Assumptions: Clearly define what the data represents to avoid errors.
  • Assessment of Data Suitability: Evaluate if datasets are available and fit for purpose.
  • Bias Examination: Scrutinize data for biases that could affect fundamental rights.
  • Bias Mitigation: Implement measures to detect and correct biases.
  • Addressing Data Gaps and Shortcomings: Identify deficiencies that could hinder compliance.
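
One way to keep track of these elements is a per-dataset record with one entry per practice. The sketch below is only an illustration: the class name, field names, and sample values are hypothetical and are not prescribed by the Act.

```python
from dataclasses import dataclass, fields


@dataclass
class GovernanceRecord:
    """Hypothetical per-dataset record: one free-text note per practice listed above."""
    design_choices: str = ""
    collection_process: str = ""
    preparation_operations: str = ""
    assumptions: str = ""
    suitability_assessment: str = ""
    bias_examination: str = ""
    bias_mitigation: str = ""
    data_gaps: str = ""


def undocumented(record: GovernanceRecord) -> list[str]:
    """Return the names of practices that have no documentation yet."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Made-up example values, for illustration only.
record = GovernanceRecord(
    design_choices="Tabular model for a hypothetical screening tool",
    collection_process="Exported from an internal system; origin documented",
)
missing = undocumented(record)
print(missing)
```

A record like this does not prove compliance on its own, but it makes gaps in the documentation visible early.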

2. Dataset Characteristics (Article 10(3))

Once governance practices are in place, the datasets themselves must meet quality benchmarks. They need to be:

  • Relevant and Sufficiently Representative: Mirror real-world scenarios to avoid skewed results.
  • Free of Errors and Complete: To the best extent possible, minimize inaccuracies and missing values to ensure reliability.
  • Statistically Appropriate: Ensure the data’s statistical properties align with the target population, including the persons or groups the system is intended to be used on.
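
Two of these characteristics lend themselves to simple measurements. The sketch below assumes rows stored as dictionaries and a reference distribution for the target population; the function names, the `age_band` column, the sample rows, and the 0.1 flagging threshold are all hypothetical. The Act does not prescribe specific metrics or thresholds.

```python
from collections import Counter


def missing_rate(rows, column):
    """Fraction of rows where `column` is None or absent (a completeness signal)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def representation_gaps(rows, column, reference_shares):
    """Each group's share in the data minus its share in the reference population."""
    values = [r[column] for r in rows if r.get(column) is not None]
    total = len(values)
    if total == 0:
        return {group: -share for group, share in reference_shares.items()}
    counts = Counter(values)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Made-up sample: 10 rows, one of them with a missing value.
rows = (
    [{"age_band": "18-39"}] * 6
    + [{"age_band": "40-64"}] * 3
    + [{"age_band": None}]
)
reference = {"18-39": 0.4, "40-64": 0.4, "65+": 0.2}  # assumed population shares

rate = missing_rate(rows, "age_band")
gaps = representation_gaps(rows, "age_band", reference)
# Flag groups whose share deviates by more than an assumed tolerance of 0.1.
flagged = sorted(g for g, gap in gaps.items() if abs(gap) > 0.1)
print(rate, gaps, flagged)
```

Here the oldest group is absent from the sample entirely, which is the kind of shortcoming a representativeness check is meant to surface.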

3. Contextual Considerations (Article 10(4))

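Data doesn’t exist in a vacuum. This paragraph requires datasets to take into account, to the extent the intended purpose requires, the characteristics particular to the specific geographical, contextual, behavioral, or functional setting in which the AI system will be used. The benefits include: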

  • Promotes Fairness and Non-Discrimination: Representative data reduces biases that could disadvantage certain groups.
  • Enhances Accuracy and Integrity: Tailored data improves completeness and reliability.
  • Aligns with Legal Standards: Complies with GDPR principles.
  • Reduces Risks: Matches data to operational contexts, avoiding mismatches that could lead to failures.
  • Supports a Clear Compliance Workflow: Assess the AI’s purpose, curate relevant data, and document decisions for ongoing bias mitigation.

4. Processing Special Categories of Personal Data (Article 10(5))

Special categories of personal data, such as health data, biometric data, or data revealing racial or ethnic origin, are highly sensitive. Providers may process them only exceptionally, and only to the extent strictly necessary for bias detection and correction. Strict conditions must be met, including:

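  • No viable alternative: bias detection and correction cannot be done effectively with other data, including synthetic or anonymized data.
  • Technical limitations on reuse, combined with security and privacy-preserving measures such as pseudonymization.
  • Effective access controls and full documentation of access.
  • No transfer to, or access by, third parties.
  • Deletion of the data once the bias is corrected or the retention period ends, whichever comes first.
  • Processing records that explain why special category data was strictly necessary.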

These safeguards protect fundamental rights while allowing limited use for critical improvements.
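
Because every condition must hold at once, the list above behaves like an all-or-nothing gate. The sketch below models that logic; the class and field names are hypothetical shorthand for the conditions, and a real assessment is a documented legal judgment rather than a set of booleans.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpecialCategoryConditions:
    """Hypothetical checklist mirroring the conditions listed above."""
    no_alternative_data: bool
    reuse_limits_and_privacy_measures: bool
    access_controls_documented: bool
    no_third_party_access: bool
    deletion_scheduled: bool
    necessity_recorded: bool


def unmet_conditions(conditions: SpecialCategoryConditions) -> list[str]:
    """Names of the conditions that are not yet satisfied."""
    return [name for name, ok in asdict(conditions).items() if not ok]


def may_process(conditions: SpecialCategoryConditions) -> bool:
    """Processing is allowed in this sketch only when every condition holds."""
    return not unmet_conditions(conditions)


# Made-up assessment in which no deletion schedule has been defined yet.
assessment = SpecialCategoryConditions(
    no_alternative_data=True,
    reuse_limits_and_privacy_measures=True,
    access_controls_documented=True,
    no_third_party_access=True,
    deletion_scheduled=False,
    necessity_recorded=True,
)
print(may_process(assessment), unmet_conditions(assessment))
```

Returning the list of unmet conditions, rather than a bare yes or no, gives the processing record something concrete to reference.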

5. Testing Datasets for Non-Training Systems (Article 10(6))

Not all high-risk AI systems rely on machine learning models that “train” on data. For those that don’t, the governance requirements described above apply only to testing datasets, since there are no training or validation datasets to govern. This streamlines compliance without lowering the bar for the evaluation phase.
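
The scoping rule above reduces to a single conditional. A minimal sketch, with a hypothetical function name and dataset labels:

```python
def datasets_in_scope(uses_model_training: bool) -> set[str]:
    """Which datasets the data governance requirements apply to (illustrative)."""
    if uses_model_training:
        return {"training", "validation", "testing"}
    # Systems built without model training only have testing datasets to govern.
    return {"testing"}


print(datasets_in_scope(True))
print(datasets_in_scope(False))
```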

Why Does This Matter? The Bigger Picture

Article 10 isn’t just regulatory fine print; it’s a blueprint for compliance. By enforcing rigorous data governance, the EU AI Act helps prevent AI from perpetuating inequalities or causing unintended harm. For providers, compliance means investing in robust processes—resulting in AI that is more innovative, trustworthy, and market-ready.

If you’re building AI, start auditing your data practices against these pillars. As AI integrates deeper into society, remember: Great AI starts with great data governance.
