Clarifying GDPR Compliance for AI Training

Recent guidance from France's data protection authority, the Commission Nationale de l'Informatique et des Libertés (CNIL), provides essential clarity on the use of legitimate interest as a legal basis for processing personal data when training artificial intelligence (AI) models. The guidance specifically addresses personal data scraped from public sources, a topic that has generated considerable debate within the tech and regulatory communities.

Key Points of the Guidance

The CNIL’s insights are significant, yet they represent only one layer in a complex regulatory landscape. While the guidance reduces uncertainty surrounding GDPR compliance at the training stage, it does not resolve other pertinent issues such as copyright, database rights, and post-training litigation risk.

Organisations are urged to apply structured judgment at critical moments and to maintain a well-documented position on GDPR compliance. This remains crucial for managing AI-related compliance at scale.

What the CNIL’s Guidance Clarifies

The CNIL affirms that training AI models on personal data sourced from public content can be lawful under the GDPR’s legitimate interest basis, provided that certain conditions are met. These conditions include a credible balancing of interests, demonstrable safeguards, and clear documentation.

  • Web scraping may be permissible where it respects contextual privacy expectations. Scraping should not occur from sites that actively prohibit it, for example via robots.txt directives, or from platforms aimed at minors (a compliance-check sketch follows this list).
  • Training-scale data use is not inherently unlawful. Large datasets may be necessary for effective AI development, provided that the principles of proportionality and minimisation are observed.
  • End-user benefit may favour the controller in the legitimate interest assessment. Improvements in accuracy and functionality could justify processing under legitimate interest, subject to a well-documented assessment.
  • Regurgitation risk must be addressed. The CNIL expects evidence of mitigation measures, such as prompt filtering and internal testing for memorisation (a testing sketch also follows this list).
  • Data subject rights may be respected indirectly. The CNIL allows for alternatives in situations where individual erasure or objection is difficult to implement.
  • Documentation must be prepared at the time of training. The legitimate interest assessment and mitigation planning should be complete before training begins.
  • Data Protection Impact Assessments (DPIAs) may still be expected when model training involves large-scale data scraping or special category data.
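
The CNIL does not prescribe a specific mechanism for honouring scraping prohibitions. As a minimal sketch of the robots.txt point above, the Python example below uses the standard library's urllib.robotparser to gate each candidate URL before collection. The user-agent string and the conservative skip-on-error behaviour are illustrative assumptions, not requirements drawn from the guidance.

```python
import urllib.robotparser
from urllib.parse import urlparse

def is_scraping_allowed(url: str, user_agent: str = "example-training-bot") -> bool:
    """Check a site's robots.txt before fetching a page for a training corpus.

    Treats an unreachable robots.txt conservatively by skipping the site.
    """
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(robots_url)
    try:
        parser.read()
    except OSError:
        return False  # assumption: skip sites whose robots.txt cannot be read
    return parser.can_fetch(user_agent, url)

# Usage: gate every candidate URL before it enters the corpus.
for candidate in ["https://example.com/articles/1"]:
    if is_scraping_allowed(candidate):
        print("fetch permitted:", candidate)
    else:
        print("skip, disallowed:", candidate)
```

Logging each decision alongside the URL gives the kind of contemporaneous evidence of safeguards the guidance expects.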
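
Similarly, the guidance calls for internal memorisation testing without mandating a method. The sketch below illustrates one common probe under stated assumptions: generate stands in for a hypothetical model-inference call, and the prefix and match lengths are illustrative thresholds. The probe prompts the model with prefixes of known training snippets and flags verbatim continuations; real evaluations would use fuzzier matching and much larger samples.

```python
from typing import Callable, List

def memorisation_rate(
    generate: Callable[[str], str],  # hypothetical model-inference callable
    training_snippets: List[str],
    prefix_len: int = 50,
    min_match: int = 30,
) -> float:
    """Rough regurgitation probe: prompt with prefixes of known training
    snippets and flag continuations that reproduce the original verbatim.
    """
    leaked = tested = 0
    for snippet in training_snippets:
        prefix, reference = snippet[:prefix_len], snippet[prefix_len:]
        target = reference[:min_match]
        if not target:
            continue  # snippet too short to test meaningfully
        tested += 1
        if target in generate(prefix):
            leaked += 1
    return leaked / tested if tested else 0.0
```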

Regulatory Landscape Comparison

While the CNIL’s guidance is the most structured to date, other data protection authorities have provided varying levels of clarity:

  • The UK Information Commissioner’s Office (ICO) has acknowledged that existing GDPR rules, including legitimate interest, may justify AI training in some contexts but has not provided detailed implementation guidance.
  • The Irish Data Protection Commission (DPC) and Italian Garante have focused primarily on deployment-phase enforcement, particularly concerning DPIAs and transparency around profiling.
  • A consistent, pan-EU approach remains absent, creating challenges for companies navigating multiple expectations across different jurisdictions.

Legal Uncertainty Beyond GDPR

The CNIL’s guidance offers a defensible position for GDPR compliance in model training, but several legal restrictions continue to limit AI system viability, particularly in commercial settings:

  • Copyright and database law remain binding. Publicly accessible content may still be protected under copyright, and the EU’s commercial text and data mining exception can be overridden by rights-holder opt-outs.
  • Contractual terms restrict access and reuse. Many platforms prohibit scraping or commercial reuse through their terms of service, which are enforceable separately from data protection laws.
  • Downstream deployment introduces new compliance layers, encompassing obligations under various regulations like the AI Act and the Digital Services Act.

Operational Priorities and Legal Positioning

For legal, privacy, and product teams, the focus should not be on reinventing governance but rather on applying structured judgment at critical points:

  • Utilise the CNIL’s guidance to reinforce existing privacy governance, integrating it into company workflows.
  • Understand that training-stage GDPR compliance does not by itself authorise commercial use; copyright and platform terms may still restrict model training and subsequent exploitation.
  • Recognise that deployment remains a separate compliance layer, requiring adherence to GDPR and other applicable regulations.
  • Encourage cross-functional collaboration among privacy, legal, product, and engineering teams to facilitate rapid, practical decisions.
  • Assign internal accountability to ensure clear connections between model training decisions and privacy documentation.
  • Prepare for regulatory inconsistencies and maintain thorough documentation to defend compliance narratives.

Despite the clarity offered by the CNIL’s guidance, organisations should not consider GDPR compliance for AI training as a resolved issue. Interpretation may vary across member states, and enforcement will likely focus on end-to-end outcomes, especially in sensitive use cases.

As regulatory frameworks evolve, a well-documented position grounded in the CNIL’s guidance remains a vital tool for managing AI compliance effectively.
