Clarifying GDPR Compliance for AI Training

Recent guidance from the Commission nationale de l'informatique et des libertés (CNIL), the French data protection authority, provides essential clarity on when legitimate interest can serve as a legal basis for processing personal data to train artificial intelligence (AI) models. The guidance specifically addresses personal data scraped from publicly accessible sources, a topic that has generated considerable debate within the tech and regulatory communities.

Key Points of the Guidance

The CNIL’s insights are significant, yet they represent only one layer of a complex regulatory landscape. The guidance reduces uncertainty around GDPR compliance at the training stage, but it does not resolve other pertinent issues such as copyright, database rights, and post-training litigation risk.

Organisations are urged to apply structured judgment at critical moments and to maintain a well-documented position on GDPR compliance. This remains crucial for managing AI-related compliance at scale.

What the CNIL’s Guidance Clarifies

The CNIL affirms that training AI models on personal data sourced from public content can be lawful under the GDPR’s legitimate interest basis, provided that certain conditions are met. These conditions include a credible balancing of interests, demonstrable safeguards, and clear documentation.

  • Web scraping may be permissible, provided contextual privacy expectations are respected. Scraping should not target sites that actively prohibit it (for example, via robots.txt directives) or platforms aimed at minors.
  • Training-scale data use is not inherently unlawful. Large datasets may be necessary for effective AI development, provided that the principles of proportionality and minimisation are observed.
  • End-user benefit may favour the controller in the legitimate interest assessment. Improvements in accuracy and functionality could justify processing under legitimate interest, subject to a well-documented assessment.
  • Regurgitation risk must be addressed. The CNIL expects evidence of mitigation measures, such as prompt filtering and internal testing for memorisation.
  • Data subject rights may be respected indirectly. The CNIL allows for alternatives in situations where individual erasure or objection is difficult to implement.
  • Documentation must be prepared at the time of training. The legitimate interest assessment and mitigation planning should be complete before training begins.
  • Data Protection Impact Assessments (DPIAs) may still be expected when model training involves large-scale data scraping or special category data.
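The robots.txt restriction noted above can be checked programmatically before any page is fetched. A minimal sketch using Python's standard urllib.robotparser follows; the user-agent string "example-trainer" and the URLs are hypothetical, and in production the robots.txt body would be fetched from the target site rather than passed in as a string:

```python
from urllib.robotparser import RobotFileParser

def may_scrape(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt body permits user_agent to fetch url.

    The robots.txt content is passed in directly so the check stays
    self-contained; a live crawler would use RobotFileParser.set_url()
    and read() to fetch it from the site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A site that disallows a hypothetical training crawler from /private/
robots = """\
User-agent: example-trainer
Disallow: /private/
"""

print(may_scrape(robots, "example-trainer", "https://example.com/public/page"))   # True
print(may_scrape(robots, "example-trainer", "https://example.com/private/page"))  # False
```

A check like this only covers the technical opt-out signal; contractual terms of service and the exclusion of platforms aimed at minors still require separate, human review.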

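The internal testing for memorisation that the CNIL expects can start with a crude verbatim-overlap probe: measure the longest run of consecutive words that a model output shares with a training snippet. This is a naive sketch with illustrative inputs, not a substitute for a proper memorisation audit:

```python
def longest_verbatim_overlap(output: str, training_text: str) -> int:
    """Length, in words, of the longest run of consecutive words in
    `output` that also appears verbatim in `training_text`.

    A long run suggests possible regurgitation of training data.
    Whitespace is normalised and spans are padded with spaces so that
    matches respect word boundaries.
    """
    out_words = output.split()
    haystack = " " + " ".join(training_text.split()) + " "
    best = 0
    for i in range(len(out_words)):
        # Only try spans longer than the current best; if a span fails,
        # no longer span starting at the same position can succeed.
        for j in range(i + best + 1, len(out_words) + 1):
            span = " " + " ".join(out_words[i:j]) + " "
            if span in haystack:
                best = j - i
            else:
                break
    return best

sample_output = "we saw the quick brown fox jumps near the river"
training_snippet = "the quick brown fox jumps over the lazy dog"
print(longest_verbatim_overlap(sample_output, training_snippet))  # 5
```

In practice a team would run probes like this across many prompts and flag outputs whose overlap exceeds an agreed threshold, feeding the results into the mitigation documentation prepared before training.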
Regulatory Landscape Comparison

While the CNIL’s guidance is the most structured to date, other data protection authorities have offered varying levels of clarity:

  • The UK Information Commissioner’s Office (ICO) has acknowledged that existing GDPR rules, including legitimate interest, may justify AI training in some contexts but has not provided detailed implementation guidance.
  • The Irish Data Protection Commission (DPC) and Italian Garante have focused primarily on deployment-phase enforcement, particularly concerning DPIAs and transparency around profiling.
  • A consistent, pan-EU approach remains absent, creating challenges for companies navigating multiple expectations across different jurisdictions.

Legal Uncertainty Beyond GDPR

The CNIL’s guidance offers a defensible position for GDPR compliance in model training, but several legal restrictions continue to limit AI system viability, particularly in commercial settings:

  • Copyright and database law remain binding. Publicly accessible content may still be protected under copyright, and the commercial text and data mining exception can be overridden by opt-out mechanisms.
  • Contractual terms restrict access and reuse. Many platforms prohibit scraping or commercial reuse through their terms of service, which are enforceable separately from data protection laws.
  • Downstream deployment introduces new compliance layers, encompassing obligations under various regulations like the AI Act and the Digital Services Act.

Operational Priorities and Legal Positioning

For legal, privacy, and product teams, the focus should not be on reinventing governance but rather on applying structured judgment at critical points:

  • Utilise the CNIL’s guidance to reinforce existing privacy governance, integrating it into company workflows.
  • Understand that training-stage GDPR compliance does not by itself authorise commercial use; copyright and platform terms may still restrict model training and downstream exploitation.
  • Recognise that deployment remains a separate compliance layer, requiring adherence to GDPR and other applicable regulations.
  • Encourage cross-functional collaboration among privacy, legal, product, and engineering teams to facilitate rapid, practical decisions.
  • Assign internal accountability to ensure clear connections between model training decisions and privacy documentation.
  • Prepare for regulatory inconsistencies and maintain thorough documentation to defend compliance narratives.

Despite the clarity offered by the CNIL’s guidance, organisations should not treat GDPR compliance for AI training as a settled issue. Interpretation may vary across member states, and enforcement will likely focus on end-to-end outcomes, especially in sensitive use cases.

As regulatory frameworks evolve, a well-documented position grounded in the CNIL’s guidance remains a vital tool for managing AI compliance effectively.
