AI Compliance: Copyright Challenges in the EU AI Act

The EU AI Act and Copyright Compliance

The EU AI Act sets out a framework for copyright compliance in artificial intelligence, particularly for generative AI models such as large language models (LLMs). Because these models are trained on extensive datasets, developers need to understand the legal consequences of using copyrighted content.

Challenges in Training Generative AI Models

Training generative AI models necessitates vast amounts of data, including text, images, and other content, which are often sourced through web scraping from publicly available materials. The EU AI Act emphasizes the necessity of compliance with copyright laws, particularly as they concern LLMs.

Recital 105 of the Act notes that the development and training of general-purpose AI models require access to extensive datasets. It states that any use of copyright-protected content requires authorization from the rightsholder unless a relevant copyright exception or limitation applies.

Definition of General-Purpose AI Models

The Act defines general-purpose AI models as models that are trained on large datasets, display significant generality, and are capable of performing a wide range of distinct tasks. Examples include:

  • ChatGPT and Google’s PaLM — known for tasks such as code generation, translation, and joke explanation.
  • Claude by Anthropic — adept at content creation, vision analysis, and addressing complex inquiries.

While the AI Act's copyright provisions primarily address general-purpose AI providers, they do not exempt other developers from copyright obligations. The Directive on Copyright in the Digital Single Market (DSM directive) remains applicable to all users of copyrighted works and represents an initial legislative effort to tackle copyright issues stemming from AI training via web scraping.

DSM Directive Provisions on AI Training and Copyright

The DSM directive introduced a text and data mining exception to copyright protection. This exception encompasses a broad range of computational analyses, including search engine indexing and data scraping for AI training. However, the directive was enacted in 2019, prior to the emergence of generative AI tools, indicating that lawmakers may not have fully considered the implications of LLMs on copyrighted content.

Generally, web scraping of copyrighted materials for AI training is permissible under the DSM directive, provided that the content is lawfully accessible and rightsholders have not expressly opted out. Rightsholders can reserve their rights through machine-readable means, such as technical protocols that web crawlers can recognize.

AI Act Requirements for Copyright Compliance

Article 53 of the AI Act imposes two copyright-related obligations on general-purpose AI providers:

  1. Implement a policy to comply with EU copyright law, in particular to identify and respect rights reservations expressed under the DSM directive.
  2. Publish a sufficiently detailed summary of the content used for training, promoting transparency and allowing creators to verify whether their works have been used.
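
As a rough illustration of the second obligation, the sketch below aggregates a list of crawled URLs into per-domain counts, which is one possible input to a training-content summary. The function name `summarize_sources` and the output shape are assumptions made for this example; nothing here reflects a format prescribed by the Act.

```python
from collections import Counter
from urllib.parse import urlparse


def summarize_sources(urls):
    """Count training documents per source host (hypothetical summary format).

    Returns (host, count) pairs, most frequent first.
    """
    # .hostname is lowercased and excludes ports; URLs without a host
    # are grouped under "unknown".
    hosts = (urlparse(url).hostname or "unknown" for url in urls)
    return Counter(hosts).most_common()


# Hypothetical crawl log used only to show the call.
crawled = [
    "https://example.com/articles/1",
    "https://example.com/articles/2",
    "https://news.example.org/story",
]
summary = summarize_sources(crawled)
```

A real summary would likely need more than host counts (for instance content type or collection period), so this should be read as a starting point for record keeping rather than a compliant report.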

General-Purpose AI Code of Practice: Copyright Section

The AI Act encourages general-purpose AI providers to develop industry best practices, referred to as codes of practice. A recent draft of the General-Purpose AI Code of Practice outlines measures to ensure copyright compliance under the Act. Notably, it includes a commitment to:

  • Identify and comply with rights reservations when crawling the web.

Respecting Machine-Readable Opt-Outs

Signatories of the draft code are urged to use crawlers that follow the instructions expressed through the Robots Exclusion Protocol. The robots.txt file is a standard tool that websites use to regulate how web crawlers access and index their content, indicating which areas should not be crawled. However, robots.txt only guides compliant bots; it does not technically prevent access to copyrighted works.
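
A minimal sketch of how a crawler might honor such instructions, using Python's standard `urllib.robotparser` module. The user agent name `ExampleAIBot`, the sample rules, and the helper `may_crawl` are hypothetical; a real crawler would fetch the file from the target site instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler entirely and
# blocks /private/ for every other bot.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ai_bot_allowed = may_crawl(
    SAMPLE_ROBOTS_TXT, "ExampleAIBot", "https://example.com/articles/1"
)
other_bot_allowed = may_crawl(
    SAMPLE_ROBOTS_TXT, "OtherBot", "https://example.com/articles/1"
)
```

With these sample rules, the AI crawler is refused everywhere while other bots are refused only under `/private/`. The check is advisory: it works only because the crawler chooses to run it before fetching.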

Despite robots.txt being the most commonly respected protocol, the lack of a unified standard for rights reservations complicates the landscape for general-purpose AI providers.

Types of Protocols for Opt-Out Compliance

Opt-out protocols can be grouped into two main types:

  • Location-based protocols (e.g., robots.txt, ai.txt) apply to all content on a website.
  • Unit-based protocols enable tagging specific works with metadata that indicates the creator’s wish to opt out of AI training.

The code encourages the identification of protocols that result from a cross-industry standard-setting process, aiming for a unified rights reservation approach.
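
To make the unit-based approach concrete, the sketch below scans a single HTML page for an opt-out marker in its metadata. The tag name `ai-training-optout` and the accepted values are invented for illustration, since the text does not name a specific unit-based protocol; a real implementation would check whichever markers the applicable standard defines.

```python
from html.parser import HTMLParser

# Hypothetical metadata name, used here only for illustration.
OPT_OUT_META_NAME = "ai-training-optout"


class OptOutMetaParser(HTMLParser):
    """Records whether a page carries the hypothetical opt-out meta tag."""

    def __init__(self):
        super().__init__()
        self.opted_out = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip().lower()
        if name == OPT_OUT_META_NAME and content in {"1", "true", "yes"}:
            self.opted_out = True


def work_is_reserved(html: str) -> bool:
    """Return True if this individual page signals an AI-training opt-out."""
    parser = OptOutMetaParser()
    parser.feed(html)
    parser.close()
    return parser.opted_out


page = '<html><head><meta name="ai-training-optout" content="1"></head></html>'
reserved = work_is_reserved(page)
```

Unlike a site-wide robots.txt rule, this check runs per work, so a pipeline that respects both protocol types would apply the location-based check before fetching and the unit-based check after.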

Risks of a Unified Protocol

While a unified opt-out protocol may benefit large AI providers, it poses risks, such as limiting options for authors who may wish to utilize alternative methods for protecting their works. Furthermore, the AI Act’s copyright requirements extend extraterritorially, obligating any general-purpose AI provider entering the EU market to establish a copyright compliance policy, regardless of the training location.

In summary, as generative AI technology continues to evolve, navigating the complexities of copyright compliance under the EU AI Act will be crucial for developers and researchers alike. The ongoing development of best practices and standards will play a pivotal role in shaping the future of AI and copyright interactions.
