Revised Guidelines for Copyright Compliance in AI Models

GPAI Code of Practice – Third Draft: Changes to the Requirements for Copyright Compliance

Created Date: March 17, 2025 1:47 PM

Background

Under the European AI Act (Regulation (EU) 2024/1689, “AI Act”), providers of General-Purpose AI (“GPAI”) models, such as models of the GPT family, Llama, or Gemini, must comply with certain requirements, including documentation and policy implementation to adhere to EU copyright law.

To facilitate compliance with these requirements, the AI Act provides for the creation of Codes of Practice for GPAI models. In response to an invitation from the AI Office, various experts and stakeholders established four working groups to draft the initial Code of Practice. Should the EU Commission approve this Code of Practice, it will acquire “general validity” within the EU. By adopting the approved GPAI Code of Practice, companies can demonstrate proactive compliance, potentially avoiding regulatory scrutiny and penalties.

The AI Office has published the working groups’ third draft of the Code of Practice (“3rd Draft”), covering the following topics:

  • Commitments
  • Transparency
  • Copyright
  • Safety and Security

The final version of the Code of Practice is scheduled for May 2, 2025.

Who is this Relevant For?

The Code of Practice is primarily relevant for providers of GPAI models. GPAI models exhibit significant generality and can competently perform a wide range of distinct tasks. This includes providers of well-known large language models such as GPT (OpenAI), Llama (Meta), Gemini (Google), or Mistral (Mistral AI). Smaller model providers may also be affected if their models can be used for a sufficiently broad range of tasks. Additionally, businesses that fine-tune models for their own purposes may themselves become GPAI model providers.

Furthermore, “downstream providers,” i.e., businesses that implement GPAI models into their AI systems, should familiarize themselves with the Code of Practice. This Code may evolve into a quasi-standard for GPAI models, influencing what AI system developers can expect from a GPAI model, which is crucial when negotiating contracts with GPAI model providers.

Key Concepts of the Code of Practice on Copyright Law

Providers of GPAI models are mandated to establish a policy to comply with EU copyright law (Art. 53 (1) (c) AI Act). Given the absence of prior similar requirements, practical guidance on what such a policy should entail is currently lacking. The Code of Practice aims to bridge this gap.

The Code of Practice requires providers to implement the following measures:

Copyright Policy

Providers signing the Code of Practice (“Signatories”) must draft, maintain, and enforce a copyright policy that ensures compliance with EU copyright law. This requirement is directly stipulated under the AI Act. Signatories are also responsible for ensuring adherence to the copyright policy within their organization.

An important change under the 3rd Draft, compared to the 2nd Draft, is that Signatories are no longer obligated to publish their copyright policy but are merely encouraged to do so. This adjustment aligns with the AI Act, which does not mandate model providers to publish their copyright policy.

Web Crawling of Copyrighted Content

Signatories are generally permitted to utilize web crawlers for text and data mining (“TDM”) to gather training data for their GPAI models. However, they must ensure that crawlers respect technologies that restrict access to copyrighted materials, such as paywalls.
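For illustration, one simple way a crawler can honor such access restrictions is to refuse to ingest any page whose HTTP response signals restricted access. The following is a minimal sketch, not a statement of what the Code of Practice requires; the function name and the exact status-code policy are illustrative assumptions.

```python
# Sketch: skip responses whose HTTP status code signals restricted access.
# The status-code policy is illustrative, not mandated by the Code of Practice.

RESTRICTED_STATUSES = {
    401,  # Unauthorized: credentials required
    402,  # Payment Required: commonly associated with paywalls
    403,  # Forbidden: access explicitly denied
}

def may_ingest(status_code: int) -> bool:
    """Return True only if the response does not signal restricted access."""
    return status_code not in RESTRICTED_STATUSES
```

A real crawler would combine such a check with honoring authentication walls that are enforced client-side, which cannot be detected from status codes alone.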

Moreover, Signatories are required to exclude “piracy domains,” which are internet sources that profit from providing copyright-infringing materials.
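A domain exclusion of this kind can be enforced with a small filter applied before any URL is fetched. The sketch below assumes a provider-maintained blocklist; the domains shown are placeholders, and in practice a curated list (for example, one based on published piracy watch lists) would be loaded instead.

```python
from urllib.parse import urlsplit

# Illustrative blocklist; a real deployment would load a curated list of
# piracy domains rather than hard-code examples.
PIRACY_DOMAINS = {"pirated-books.example", "warez.example"}

def is_excluded(url: str) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in PIRACY_DOMAINS)

def filter_urls(urls):
    """Drop URLs pointing at excluded domains before crawling."""
    return [u for u in urls if not is_excluded(u)]
```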

Identifying and Complying with TDM Opt-Outs for Web-Crawled Content

Signatories must ensure that web crawlers identify and adhere to a TDM opt-out declared by rightsholders. While TDM is generally permitted under EU copyright law, rightsholders have the option to declare an opt-out. For web content, this opt-out must be machine-readable. The 3rd Draft explicitly outlines that web crawlers should comply with the widely used robots.txt protocol, in addition to other relevant machine-readable TDM opt-outs established as industry standards.
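In practice, honoring a robots.txt opt-out comes down to checking each URL against the site's robots.txt rules before fetching it. Python's standard-library `urllib.robotparser` implements the protocol; the crawler names (`ExampleAIBot`, `GenericBot`) and the sample rules below are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; rightsholders can use rules like the
# first entry to declare a TDM opt-out against a specific AI crawler.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The AI crawler is opted out entirely; a generic crawler is only barred
# from the /private/ section.
print(parser.can_fetch("ExampleAIBot", "https://example.org/article"))  # False
print(parser.can_fetch("GenericBot", "https://example.org/article"))    # True
print(parser.can_fetch("GenericBot", "https://example.org/private/x"))  # False
```

Note that robots.txt is only one of the machine-readable opt-out mechanisms the 3rd Draft refers to; other industry-standard signals would need separate handling.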

Signatories are expected to take reasonable measures to inform rightsholders about the web crawlers in use and how these crawlers interact with robots.txt files. Information can be disseminated via web feeds. Notably, the 3rd Draft no longer includes an obligation to publish this information.

Identifying and Complying with a TDM Opt-Out for Non-Web-Crawled Content

GPAI model providers may also source datasets from third parties instead of crawling the web themselves. While the 2nd Draft required copyright due diligence for third-party datasets, the 3rd Draft only requires reasonable efforts to ascertain whether the web crawlers used to gather the data complied with robots.txt.

Mitigating Risk to Prevent the Production of Copyright-Infringing Output

One risk associated with AI usage is the generation of output that infringes copyrights, for example by reproducing protected code or images. Signatories must make reasonable efforts to mitigate this risk. This is a welcome adjustment: the 2nd Draft mandated specific measures to avoid “overfitting,” whereas the current draft is more technology-neutral and simply calls for reasonable efforts.

Additionally, Signatories must include a clause in their terms and conditions (or similar documents) to prohibit copyright-infringing uses of their GPAI model by providers of downstream AI systems.

Designating a Point of Contact

Signatories are required to provide a point of contact for rightsholders and establish a mechanism to allow them to submit complaints regarding copyright infringements. Under the 3rd Draft, Signatories may decline to process complaints deemed unfounded or excessive.

Conclusion and Recommendations for Businesses

The 3rd Draft introduces sensible changes compared to the 2nd Draft that ease compliance, making the Code of Practice a more practical instrument for demonstrating compliance with the AI Act.

However, it is crucial to recognize that the Code of Practice remains a draft and may undergo substantial changes. The approval of the final Code of Practice by the EU Commission is likely but not guaranteed.

The working groups will accept feedback from stakeholders until March 30, 2025, and aim to present a final version in May 2025.
