CLEAR Act: New Copyright Requirements for AI Training Data

CLEAR Act Would Establish Notice Requirements for Copyrighted Works in AI Training Data

On Tuesday, U.S. Senators Adam Schiff (D-CA) and John Curtis (R-UT) introduced the Copyright Labeling and Ethical AI Reporting (CLEAR) Act in the Senate. If enacted, the bipartisan bill would establish mandatory reporting requirements for companies developing artificial intelligence (AI) models trained on original works protected under U.S. copyright law. It would also create a new cause of action for copyright owners who allege that generative AI developers failed to provide the required notice regarding their works.

Generative AI Developers Must Submit Notice 30 Days Before Commercial Release

The use of copyrighted works to train generative AI models has raised significant concerns among the international creative community. Numerous lawsuits have followed, with copyright owners asserting their exclusive reproduction rights and AI developers raising fair use as a defense. In late January, the Human Artistry Campaign launched a public awareness effort against the mass harvesting of copyrighted materials, accusing generative AI developers of making billions of dollars from misappropriated works.

The CLEAR Act would require companies developing generative AI platforms to submit to the U.S. Copyright Office a detailed summary of every copyrighted work included in a training dataset. This notice must also include the Uniform Resource Locator (URL) for the dataset if it is publicly available at the time of filing. Generative AI developers would have to file this notice at least 30 days before the commercial release of their AI platforms.
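The 30-day filing window can be illustrated with a short date calculation. This is only a rough sketch: it assumes the period is counted in calendar days, which the summary above does not specify, and the function names and release date are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the 30 days are calendar days; the bill text may count differently.
NOTICE_LEAD_DAYS = 30


def latest_filing_date(release: date) -> date:
    """Return the last day a notice could be filed ahead of a release date."""
    return release - timedelta(days=NOTICE_LEAD_DAYS)


def is_timely(filed: date, release: date) -> bool:
    """Check whether a notice was filed at least 30 days before release."""
    return filed <= latest_filing_date(release)


# Hypothetical commercial release date, for illustration only.
release_date = date(2026, 7, 1)
print(latest_filing_date(release_date))              # 2026-06-01
print(is_timely(date(2026, 6, 15), release_date))    # False
```

Under these assumptions, a platform released on July 1 would need its notice on file by June 1.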

Civil Penalties Under CLEAR Act Could Max Out at $2.5 Million

The definitions in the CLEAR Act indicate its broad scope. The bill defines artificial intelligence as “an automated system designed to perform tasks typically associated with human intelligence.” A generative AI model, in turn, is described as a combination of computer code and numerical values that generates content such as text, images, audio, or video.

Training datasets targeted under the CLEAR Act encompass any collection of materials, including text, images, video, audio, and other formats. Copyrighted works include any materials registered with the Copyright Office, per 17 U.S.C. § 408, and pre-1972 sound recordings protected under 17 U.S.C. § 1401.

Failure to comply with the CLEAR Act’s notice requirements could result in a private cause of action by copyright owners in U.S. district courts. Generative AI firms found liable could incur a penalty of $5,000 per instance of failed notice and face injunctive relief barring the use of the violating training data until the notice issue is resolved. Civil penalties are capped at $2.5 million, payable to the Register of Copyrights to support the Copyright Office’s operating costs. Copyright owners who succeed in their claims for defective notice would also be entitled to recover attorney’s fees and expenses.
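The interaction between the per-instance penalty and the cap can be worked through with simple arithmetic. The sketch below assumes the $2.5 million cap applies to the total across all instances in a single action, which is one reading of the figures above and not something the summary confirms; the function name is hypothetical.

```python
PENALTY_PER_INSTANCE = 5_000   # dollars per instance of failed notice
PENALTY_CAP = 2_500_000        # assumed to cap the total in a single action


def estimated_civil_penalty(failed_notices: int) -> int:
    """Estimate the civil penalty in dollars for a number of failed notices."""
    if failed_notices < 0:
        raise ValueError("failed_notices must be non-negative")
    # The per-instance total grows linearly until it reaches the cap.
    return min(failed_notices * PENALTY_PER_INSTANCE, PENALTY_CAP)


print(estimated_civil_penalty(120))  # 600000
print(estimated_civil_penalty(500))  # 2500000
print(estimated_civil_penalty(800))  # 2500000
```

On this reading, the cap is reached at 500 instances ($5,000 x 500 = $2.5 million), and additional instances would not raise the penalty further.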

Importance of the CLEAR Act

Senators Schiff and Curtis argue that the CLEAR Act is essential for providing legal protections for human creativity, which they view as a cornerstone of the cultural and creative economy. They emphasize the need to balance AI innovation with accountability, stating that the legislation will enhance public trust in emerging technologies and promote American creativity.