Courts Decline to Short-Circuit AI Copyright Claims
Two recent decisions by the U.S. District Court for the Southern District of New York offer guidance on how courts are approaching copyright claims against generative AI companies. While the cases arise in different contexts and involve different AI products, they share a common throughline: at least at the pleading stage, courts are unwilling to treat AI-generated outputs as categorically non-infringing or to resolve complex similarity and causation questions as a matter of law.
Case Analysis: David Baldacci et al. v. OpenAI Inc.
In David Baldacci et al. v. OpenAI Inc., the District Court denied OpenAI’s motion to dismiss authors’ claims that ChatGPT’s outputs infringe copyrighted books by generating summaries, sequels, and other derivative-style content that allegedly tracks protected elements of the original works. OpenAI argued that the complaint failed because it did not attach verbatim outputs and because summaries or outlines, by their nature, cannot be substantially similar to full-length literary works. The District Court disagreed on both points.
The court held that the alleged outputs were sufficiently incorporated into the complaint by reference, so it could consider them even though they were not physically appended. More importantly, the court emphasized that substantial similarity, particularly where works contain both protectable and unprotectable elements, is a fact-intensive inquiry ill-suited for resolution at the pleading stage. The court reasoned that even condensed or abridged outputs may still appropriate protected expression, including character development, narrative structure, and specific plot elements. When an AI-generated summary or outline captures the “selection and arrangement” of these elements, a reasonable jury could find infringement.
Case Analysis: Advance Local Media LLC v. Cohere Inc.
This logic was reiterated in a more recent decision in Advance Local Media LLC v. Cohere Inc., where major news publishers alleged that Cohere’s AI system produces verbatim reproductions, close paraphrases, or “substitutive summaries” of copyrighted news articles—outputs that publishers allege function as replacements for the original works rather than mere factual references.
Like OpenAI, Cohere moved to dismiss the publishers’ claims. Cohere argued that summaries are inherently transformative and that any overlap reflected uncopyrightable facts. The District Court rejected this framing, finding that while facts themselves are not protected, the original presentation of those facts may be. At the pleading stage, allegations that an AI system closely tracks phrasing, structure, and stylistic choices are sufficient to plausibly allege substantial similarity. Notably, the court declined to treat rigid quantitative thresholds (such as percentage of text copied) as dispositive. Instead, it reiterated that qualitative significance matters, and that even partial copying may be infringing if it captures the “heart” of the work.
Skepticism Toward Dismissals
Both decisions reflect judicial skepticism toward efforts to dismiss secondary liability claims based on the argument that AI tools are merely general-purpose technologies. In the Cohere case, the court credited allegations that the system was designed and marketed to retrieve and deliver news content, including features that expose full articles to users. Those design choices, combined with allegations of knowledge and continued operation despite notice, were sufficient to plead contributory and inducement-based theories of infringement. Similarly, in the OpenAI case, the court emphasized that questions about how and why an AI model generates particular outputs—whether through memorization, training effects, or prompt-driven reconstruction—cannot be resolved without a factual inquiry.
Implications for Copyright Claims
These decisions indicate that courts are not prepared to short-circuit copyright disputes in cases where plaintiffs allege that AI outputs replicate protected expression in commercially meaningful ways. For rightsholders, they highlight the importance of anchoring claims with specific output examples and pleading coherent theories of market harm, particularly where AI-generated content may function as a substitute for the copyrighted works themselves.
As AI copyright litigation continues to evolve, these early decisions suggest that the most consequential legal battles will be fought on fuller factual records that test how generative systems actually behave in practice and compete with original works.