AI training data might not be protected by fair use

Published March 20, 2025

US copyright law is intended to protect authors and creators by granting them exclusive rights over their works, so that they can control how those works are used, reproduced, and distributed. By ensuring that authors and creators are compensated for their works, the law seeks to encourage creativity. Copyright protection is not unlimited, however; it must be balanced against the interests of the public. For that reason, the Copyright Act codifies the common law fair use doctrine (17 U.S.C. § 107), which allows a third party to make limited use of copyrighted material without the copyright holder's permission. Overall, copyright law seeks to balance copyright protection with public benefits such as education, commentary, criticism, research, and parody.

The fair use factors

Courts apply a four-factor test to determine whether a third party's use of copyrighted material qualifies as fair use and is therefore exempt from infringement liability:

Purpose and character of the use: Whether the use is commercial or for nonprofit educational purposes, and whether it is transformative (i.e., adds new expression, meaning, or message). Transformative uses, such as parody or criticism, are more likely to be considered fair use.
Nature of the copyrighted work: Whether the work is factual or creative. Using factual works (like news reports or scientific articles) tends to favor fair use, while using highly creative works (like novels, music, or movies) may weigh against fair use.
Amount and substantiality of the portion used: This looks at how much of the original work is used and whether the portion taken is significant. Using a small, non-central part of the work is more likely to be considered fair use. However, even a small portion might not be fair use if it is considered the “heart” of the work.
Effect of the use on the market: Whether the use competes with the original work or harms its market value. If the use could negatively affect the copyright holder’s ability to profit from their work (for example, by substituting the original work), it is less likely to be fair use.

In short, whether the use of a copyrighted work is considered fair use is a highly fact-specific analysis.

Fair use and artificial intelligence (AI) model training

A developing question is whether using copyrighted works to train AI models falls within the fair use exception to copyright protections.

AI models are generally trained by learning patterns from data, known as training data. While approaches vary with the type of AI model, the process generally involves:

Collecting and labeling training data
Training the model
Deploying the trained model

For example, let’s assume one seeks to train an AI model to recognize images of cars. The data collection phase would involve gathering many images of cars and labeling each one according to a predetermined set of characteristics, e.g., the color, make, and model of the car. The images are then provided to the AI model for training, which essentially “teaches” the model to match inputs (here, images) to outputs (labels). The AI model analyzes the images and their labels and builds connections that help it recognize that certain images correspond to certain colors, makes, and models of cars. Once trained, the model is deployed for use. A user can then prompt the model with an image of a car it has never seen before, and the model is expected to identify that car’s color, make, and model. The model’s accuracy depends largely on the quality and quantity of its training data.
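To make the collect, label, train, and deploy steps concrete, here is a minimal Python sketch of the car example. It is an illustration only, built on loudly simplifying assumptions: each “image” is replaced by a hypothetical two-number feature vector, the only label is a car color, and the “model” is a simple nearest-centroid classifier rather than the neural networks used in practice. All feature values and labels are invented for illustration.

```python
from math import dist

# Step 1: collect and label training data.
# Hypothetical stand-in: each "image" is reduced to a 2-number feature vector
# (real systems learn from raw pixels), and each label is a car color.
training_data = [
    ((0.9, 0.1), "red"),
    ((0.8, 0.2), "red"),
    ((0.1, 0.9), "blue"),
    ((0.2, 0.8), "blue"),
]

# Step 2: training. This nearest-centroid "model" averages the feature
# vectors seen for each label, learning one pattern (centroid) per label.
def train(examples):
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}

# Step 3: deployment. The trained model labels an input it has never seen
# by choosing the label whose learned centroid is closest to that input.
def predict(model, features):
    return min(model, key=lambda label: dist(model[label], features))

model = train(training_data)
print(predict(model, (0.3, 0.7)))  # a new, unseen "image"
```

In this toy setup, adding more (and more representative) labeled examples moves each centroid closer to the pattern it is meant to capture, which loosely mirrors the point that accuracy depends on the quality and quantity of the training data.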

There is little precedent on whether, and when, the use of copyrighted materials to train an AI model falls within the fair use exception, and new fact scenarios involving AI will likely continue to outpace established law.

A recent ruling

In what appears to be a case of first impression, Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc. (1:20-cv-613-SB), Dkt. 770 (Feb. 11, 2025) (“Order”), the District Court for the District of Delaware granted partial summary judgment in favor of Thomson Reuters, finding that the fair use exception did not apply to Ross’s use of Thomson’s copyrighted material to train its AI model.

Ross sought to build an AI-driven case law research tool. Its AI model was initially trained on actual case law decisions. To better focus the model’s outputs, however, Ross sought to refine its training using the headnotes, organized by legal category, that Thomson staff append to case law decisions in Thomson’s Westlaw product. Thomson refused Ross’s request to use that data, so Ross turned to a third party, which created training data based on Westlaw’s headnotes. Some of this training data copied Westlaw’s headnotes verbatim.

Having found that the headnotes are copyrightable material, the court analyzed whether Ross’s use of the data constituted fair use.

Purpose and character of the use

According to the court, this factor favored Thomson because Ross’s use of the headnotes was commercial and not sufficiently transformative. Order, pp. 16-20. First, Ross did not dispute that its use of the headnotes was commercial, because Ross intended to profit from its AI model that produces answers to legal questions. Second, Ross’s use did not have a purpose or character different from Thomson’s: “Ross was using Thomson Reuters’s headnotes as AI data to create a legal research tool to compete with Westlaw.” Id. at 17. The court found the US Supreme Court’s decision in Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 US 508 (2023), instructive:

“If an original work and a secondary use share the same or highly similar purposes, and the second use is of a commercial nature, the first factor is likely to weigh against fair use, absent some other justification for copying.” Id. at 532-33.

Thus, because Ross’s purpose was to create a competing product, the court found that its purpose was the same as Thomson’s and that its use was not sufficiently transformative.

An interesting nuance, however, is that the Westlaw headnotes do not appear in the final product or output that Ross presents to consumers. Rather, Ross used the headnotes to train its AI model, which Ross characterized as permissible “intermediate copying.” The court rejected this argument and distinguished the cases on which Ross relied. Order, pp. 17-19. Unlike Ross’s cases, in which the accused infringer copied computer code, this case involved using copyrighted material to train an AI model. The court noted that “computer programs differ from books, films, and many other literary works in that such programs almost always serve functional purposes.” Id. at 18. The cases differed in another respect as well: in the computer code cases, copying was found to be necessary for competition, whereas Ross used the copyrighted material to shortcut the process of training its model. “Ross took the headnotes to make it easier to develop a competing legal research tool. So Ross’s use is not transformative.” Id. at 19.

Nature of the copyrighted work

The court found that this factor favored Ross because Westlaw’s headnotes are more informative than creative. Although the headnotes were creative enough to be copyrightable, their creativity is modest: “Though the headnotes required editorial creativity and judgment, that creativity is less than that of a novelist or artist drafting a work from scratch.” Id. at 20.

Amount and substantiality of the portion used

This factor also favored Ross, according to the court. Ross’s “output to an end user is a judicial opinion, not a West headnote.” Thus, the output “communicates little sense of the original.” Id. at 21.

Effect of the use on the market

Finally, the court found that the fourth fair use factor favored Thomson because “[e]ven taking all facts in favor of Ross, it meant to compete with Westlaw by developing a market substitute.” Id. at 21. Nor did the public interest weigh in Ross’s favor. While “there is a public interest in accessing the law…the public’s interest in the subject matter alone is not enough.” Id. at 21-22. In other words, the “public has no right to Thomson Reuters’s parsing of the law.” Id. at 22. “Copyrights encourage people to develop things that help society, like good legal research tools. Their builders earn the right to be paid accordingly.” Id. at 22.

Considering all factors, the court found in favor of Thomson on the fair-use issue. Id. at 23.

Takeaways

This appears to be the first decision to address whether the fair use doctrine applies when copyrighted materials are used to train an AI model. However, the ruling’s reach is limited. First, it does not address generative AI: “Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today.” Id. at 19. Ross’s legal product did not generate new text; instead, it used AI to search for and return case law decisions relevant to legal questions. Second, as a district court decision, it does not bind other courts; binding precedent would require an appellate ruling, and nationwide precedent a ruling from the US Supreme Court. Nevertheless, the decision is likely to be at least persuasive to other courts.

It also remains to be seen how the fair use factors will apply in cases involving generative AI, such as the New York Times’s suit against Microsoft and OpenAI. In that suit, the New York Times alleges that the defendants used its copyrighted archives to train the large language models (generative AI) behind ChatGPT, which allegedly compete with the New York Times as a source of news.

Please note: This article reflects only the present personal considerations, opinions, and/or views of the authors, which should not be attributed to any of the authors’ current or prior law firm(s) or former or present clients.


Written by David L McCombs, Eugene Goryunov, and Calmann Clements
