Does Training an AI Model Using Copyrighted Works Infringe the Owners’ Copyright? An Early Decision Says, “Yes.”

Alert
March 6, 2025

As has been widely reported, including in our year-end summary of the current state of artificial intelligence (“AI”)-related copyright litigation, AI providers are currently facing a wave of lawsuits1 from copyright owners alleging that AI models infringe their copyrights when they are trained using copyrighted works. In February, in Thomson Reuters Enterprise Centre GmbH et al v. ROSS Intelligence Inc. (Thomson Reuters v. Ross), the U.S. District Court for the District of Delaware gave an important victory to a copyright holder, casting doubt on whether AI providers’ principal legal defense to these infringement challenges – fair use – will shield them from liability.2

Thomson Reuters v. Ross was the first major U.S. AI-copyright decision that answered the question of whether the fair use defense protects an AI model provider from a copyright infringement claim.3 The U.S. District Court’s February 11, 2025 memorandum opinion in the case rejected the defendant’s (Ross Intelligence) fair use defense, holding that under the specific facts alleged, the defendant’s use of copyrighted data to train its AI model constituted direct copyright infringement and was not fair use. The outcome of this case may differ from other outstanding AI-copyright cases, as this case did not involve generative AI, but it is still an important decision because it laid one of the first foundational bricks in the copyright jurisprudence considering AI and fair use.

Facts of the Case

In May of 2020, Thomson Reuters (owner of Westlaw) filed a complaint against Ross Intelligence, alleging intentional copyright infringement.4 Ross had created a legal research search engine intended to be a competitor to Westlaw’s legal search engine.5 Ross’s tool was not a generative AI tool, which writes content in response to prompts, but rather an AI search engine that answers a user’s legal question by providing results in the form of relevant judicial opinions that have already been written and published.6 In order to train its new AI tool, Ross sought to use Westlaw’s headnotes (which are essentially summaries of key points of law and case holdings) and numbering system as a database of legal questions and answers.7 Westlaw owns copyrights to its headnotes and its Key Number System (which is “a numerical taxonomy” of the case law chronicled).8 Prior to training its AI tool, Ross sought licenses to Westlaw’s copyrighted content, which Thomson Reuters refused.9 Subsequently, Ross entered an agreement with LegalEase to obtain training data in the form of “Bulk Memos.”10 LegalEase’s memos were “lawyers’ compilations of legal questions with good and bad answers,” created by lawyers whom LegalEase instructed to create questions using Westlaw headnotes, but “clarifying that the lawyers should not just copy and paste headnotes directly into the questions.”11 LegalEase sold Ross approximately 25,000 Bulk Memos, which Ross then used to train its AI search tool.12 When Thomson Reuters discovered that Ross built its competing product based off of memos built from Westlaw headnotes, it sued Ross for copyright infringement.13

After a few years of discovery, in 2023, Judge Stefanos Bibas largely denied Thomson Reuters’s summary judgment motions regarding copyright infringement and fair use.14 However, in his own words, Judge Bibas “studied the case materials more closely and realized that [his] prior summary-judgment ruling had not gone far enough.”15 He continued (that is, postponed) the trial in 2024 and invited both parties to renew their respective summary judgment briefings. Both parties moved for summary judgment on fair use.16

Judge Bibas’s February 2025 partial summary judgment ruling revised his 2023 decision, granting summary judgment for Thomson Reuters against all of Ross’s copyright defenses, including fair use. Specifically, he held that for certain headnotes, partial summary judgment should be granted on the direct copyright infringement claim. He denied Ross’s motions for summary judgment on direct copyright infringement and fair use.

Judicial Reasoning in 2025 Opinion

In making his determination, Judge Bibas first concluded that the Westlaw headnotes and Key Number System met the Feist minimal threshold for originality and therefore were copyrightable works.17 After determining the copyrightability of the Westlaw headnotes, Judge Bibas analyzed the materials to determine if the two prongs for direct copyright infringement were met: actual copying and substantial similarity.18 He found both by comparing the Westlaw headnotes to the LegalEase Bulk Memos and held that direct copyright infringement occurred.19

Ross relied on the affirmative fair use defense to copyright infringement, which allows for limited use of copyrighted materials without permission of the owners, based on four statutory factors:20 1) the purpose and character of the use, 2) the nature of the copyrighted work, 3) the amount of the work copied, and 4) the use’s effect on the existing and potential market.21 Judges have discretion in determining how much weight to give to each of the factors, and a party need not prevail on all of the factors in order to win a fair use determination.22

Fair Use Analysis in Thomson Reuters v. Ross

Factor 1: Purpose and Character of the Use

The first factor, the purpose and character of the use, often hinges upon how “transformative” the use is.23 Transformativeness,24 in Judge Bibas’s analysis, focuses on the purpose of the use.25 He ruled that Ross’s use was not transformative because it did not have a further purpose or different character from Thomson Reuters’s use.26 Essentially, because Ross used Westlaw headnotes as AI data to train a tool that answers a user’s legal question by showing the user relevant judicial opinions, which is similar to the purpose of Westlaw’s headnotes and numbering system, Judge Bibas held that this factor falls in favor of Thomson Reuters.27

Notably, Thomson Reuters’s headnotes did not appear in the Ross search user’s results, but rather were copied at an intermediate step, in which the headnotes were turned into numerical data showing the relationships among legal words, which was subsequently used to train Ross’s AI.28 In past computer programming copyright cases, such intermediate copying has been permitted as fair use, largely due to the transformative nature of the use under the first factor.29 Judge Bibas distinguished the Thomson Reuters case from computer programming cases, in which copying of code was necessary for innovation and the copied works (computer code) were functional in nature, unlike the written words at issue in this case.30

Factor 2: Nature of the Copyrighted Work

In determining the outcome of this prong, judges analyze the degree of creativity of the original work, granting more protection to more creative works.31 Although the bar for creativity to make a work of authorship copyrightable is very low, a work that is more creative will receive more protection under this prong.32 Although Judge Bibas found that the headnotes contain the “minimal spark of originality” needed for copyright protection, he observed that the notes were “not that creative,” and the key numbers are a factual compilation of limited creativity.33 Ultimately, even though this prong weighed in favor of Ross, it was given less weight than the other factors, namely factors one and four.34

Factor 3: Amount of the Copyrighted Work Used and Its Substantiality Relative to the Whole Work

In determining the outcome of this factor, judges look at how much of the whole work was copied and are more likely to find fair use if the “taken” part of the work is only a small portion of the original. For the alleged copier to succeed on this prong, the copied portion of work “must not take the ‘heart’ of the work,” meaning the most creative, significant, or memorable portion of the original work.35

Here, Judge Bibas held that this prong also weighed in favor of Ross because, although the intermediate training steps use a large amount and substantiality of the original work, the output does not. He stated, “Ross’s output to an end user does not include a West headnote.”36 He found that amount and substantiality refer to the copied work that is made available to the public, and Ross’s outputs were not copies of headnotes.37

Factor 4: Impact on the Copyrighted Work’s Value or Potential Market

The final prong of the analysis, how Ross’s use affected the copyrighted work’s value or potential market, also weighed in favor of Thomson Reuters.38 Judge Bibas considered this the most important factor in the fair use analysis for this case.39 However, as discussed above, the modern consensus, post-1990, is that fair use, as an empirical matter, turns mainly upon the transformativeness of the allegedly infringing work.40 In performing the market impact analysis, judges examine both current and derivative markets for the works, as well as potential public benefits of the alleged copying.41 Here, Judge Bibas stated, “[t]he original market is obvious: legal-research platforms. And at least one potential derivative market is also obvious: data to train legal AIs.”42 He held that Ross did not put forward sufficient facts to show that such derivative markets do not exist and would not be impacted by the copying.43 The final aspect of the market-impact prong is the determination of potential public benefits. Judge Bibas held that the public interest in the subject matter, namely, law, was not enough to show a public benefit.44

Because Thomson Reuters prevailed on factors one and four, it ultimately prevailed on the overall balancing of the fair use factors.

Potential Impact on Other AI Copyright Cases

Thomson Reuters v. Ross is significant because defendants in other AI copyright cases, including OpenAI and Anthropic PBC, have sought to rebut plaintiffs’ infringement claims by stating that their use of copyrighted materials is transformative fair use because it adds new elements to the works and creates new, transformative outputs. Indeed, OpenAI, Microsoft, Bloomberg, and GitHub have asserted that their use of copyrighted materials is permissible fair use because their AI model outputs merely build upon copyrighted works, rather than replicating protected expressions.

In Thomson Reuters v. Ross, the defendant’s fair use argument failed, but future arguments could succeed as fair use is a fact-intensive analysis. In his decision, Judge Bibas stressed that this case only addresses a non-generative AI tool.45 The difference between traditional AI and generative AI is that traditional AI is good at recognizing patterns but generative AI can create new materials from those patterns.46 Fair use jurisprudence tends to rely heavily upon the first fair use factor, the purpose and character of the use.47 Because generative AI models could hypothetically create outputs that are more transformative in nature than the search engine Ross created, the outcome of the first factor analysis could fall in favor of AI platforms in litigation involving generative systems.

Additionally, in computer programming copyright cases, copying at intermediate stages of development has been permitted based on the first factor (purpose and character of the use). Future jurisprudence in the AI-copyright field could follow a similar trajectory to software-related cases. In 2023, the Supreme Court stated in Warhol, “a use that has a distinct purpose is justified because it furthers the goal of copyright, namely, to promote the progress of science and the arts, without diminishing the incentive to create.”48 Thus, under the first fair use factor, AI defendants could plausibly argue that copying certain copyrighted works in order to train their models is justified, much like copying of code was necessary for innovation in the software industry.49

The other fair use factors may also come out differently in future fair use cases with different facts. For example, in Thomson Reuters v. Ross, the allegedly infringed work was arguably barely original enough to receive copyright protection, which caused the second factor (the nature of the allegedly infringed work) to favor fair use. In future cases, the works in question may contain more protectible expression, which would disfavor fair use. On the third factor, judges in ongoing AI-copyright litigation may align with Judge Bibas in finding that the amount and substantiality of the allegedly infringed work used favors fair use because the copies made in intermediate training steps are not made available to the public and thus constitute a limited amount of copying. Finally, the fourth factor may also be analyzed differently in future cases, as the defendant’s use in Thomson Reuters v. Ross directly competed with the plaintiff’s business, and that may not be true in other cases.

In any event, Thomson Reuters v. Ross demonstrates how fair use may be analyzed in the AI context going forward. The issuance of this memorandum opinion has already influenced other AI-copyright litigation. Specifically, music publishers suing Anthropic PBC filed a notice of supplemental authority hours after the memorandum opinion was published, in order to share the opinion with the U.S. District Court for the Northern District of California.50

Beyond the fair use defense, Judge Bibas also summarily rejected Ross’s innocent infringement, copyright misuse, merger, and scènes à faire defenses, stating “[n]one of Ross’s possible defenses holds water. I reject them all.”51 Defendant AI companies in other AI copyright suits have raised a variety of similar copyright infringement defenses. Notably, OpenAI raised the scènes à faire defense (lack of copyrightability of expressions that are common to a genre) in copyright infringement cases it is defending.52

Business Takeaways

As stated in our previous article, because copyright infringement liability in relation to AI is still uncertain, parties should carefully consider how any given contract relating to AI allocates liability for potential copyright infringement. Such risks should be considered even more so after the Ross holding, as fair use is not the copyright infringement shield that some AI platforms may have been hoping for. Specifically, customers of AI service vendors should review the scope of indemnities in their service agreements carefully (noting what is and is not covered). The Thomson Reuters v. Ross ruling might offer some leverage for customers when negotiating bespoke agreements and indemnity provisions.

For companies developing proprietary software that uses AI, it would be valuable to limit the use of third-party copyrighted content in training. It is important to consider that the use of copyrighted content to train an AI model could be infringement (even if the “copying” occurs at an intermediate step, rather than at the output stage), especially if the training content is used to create a product that rivals the copyright owner’s product (which implicates fair use factor #4). Companies creating AI tools should consider each of the fair use factors in determining the type of content used to train their proprietary AI tools, and how the data is used.

Finally, it is important to remember that the fair use analysis is very fact-specific, and in the AI context, it will likely turn on the differences between purposes, type of content used, training methods, and outputs. Other AI developers may be able to distinguish their cases from the facts of this case, and the outcome of generative AI copyright cases remains uncertain, but this case suggests that fair use likely will not shield all AI provider defendants.

  1. More than 15 notable suits are pending across the country in which copyright owners are pursuing various theories of infringement against AI platforms, alleging that AI models infringe their copyrights because they are trained using copyrighted works, because the output of the AI models itself infringes, or both. See Thomson Reuters Enter. Ctr. GmbH v. ROSS Intel. Inc., No. 1:20-cv-00613-SB (D. Del. filed May 6, 2020); UAB Planner 5D v. Facebook, Inc., 534 F. Supp. 3d 1126 (N.D. Cal. 2021); Doe 1 v. GitHub, Inc., No. 4:22-cv-06823-JST (N.D. Cal. filed Nov. 3, 2022); Getty Images, Inc. v. Stability AI, Inc., No. 1:23-cv-00135-JLH (D. Del. filed Feb. 3, 2023); Tremblay v. OpenAI, Inc., No. 3:23-cv-03223 (N.D. Cal. filed June 28, 2023); In re Google Generative AI Copyright Litigation, No. 5:23-cv-03440 (N.D. Cal. filed July 11, 2023); Authors Guild v. OpenAI, Inc., No. 1:23-cv-08292 (S.D.N.Y. filed Sept. 19, 2023); Kadrey v. Meta Platforms, Inc., No. 23-cv-03417-VC (N.D. Cal. Nov. 20, 2023); Huckabee v. Bloomberg L.P., No. 1:23-cv-09152 (S.D.N.Y. filed Oct. 17, 2023); Concord Music Grp., Inc. v. Anthropic PBC, No. 3:23-cv-01092 (M.D. Tenn. filed Oct. 18, 2023); Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal. second amended complaint filed Oct. 31, 2024); The N.Y. Times Co. v. Microsoft Corp., No. 1:23-cv-11195 (S.D.N.Y. filed Dec. 27, 2023); Nazemian et al. v. Nvidia Corp., No. 24-01454 (N.D. Cal. filed Mar. 8, 2024).
  2. Memorandum Opinion, Thomson Reuters Enterprise Centre GmbH et al v. ROSS Intelligence Inc., Docket No. 1:20-cv-00613, 17 (D. Del. Feb. 11, 2025).
  3. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 17.
  4. Complaint, Thomson Reuters Enter. Ctr. GmbH v. ROSS Intel. Inc., No. 1:20-cv-00613-SB (D. Del. filed May 6, 2020).
  5. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 3.
  6. Id. at 17.
  7. Id. at 3.
  8. Id. at 2.
  9. Id. at 3.
  10. Id.
  11. Id.
  12. Id.
  13. Id.
  14. Id.
  15. Id.
  16. Id. at 3, 4.
  17. Id. at 7.
  18. Id.
  19. Id.
  20. Neil Weinstock Netanel, Making Sense of Fair Use, 15(3) Lewis & Clark L. Rev. 715 (2011); Rich Stim, What is Fair Use?, Stanford Libraries, https://fairuse.stanford.edu/overview/fair-use/what-is-fair-use/. See also 17 U.S. Code § 107.
  21. Rich Stim, Measuring Fair Use: The Four Factors, Stanford Libraries, https://fairuse.stanford.edu/overview/fair-use/four-factors/#:~:text=the%20purpose%20and%20character%20of,use%20upon%20the%20potential%20market.
  22. Rich Stim, Measuring Fair Use: The Four Factors.
  23. Id.
  24. Transformativeness has become, since at least 1994, a key factor in the fair use analysis. See Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994); Netanel, Making Sense of Fair Use. See also Judge Leval’s argument in 1990 that the most critical element of the fair use analysis is the transformativeness of a work. Pierre N. Leval, Toward a Fair Use Standard, 103 Harv. L. Rev. 1105 (1990).
  25. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 17.
  26. Id.  
  27. Id. at 17, 18.  
  28. Id.
  29. See Sony Comput. Ent., Inc. v. Connectix Corp., 203 F.3d 596, 599, 606–07 (9th Cir. 2000); Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510, 1514–1515, 1522–23 (9th Cir. 1992); Google LLC v. Oracle Am., Inc., 593 U.S. 1, 24 (2021).
  30. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 18.
  31. Id.
  32. Id.
  33. Id.
  34. Although many judges, like Judge Bibas, do not give much importance to this factor, legal scholars and decision-makers have advocated for giving this prong more weight, especially in the context of “fact-like materials,” such as those at issue in the Thomson Reuters case. In contrast to Judge Bibas, scholars have advocated for the importance of considering the copyrighted work’s nature, because the copying of materials that are factual in nature requires “necessary breathing room.” In cases involving scholarly works, judges have put increased weight on the second prong of the analysis. However, it remains commonplace for judges to put little weight on this factor. See Figares, Alex R. et al., Copyright Infringement and the Fair Use Defense: Navigating the Legal Maze, 27(1) Univ. of Fl. J. of Law & Public Pol. 135 (2016); Authors Guild v. Google, Inc., 804 F.3d 202, 220 (2d Cir. 2015); Cambridge University Press v. Patton, 769 F.3d 1232 (11th Cir. Ga. 2014).
  35. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 21; Rich Stim, Measuring Fair Use: The Four Factors.
  36. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 21.
  37. Id.
  38. Id.
  39. Id.
  40. Netanel, Making Sense of Fair Use.
  41. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 22.
  42. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 22.
  43. Id.
  44. Id. at 22, 23.
  45. Id. at 19.
  46. Bernard Marr, The Difference Between Generative AI And Traditional AI: An Easy Explanation for Anyone, Forbes (July 24, 2023), https://www.forbes.com/sites/bernardmarr/2023/07/24/the-difference-between-generative-ai-and-traditional-ai-an-easy-explanation-for-anyone/.
  47. Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023); Authors Guild v. Google, Inc., 804 F.3d 202, 220 (2d Cir. 2015).
  48. Warhol, 143 S. Ct. 1258, 1276 (2023).
  49. See Google LLC v. Oracle Am., Inc., 593 U.S. 1, 24 (2021); contra Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 18.
  50. Aruni Soni, Judge Rejects Fair-Use Defense in Westlaw AI Copyright Suit, Bloomberg Law (Feb. 11, 2025), https://www.bloomberglaw.com/product/blaw/bloomberglawnews/ip-law/BNA%2000000193-b7ac-d85e-adfb-f7bcd74b0001.
  51. Memorandum Opinion, Thomson Reuters, 1:20-cv-00613, at 14.
  52. Defendants’ Answer to First Consolidated Amended Complaint, Tremblay v. OpenAI, Inc., No. 3:23-cv-03223 (N.D. Cal. Aug. 27, 2024); OpenAI Defendants’ Answer to First Consolidated Class Action Complaint, Authors Guild v. OpenAI, Inc., No. 1:23-cv-08292 (S.D.N.Y. Feb. 16, 2024).