Copyright and Artificial Intelligence: Is your academic research at risk?

AI companies are quietly courting tech companies and publishers to obtain access to treasure troves of copyrighted content, including faculty-published research. One of the most noteworthy is Google’s $200 million deal with Reddit to license its users’ posts and comments to train chatbots in the “art of conversation.” News Corp, which owns the Wall Street Journal and the New York Post, also agreed to license news content to OpenAI in a five-year deal worth over $250 million. Academic research is now in the crosshairs. Tech companies such as Microsoft are paying academic publishers Wiley and Taylor & Francis for access to research publications. Microsoft’s reason? The company reportedly wants to use AI to improve research by working across disciplines, researching ideas, generating hypotheses, and analyzing data. If you are an author, academic researcher, musician, or artist, you need to pay attention.

Is this Fair Use?

AI companies generally rely on two legal theories to use copyrighted content to train AI models without compensating copyright holders or asking their permission. The first theory, which rests on contractual agreements, is fairly straightforward and is explored in further detail below. The second theory, fair use, involves a more complicated inquiry, one that courts are currently wrestling with in a number of lawsuits brought by music publishers, newspapers (New York Times), stock image companies (Getty Images), and subscription databases (Thomson Reuters).

Proponents of the fair use theory for AI training argue that permission isn’t necessary because fair use already allows for the copying of copyrighted content for “non-expressive purposes”—such as the copying of content for facts, data, and patterns. This type of use is seen in text and data mining and search engine activity.

Legal precedent that proponents cite to justify copying copyrighted content to train AI models includes Google Books and HathiTrust, two cases in which courts held that copying and digitizing copyrighted books to improve the search and accessibility functions of online databases was a transformative fair use. Transformative fair use means using a copyrighted work in a different way or for a different purpose than the original. Drawing on the transformative use recognized in Google Books and HathiTrust, proponents of generative AI copying argue that extracting data to produce new and informative content without verbatim copying is fair use.

Opponents of the fair use theory for AI training argue that AI models can memorize copyrighted works and create almost exact replicas when generating new content. Replication is generally prohibited under both copyright law and plagiarism policies.

Is copying for generative AI purposes fair use? Is it fair use when AI models generate content that is substantially similar to copyrighted content but not verbatim? As courts grapple with these questions, we should also consider—for public policy reasons—whether these practices are just plain fair for copyright holders.

Why Your Publishing Contract Is Important

If AI companies are not relying on fair use to train AI models with copyrighted content, then they must rely on contractual agreements. In some instances, AI companies do not need permission from individual authors because the copyright or rights are actually owned by tech companies or publishers instead of the authors themselves.

Too often, authors, artists, and other content creators don’t realize that their rights in copyright have been signed away to a publisher or tech company under a terms of service agreement or a publishing contract. In many cases, these agreements or terms are not meticulously reviewed or well understood by creators. Furthermore, there is little, if any, opportunity for creators to advocate for a change in contractual terms or a workable process to do so.

Faculty and students at Appalachian who publish research with academic journals, publishing houses, or other venues should review the terms of their publishing contracts to understand who owns the copyright in the work, whether the copyright is being assigned to the publisher or a third party, and how publishers or other third parties may use the work down the road.

If you expect to receive compensation for your work, it is important to discuss with your publisher whether your work will be licensed to AI companies and whether you will be paid royalties for this specific, intended use. For existing contracts between publishers and authors, there is some uncertainty over whether publishers have the necessary consent to license content to AI companies because AI-specific use was not contemplated or specified in the contract language.

Faculty and student authors who oppose the use of their work for AI training can use a model clause similar to the one below (source: Authors Guild):

No Generative AI Training Use.
For avoidance of doubt, Author reserves the rights, and [Publisher/Platform] has no rights to, reproduce and/or otherwise use the Work in any manner for purposes of training artificial intelligence technologies to generate text, including without limitation, technologies that are capable of generating works in the same style or genre as the Work, unless [Publisher/Platform] obtains Author’s specific and express permission to do so. Nor does [Publisher/Platform] have the right to sublicense others to reproduce and/or otherwise use the Work in any manner for purposes of training artificial intelligence technologies to generate text without Author’s specific and express permission.

Choose Open Access or Assert Your Rights in Copyright

On the other hand, some faculty and student authors may not be concerned about the use of their published research for AI purposes—viewing this use as a way to further the dissemination of knowledge and information. In this case, authors have the opportunity to publish their work open access using a Creative Commons license—a type of open access license that allows others to use and copy their works without permission under certain conditions.

For any questions regarding copyright and AI or how your research or publishing contract may be implicated in AI training, please contact Agnes Gambill West.

Additional Resources

U.S. Copyright Office, Copyright and Artificial Intelligence, Part I: Digital Replicas (USCO addresses the legal and policy issues related to artificial intelligence (AI) and copyright)

Nature.com, “AI is complicating plagiarism. How should scientists respond?”

U.S. Senate, NO FAKES Act of 2024 (Congress proposes to substantially change the protection of individual likeness to prevent fake news and the creation of deepfake videos and voice-overs)

Image credit: "Artificial Intelligence & AI & Machine Learning" by mikemacmarketing is licensed under CC BY 2.0.

Published: Aug 14, 2024 8:57am

Digital Scholarship and Initiatives