AI Litigation Highlights: Anthropic PBC

Written by: Eric Goldman

Large language model (LLM) generative artificial intelligence (AI) systems are becoming increasingly common in all aspects of life. These systems generate responses to queries from users. Who hasn’t had the experience of calling a customer service telephone line and being prompted by an automated system to answer a series of questions, in the hope of solving a problem without involving a living, breathing representative? Chat services on company websites, which are often the first line of communication between a company and the public, are also frequently powered by LLMs designed to mimic the conversation styles of real people.

Simply put, LLMs are here to stay, and we will all be exposed to AI-generated output more and more frequently in a number of areas.

As with any disruptive new technology, LLMs are creating economic winners and losers – which inevitably leads to litigation. Companies such as Meta and Microsoft are already being served with lawsuits claiming that their AI products and services are violating various rights of interested parties. Through a series of articles, the lawyers at IBL will discuss several of these cases, the claims being brought, the rights alleged to be violated, and the defenses being offered.

In the first of these articles, we will focus on two lawsuits being brought against Anthropic PBC. 

1.    Who is Anthropic PBC?

Anthropic PBC is a public benefit corporation founded in 2021 by former employees of OpenAI, including Daniela Amodei and Dario Amodei. Major investors in the company include Google, Amazon and FTX, the failed cryptocurrency exchange founded by Sam Bankman-Fried. In addition to publishing research on the interpretability of machine learning systems, Anthropic has developed several techniques and products.

One of these, Constitutional AI, is a set of techniques designed to align AI systems with human values, and to render AI systems harmless and honest. Constitutional AI is designed to train an AI assistant to be helpful and safe without extensive human supervision. Anthropic has also released several versions of its flagship product, an AI-driven generative LLM system known as Claude. Like all generative LLMs, Claude generates outputs in response to queries from users.

It should be noted that Anthropic PBC is a public benefit corporation, or PBC. (PBCs are sometimes loosely called “B-corporations,” although “B Corp” certification is a separate, private designation rather than a legal form.) There is a significant difference between a PBC and a not-for-profit corporation, in that a PBC is expected to generate profits for its shareholders and investors. There is also a significant difference between a PBC and a traditional corporation, in that a PBC is expected to attempt to make a positive impact on society, while a traditional corporation has no such obligation. There are currently thirty-six states in the U.S. which recognize public benefit corporations. The stated public benefit goals of Anthropic PBC are to build AI systems that people can rely on and to generate research about the opportunities and risks inherent in the use of AI systems.

2.    How do LLM systems work?

The truth is, humans are still struggling to understand and explain exactly how computers sift through data, develop the ability to mimic human communication and generate other creative works when artificial intelligence techniques are used. People often refer to this process as “training” an AI system; really, this is a misnomer, since training implies the presence of a trainer. Simply put, AI system developers give LLMs access to huge pools of data, develop algorithms that govern how those LLMs sort through that data, and then refine the algorithms to narrow the parameters of acceptable responses until the LLM is generating output that meets the needs of the developers. (Major commercial LLM systems do generally involve a lot of manual feedback and fine-tuning from large teams of human contributors, but arguably this feedback is only secondary.)

When a user inputs a query into an LLM, the system does not typically look up an answer in the data it was trained on. Instead, it generates a response, one piece of text at a time, in accordance with the patterns those governing algorithms extracted from the data during training. Generally speaking, the larger the amount of data available for training, the more comprehensive and accurate the LLM’s responses are able to be.
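To make the idea of generating text from patterns in data more concrete, here is a deliberately tiny sketch in Python. It is not how Claude or any commercial LLM is built; real systems use neural networks with billions of parameters. The function names (`train_bigram_model`, `generate`) and the sample text are hypothetical and chosen only for illustration. The sketch simply records which word follows which in a sample text, then produces new text by repeatedly picking one of the recorded follow-on words.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Record, for every word, the words that followed it in the sample text."""
    words = text.split()
    followers = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word].append(next_word)
    return dict(followers)


def generate(model, start_word, max_words=10, seed=0):
    """Build output by repeatedly sampling a word that followed the previous one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = [start_word]
    while len(output) < max_words:
        candidates = model.get(output[-1])
        if not candidates:  # the last word was never followed by anything
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


# Hypothetical sample text standing in for a (vastly larger) training dataset.
training_text = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram_model(training_text)
print(generate(model, "the"))
```

Even in this toy version, every word pair in the output comes from the sample text, which hints at why the contents of a training dataset matter so much in the disputes discussed below.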

Because of the way that they function, LLM systems make copies of the datasets that serve as the source of the responses the systems produce. Part of this is a function of how computers work, and there is legal precedent to support the proposition that this type of copying is legal. However, allegations have been made that there are times when the data included in the materials used to train LLMs, and which are available to LLMs in generating responses to queries, is protected by copyright. There are also allegations that, at times, the output generated by LLM systems includes copies of material protected by copyright, and often presents those copies as being created by the AI system itself as opposed to being created by the author of the underlying data.
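As a rough illustration of what it means for output to contain a copy of source material, the following Python sketch finds the longest run of consecutive words shared by two texts. This is only an assumption-laden teaching aid: it is not a method used by any party or court in the cases discussed here, the function name `longest_shared_run` and the sample sentences are hypothetical, and the legal test for copyright infringement cannot be reduced to a word-matching exercise.

```python
def longest_shared_run(source_text, generated_text):
    """Return the longest run of consecutive words that appears in both texts."""
    source = source_text.lower().split()
    generated = generated_text.lower().split()
    best_length, best_end = 0, 0
    # previous[j] holds the length of the shared run ending at
    # source[i - 2] and generated[j - 1] (the prior row of the table).
    previous = [0] * (len(generated) + 1)
    for i in range(1, len(source) + 1):
        current = [0] * (len(generated) + 1)
        for j in range(1, len(generated) + 1):
            if source[i - 1] == generated[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best_length:
                    best_length, best_end = current[j], i
        previous = current
    return source[best_end - best_length:best_end]


# Hypothetical texts: a "protected" source and a piece of generated output.
source_text = "the quick brown fox jumps over the lazy dog"
generated_text = "a quick brown fox jumps onto the fence"
shared = longest_shared_run(source_text, generated_text)
print(len(shared), "shared words in a row:", " ".join(shared))
```

A long shared run suggests verbatim reproduction, while a short one may be coincidence; where to draw that line is exactly the kind of question the lawsuits below raise.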

3.    What is Claude?

“Claude” is the trade name applied to a series of LLM AI products produced by Anthropic which can process text, computer code and images. Initially, Claude was available only to select users. However, with the release of Claude 2 in July of 2023, the AI system became available to the general public. Claude 3 followed in March of 2024, and is available in three models – Opus, Sonnet and Haiku. These three versions of Claude 3 differ in their intended uses. Opus is designed for strong performance on highly complex tasks; Sonnet is intended for less complex tasks that can be completed more quickly; and Haiku is the fastest and most compact model, intended for near-instant responses. At the time of writing, only Sonnet is publicly available in a version 3.5.

Anthropic promotes Claude as being able to create and draft books, plays, text messages, emails and virtually any form of written communication. Claude can also be used to generate custom tools that generate content in response to input from users. Anthropic makes Claude available on a monthly subscription basis; a limited version is also available for free.

Claude is at the center of the two lawsuits facing Anthropic. More specifically, at issue are the sets of data used to train the LLM and used by the LLM to generate responses to queries. In both cases, plaintiffs are alleging that data protected by copyright has been included in Claude’s dataset, in violation of the copyright holders’ rights. They are also alleging that the output generated by Claude violates their rights. In one case, the plaintiffs allege that Claude is actually competing with creators in the marketplace.

4.    Concord Music Group, et al. v. Anthropic PBC.

In the first lawsuit brought against Anthropic, several music publishers are alleging that Claude is violating the copyright in song lyrics those publishing companies own or control. The publishers claim that the copyright violations are widespread and systemic in the entire series of Claude LLM systems, both in the form of inputs into datasets being used in the training of the LLMs and in the output generated by the systems in response to queries from users. While other websites and lyric aggregators obtain licenses to access song lyrics, the music publishers suing Anthropic point out that Anthropic has not obtained any such licenses granting it the right to incorporate copyright-protected lyrics into Claude datasets. The music publishers go on to allege that Claude sometimes generates infringing lyrics in response to queries that don’t actually request lyrics.

The lawsuit includes four specific claims. First, the music publishers allege that Anthropic is directly violating the copyrights in song lyrics by including protected song lyrics in the datasets for Claude. In essence, the music publishers are alleging the simple act of including the song lyrics in the datasets that power Claude is, in and of itself, a violation of copyright.

Second and third, the publishers allege both contributory and vicarious liability on the part of Anthropic, in that users of Claude are violating copyrights in song lyrics when they take the output generated by Claude and incorporate that output into their own works. In doing so, those users are not acknowledging the actual authorship of the work product they are presenting as their own.

Finally, the music publishers allege that Anthropic violates the Digital Millennium Copyright Act (DMCA) when it circumvents data management systems used to protect song lyrics from copying and removes indicia of ownership of those song lyrics when compiling its datasets. The DMCA includes express prohibitions against circumventing copy-prevention systems, or technical protection measures, used by copyright owners to protect their works from copying.

The complaint includes several detailed comparisons of lyrics generated by Claude to copyright-protected lyrics, pointing out the similarities. The music publishers go on to allege that Anthropic, allegedly worth some $5 billion, is profiting greatly from its infringing activities.

There has been a fair amount of logistical wrangling in the case. Anthropic was successful in moving the case from Tennessee to California and has moved to dismiss all but the copyright infringement claim. Anthropic has stated that it will address the copyright infringement claim in due course.

5.    Bartz v. Anthropic PBC.

In another lawsuit, a group of journalists and authors has filed a proposed class action against Anthropic in the federal court for the Northern District of California. Like the lawsuit brought by music publishers, the suit brought by writers alleges copyright infringement, and that Anthropic has made no effort to compensate writers for the inclusion of their works in the datasets used to train Claude or for the appearance of those works in the output generated by Claude.

However, unlike the suit brought by music publishers, the writers go on to allege that Anthropic is competing with the writers by generating content writers would otherwise be paid to create, thus diluting the market in which writers make their living. Such a claim, if proven, will make it harder for Anthropic to successfully mount a fair use defense. The writers also allege that Anthropic knowingly included pirated versions of their copyright-protected works in the datasets for Claude. In another difference from the suit brought by music publishers, the proposed class action lawsuit brought against Anthropic by the writers includes a single cause of action – copyright infringement.

For its part, Anthropic intends to mount a multi-pronged defense against the charges brought by the writers. Anthropic intends to claim, among other things, that its copying constitutes fair use, is innocent and is done on the basis of an express or implied license. In addition, Anthropic intends to challenge the standing of at least one of the plaintiffs to bring the suit, and to assert that some of the works in question are in the public domain and thus not protected by copyright.

6.    What Is At Stake?

On one level, these cases are all about money, and a lot of it. Anthropic has received billions of dollars in investments and is projected to generate more than $850 million in annual revenue by the end of 2024. These cases primarily seek to force Anthropic to give creators their fair share of the revenues generated by the use of systems developed using their proprietary works.

On another level, these cases seek to put protections in place with respect to the way AI is used in the marketplace. To a great extent, then, these cases are about more than just the disappearance of customer service representatives and the decline of human interaction in the way we do business. Notably, neither of the cases suggests that LLMs should not exist. Rather, the cases seek to protect the work product of the people whose creations are powering AI systems, and to protect the marketplace for human creativity from competition presented by those systems.

7.    How IBL can help you

Whether you are a creator looking to protect your work from being included in LLM datasets without your consent, or you are a company looking to incorporate generative LLM systems into your business while minimizing legal risks, the lawyers at IBL partners are here to help. Please reach out to schedule a consultation today.
