OpenAI Addresses NY Times Copyright Lawsuit, Calls Claims ‘Baseless’

After the news at the end of last year that The New York Times (NYT), one of the world’s most prominent newspapers, was suing OpenAI and its major backer Microsoft for copyright infringement, OpenAI has now responded publicly. In a blog post, the company described the lawsuit as “without merit,” saying it supports journalism, partners with news organizations, and considers the NYT’s claims baseless.

OpenAI’s response includes three main points:
1. They work together with news organizations to create new opportunities.
2. They believe their training practices fall under fair use but offer an opt-out option because they think it’s the right thing to do.
3. Issues with models generating highly similar content to specific articles (referred to as “regurgitation”) are rare bugs they are actively working to eliminate.

A significant focus of the post is OpenAI’s recent content licensing deals with rival news organizations, such as Axel Springer (publisher of Politico and Business Insider) and the Associated Press (AP). These deals stand alongside the company’s stated position that it can legally use any publicly available web data to train its AI models, including GPT-3.5 and GPT-4, which power ChatGPT.

Since its DevDay conference in November 2023, OpenAI has also offered to cover legal costs for customers who face copyright claims arising from their use of its AI products.

Background:
The NYT filed its lawsuit in late December 2023 in the U.S. District Court for the Southern District of New York, claiming that OpenAI trained its models on copyrighted NYT articles without permission or compensation. The complaint includes examples of ChatGPT producing text nearly identical to NYT articles, which the paper argues constitutes copyright infringement.

The lawsuit followed several months of failed negotiations between OpenAI and the NYT over a content licensing deal. OpenAI asserts that training on publicly available material is fair use, backed by long-standing legal precedent, and points to the simple opt-out process it provides for publishers, which the NYT adopted in August 2023, allowing them to prevent OpenAI’s tools from accessing their sites. However, this opt-out option only became available well after ChatGPT launched in November 2022, giving publishers little opportunity to block data scraping before then. The implication is that OpenAI’s licensing deals with other publishers offer a way around those who opt out, since opting out blocks OpenAI from using a publisher’s material for training.
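The opt-out OpenAI describes works through the standard robots.txt convention: publishers disallow OpenAI’s documented web-crawler user agent, GPTBot, as the NYT did in August 2023. The snippet below is a minimal, illustrative Python sketch of how a crawler that honors that convention would check a publisher’s robots.txt before fetching a page; the domain and article path are placeholders, not taken from the article.

```python
# Illustrative sketch only: checks whether a robots.txt-respecting crawler
# identified as "GPTBot" may fetch a page from a hypothetical publisher.
from urllib import robotparser

# A publisher opting out would publish rules like:
#   User-agent: GPTBot
#   Disallow: /
rp = robotparser.RobotFileParser()
rp.set_url("https://publisher.example/robots.txt")  # placeholder domain
rp.read()  # download and parse the publisher's robots.txt

article_url = "https://publisher.example/2023/some-article.html"
if rp.can_fetch("GPTBot", article_url):
    print("Crawling permitted for GPTBot")
else:
    print("Publisher has opted out; the crawler should skip this site")
```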

OpenAI also accuses the NYT of intentionally manipulating prompts to gather evidence for its lawsuit, in violation of OpenAI’s Terms of Service. They argue that the NYT induced the instances of article “regurgitation,” which are not representative of typical or permitted user activity, and say they are continually working to make their systems more resistant to such adversarial attacks.

In response, Trident DMG, a public relations firm representing the NYT, provided a statement from one of the paper’s lawyers emphasizing that OpenAI has admitted to using the NYT’s copyrighted work to build ChatGPT and arguing that such use is not fair use by any measure.

The case will be heard by Federal District Court Judge Sidney H. Stein, though no initial hearing date has been scheduled. OpenAI’s blog post has not been entered as argument or evidence, but its points may well reappear in a motion to dismiss.

With more AI services accused of reproducing copyrighted content, 2024 is likely to be a pivotal year for the legality of AI training data sources.