GitHub, Microsoft, and OpenAI Sued Over AI Tool’s ‘Unprecedented Scale’ of Software Piracy


A civil suit filed in the Northern District of California by two anonymous software developers on behalf of a class claims that Copilot, an artificial intelligence (AI) coding product from GitHub, Microsoft, and OpenAI and its corporate relatives, has stripped the plaintiffs of their intellectual property rights in licensed code stored on GitHub.

The filing explains that GitHub, an internet hosting service for open source software, allows developers to publish licensed materials pursuant to written agreements. Microsoft bought the company in 2018 for $7.5 billion. 

According to the complaint, defendant OpenAI came into the picture in June 2021. It describes itself as a “non-profit artificial intelligence research company” and was founded by AI researchers, Elon Musk, and Sam Altman, the president of Y Combinator, a tech-startup incubator.

That summer, GitHub and OpenAI launched Copilot, a subscription service that costs $10 per month or $100 per year and runs on Microsoft’s Azure cloud-computing platform. The complaint describes Copilot as “an AI-based product that promises to assist software coders by providing or filling in blocks of code using AI.” Then, in August 2021, OpenAI debuted Codex, a supporting product that converts natural language into code and was integrated into Copilot.

The filing claims that the defendants “have been cagey about what data was used to train the AI,” though they reportedly admitted that the training data included vast numbers of publicly accessible GitHub repositories whose code is subject to licenses. The problem, the filing says, is that the defendants used Copilot to distribute the now-anonymized code to Copilot users as if Copilot had created it, thereby stripping attribution, copyright notices, and license terms from the developers’ code in violation of those licenses.

The proposed class covers all persons or entities domiciled in the United States that owned one or more copyrights in any work, offered that work under one of GitHub’s suggested licenses, and stored licensed material in a GitHub repository between Jan. 1, 2015, and the present.

The suit brings claims under the Digital Millennium Copyright Act, the Lanham Act, and California’s business practice, consumer protection, and privacy laws. It also brings tort claims related to the defendants’ allegedly poor data handling, and seeks damages and an order remediating the copyright infringement.

The plaintiffs are represented by The Joseph Saveri Law Firm LLP and Matthew Butterick.