With the Open Source Initiative's (OSI) announcement of a formal definition of "open" AI on October 28, 2024, the stage is set for a confrontation with tech giants like Meta, whose models don't meet the new criteria.
The scope of OSI’s open-source standards
Although OSI has long set the industry standard for what qualifies as open-source software, AI systems include components, such as model training data, that traditional licenses don't cover. To be considered fully open source, an AI system must now provide: information about the data used to train the AI, so that others can understand and reproduce it; the complete code used to build and run the AI; and the training weights and parameters that help the AI generate its output.
Meta’s Llama model and OSI compliance challenges
Meta's Llama, frequently promoted as the largest open-source AI model, is directly challenged by this definition. Although Llama can be downloaded and used by the public, it does not grant access to its training data and imposes restrictions on commercial use (for applications with more than 700 million users), so it falls short of OSI's requirement for unrestricted freedom to use, modify, and redistribute.
Meta’s response to OSI’s definition
Regardless of technical definitions, Meta spokesperson Eischen said, "We will keep working with OSI and other industry groups to make AI more accessible and free responsibly."
For the past 25 years, developers who want to build on one another's work without fear of legal action or licensing pitfalls have broadly embraced OSI's definition of open-source software. As AI reshapes the landscape, tech companies now face a crucial decision: accept or reject these established principles. The Linux Foundation has also attempted to define "open-source AI," a sign of growing debate over how traditional open-source norms should adapt in the age of artificial intelligence.
Reactions from the open-source community
Simon Willison, an independent researcher and the creator of the open-source multi-tool Datasette, said that with a strong definition in place, "perhaps we can push back more aggressively against companies that are 'open washing' and declaring their work open source when it actually isn't."
Hugging Face CEO Clément Delangue called OSI's definition "a huge help in shaping the conversation around openness in AI, especially when it comes to the crucial role of training data."
Defining “Open” AI: A collaborative effort
According to Stefano Maffulli, OSI's executive director, the effort to define the concept took two years and relied on a collaborative process of consulting specialists worldwide, including philosophers, content creators from the Creative Commons community, and machine learning and natural language processing researchers from academia.
Legal and competitive concerns for tech giants
Although Meta cites safety concerns as justification for limiting access to its training data, critics see a more straightforward motivation: reducing its legal exposure and protecting its competitive edge. Many AI models are very likely trained on copyrighted material; according to a New York Times report in April, Meta acknowledged internally that its training data contained copyrighted content "because we have no way of not collecting that." Numerous infringement lawsuits have been filed against Meta, OpenAI, Perplexity, and Anthropic. For now, though, plaintiffs must rely on circumstantial evidence to show that their work has been scraped, with very few exceptions (such as Stable Diffusion, which makes its training data public).
Echoes of open-source history
Maffulli observes that open-source history is repeating itself. "Meta is making the same arguments" as Microsoft did in the 1990s, Maffulli says, when Microsoft saw open source as a threat to its business model. He recalls that Meta told him about its significant investment in Llama and then asked, "Who do you think is going to be able to do the same thing?" Maffulli saw a familiar pattern: a tech giant citing complexity and expense to justify keeping its technology closed. "We return to the early days," he remarked.
"That's their secret sauce," Maffulli remarked of the training data. "The valuable IP is what it is."