OpenAI stirred up a storm when it announced that GPT-5 had solved ten previously unsolved Erdős problems and made progress on eleven others.
The claim, made by OpenAI vice president Kevin Weil in a since-deleted tweet, was swiftly refuted by the AI and mathematics communities. The uproar underscores the risks of overstating AI’s progress toward genuine mathematical reasoning.
OpenAI’s Erdős problems claims under fire
The claim centred on GPT-5’s supposed breakthrough with Erdős problems, a collection of famously difficult questions posed by the mathematician Paul Erdős. However, mathematician Thomas Bloom clarified that these problems weren’t “unsolved” in the established sense.
Bloom noted that his website’s “open” label means he is unaware of a solution, not that the problem lacks resolution in the broader community. OpenAI’s GPT-5 did not derive new proofs but retrieved known solutions already in the literature.
Meta’s Chief AI Scientist, Yann LeCun, commented that OpenAI had been “hoisted by their own GPTards,” while Google DeepMind’s Demis Hassabis called the episode “embarrassing”.
Why retrieval differs from discovery
The episode highlights a crucial gap between retrieving known solutions and reasoning out original mathematics. OpenAI researcher Sébastien Bubeck acknowledged that the model had found existing solutions scattered across a vast, fragmented body of mathematical research, but emphasised that locating such papers is itself a challenging task.
Nevertheless, competitors contend that mislabelling retrieval as discovery undermines the credibility of AI’s efforts to solve hard problems. The backlash shows how fine the line is between what AI tools can currently do and genuine innovation.
Industry reaction and lessons learned
The backlash was swift and sharp. Rival AI labs, seeing OpenAI’s overreach as a self-inflicted wound, warn against exaggerating AI’s current reasoning abilities. The episode highlights the importance of verifying claimed AI breakthroughs, especially in domains that demand deep expertise and originality.
This episode encourages AI developers and users to distinguish between enhanced search and problem-solving breakthroughs.
The controversy is a telling example of how hard it remains for AI both to achieve real mathematical innovation and to communicate it convincingly.