OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied

techcrunch.com
Published: 4/20/2025

Summary

OpenAI’s o3 AI model faced scrutiny after a significant gap emerged between the FrontierMath score OpenAI claimed and the results of independent tests by Epoch AI. OpenAI reported an internal score of over 25% using advanced compute settings, while the publicly released model scored around 10% in Epoch’s evaluation, a gap likely due to differences in testing setups and model configurations. The discrepancy highlights broader concerns about benchmark transparency in the AI industry, since the models companies release are often optimized for real-world use rather than peak benchmark performance. OpenAI emphasized that the released o3 is designed for efficiency and speed, and said it plans to release a more powerful variant soon.