New study accuses LM Arena of gaming its popular AI benchmark

Traditional academic benchmarks only tell you so much, which has led many to lean on vibes-based analysis from LM Arena. LM Arena was created in 2023 as a research project at the University of California, Berkeley. User preference votes are aggregated in the LM Arena leaderboard, which shows which models people like the most and can help track improvements in AI models. When Google released Gemini 2.5 Pro, it noted that the model debuted at the top of the LM Arena leaderboard, where it remains to this day. The researchers behind the new study, hailing from Cohere Labs, Princeton, and MIT, believe AI developers may have placed too much stock in LM Arena.

Summary