Study accuses LM Arena of helping top AI labs game its benchmark

techcrunch.com | Published: 5/1/2025

Summary

A study suggests that LM Arena, a platform for AI benchmarking, may have unfairly granted some companies, including Meta and Google, extra private testing opportunities while allowing them to withhold results from lower-performing model variants, potentially inflating their leaderboard scores. The platform defends its process, stating that it has been transparent about pre-release testing since March 2024 and is open to measures such as setting limits on private tests or adjusting sampling rates. The findings add to ongoing scrutiny of LM Arena's fairness, following the recent controversy over Meta gaming the benchmark with a specially optimized model.