New study shows why simulated reasoning AI models don’t yet live up to their billing
arstechnica.com | Published: 4/25/2025
Summary
AI models designed to "reason" post impressive scores on routine math benchmarks, yet they often fail when asked to construct detailed, competition-level mathematical proofs. A recent study led by researchers from ETH Zurich and Sofia University found that these simulated reasoning (SR) models averaged less than 5% of possible points when generating complete proofs for problems from the 2025 US Math Olympiad, despite performing far better on tasks that require only a final answer. The result highlights a significant gap between the models' performance and the abstract reasoning of human mathematicians.