New study shows why simulated reasoning AI models don’t yet live up to their billing
arstechnica.com | Published: 4/25/2025
Summary
AI models designed to "reason" post impressive scores on routine math benchmarks, yet they often fail when asked to construct detailed, competition-level mathematical proofs. A recent study led by researchers from ETH Zurich and Sofia University found that these simulated reasoning (SR) models averaged less than 5% of possible points when generating complete proofs for problems from the 2025 US Math Olympiad, despite performing far better on tasks that require only a final answer. The result highlights a significant gap between the models' performance and the abstract reasoning of human mathematicians.