The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer

Marthe Ballon; Andres Algaba; Vincent Ginis

arXiv:2502.15631·cs.LG·February 24, 2025

TL;DR

This paper investigates how reasoning chain length relates to accuracy in large language models by comparing o1-mini and o3-mini variants on the Omni-MATH benchmark. It finds that the newer model gains accuracy through more efficient reasoning rather than longer reasoning chains, a result that bears on how models are evaluated and how test-time compute is scaled.

Contribution

The study systematically analyzes reasoning length and accuracy across model generations (o1-mini and the o3-mini variants), showing that performance gains stem from more effective reasoning rather than longer chains.

Findings

01. o3-mini outperforms o1-mini without longer reasoning chains

02. Accuracy declines as reasoning chains grow, even when controlling for difficulty

03. More proficient models use test-time compute more efficiently

Abstract

Large language models have demonstrated remarkable progress in mathematical reasoning, leveraging chain-of-thought and test-time compute scaling. However, many open questions remain regarding the interplay between reasoning token usage and accuracy gains. In particular, when comparing models across generations, it is unclear whether improved performance results from longer reasoning chains or more efficient reasoning. We systematically analyze chain-of-thought length across o1-mini and o3-mini variants on the Omni-MATH benchmark, finding that o3-mini (m) achieves superior accuracy without requiring longer reasoning chains than o1-mini. Moreover, we show that accuracy generally declines as reasoning chains grow across all models and compute settings, even when controlling for difficulty of the questions. This accuracy drop is significantly smaller in more proficient models, suggesting…
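
One way to picture what "controlling for difficulty" means here is to compare accuracy between short and long reasoning chains separately within each difficulty level, so that harder questions (which tend to draw longer chains) do not drive the comparison. The following is a minimal sketch of that idea, not the authors' analysis code: the function name, the record layout, the single token threshold, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_difficulty_and_length(records, token_edges):
    """Group (difficulty, tokens, correct) records into cells and return
    {(difficulty, bin_index): accuracy}.

    bin_index counts how many edges in token_edges the token count has
    reached, so with one edge, bin 0 is "short" and bin 1 is "long".
    """
    # (difficulty, bin_index) -> [number correct, number of records]
    totals = defaultdict(lambda: [0, 0])
    for difficulty, tokens, correct in records:
        bin_index = sum(tokens >= edge for edge in token_edges)
        cell = totals[(difficulty, bin_index)]
        cell[0] += int(correct)
        cell[1] += 1
    return {key: n_correct / n_total
            for key, (n_correct, n_total) in totals.items()}


# Made-up illustrative records (difficulty level, reasoning tokens, correct?).
# These are NOT data from the paper.
toy_records = [
    (1, 500, True), (1, 800, True), (1, 3000, True), (1, 4000, False),
    (2, 700, True), (2, 900, False), (2, 5000, False), (2, 6000, False),
]

# Hypothetical threshold: chains of 1000+ tokens count as "long".
result = accuracy_by_difficulty_and_length(toy_records, token_edges=[1000])
for (difficulty, bin_index), accuracy in sorted(result.items()):
    print(f"difficulty={difficulty} bin={bin_index} accuracy={accuracy:.2f}")
```

Comparing bin 0 against bin 1 within the same difficulty level is the stratified comparison; a regression with difficulty as a covariate would be a more formal alternative under the same assumptions.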

Code & Models

Repositories

MartheBallon/analysis_o3-mini_thinks_harder_not_longer (Official)