Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, and Dong Yu

arXiv:2501.18585 · cs.CL · February 19, 2025

Open Access

TL;DR

This paper identifies the underthinking phenomenon in o1-like large language models, where frequent thought switching hampers reasoning depth and accuracy, and proposes a decoding strategy to mitigate this issue, improving performance without fine-tuning.

Contribution

The paper introduces a novel metric that quantifies underthinking and proposes TIP (thought switching penalty), a decoding method that discourages premature thought switching so that the model explores each reasoning path in more depth.
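As a rough illustration of how a decoding-time switching penalty could work, the sketch below lowers the logits of tokens that tend to open a new thought while the current thought is still short. This page does not give the method's details, so everything here is an assumption made for illustration: the function name `apply_switch_penalty`, the idea of a fixed set of switch-token ids, and the `penalty` and `window` parameters are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Set


def apply_switch_penalty(
    logits: List[float],
    switch_token_ids: Set[int],
    tokens_in_current_thought: int,
    penalty: float,
    window: int,
) -> List[float]:
    """Return a copy of `logits` with switch tokens penalized early in a thought.

    Assumptions (hypothetical, not from the source):
    - `switch_token_ids` holds vocabulary ids of tokens that typically start
      a new thought (for example, the first token of "Alternatively").
    - `penalty` is subtracted from those logits.
    - The penalty applies only while the current thought is shorter than
      `window` tokens; after that, switching is unrestricted.
    """
    if penalty < 0 or window < 0:
        raise ValueError("penalty and window must be non-negative")

    # Past the window, leave the distribution untouched.
    if tokens_in_current_thought >= window:
        return list(logits)

    # Inside the window, make switch tokens less likely but not impossible.
    return [
        value - penalty if token_id in switch_token_ids else value
        for token_id, value in enumerate(logits)
    ]


# Usage: token id 1 is treated as a switch token in a 3-token toy vocabulary.
early = apply_switch_penalty([1.0, 2.0, 3.0], {1}, 10, penalty=3.0, window=600)
late = apply_switch_penalty([1.0, 2.0, 3.0], {1}, 600, penalty=3.0, window=600)
```

Because the adjustment touches only the logits at decoding time, a scheme of this kind needs no change to the model weights, which is consistent with the claim that the approach works without fine-tuning.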

Findings

1. Thought switching correlates with incorrect answers.
2. TIP improves accuracy on challenging datasets.
3. The approach does not require model fine-tuning.

Abstract

Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This behavior leads to inadequate depth of reasoning and decreased performance, particularly on challenging mathematical problems. To systematically analyze this issue, we conduct experiments on three challenging test sets and two representative open-source o1-like models, revealing that frequent thought switching correlates with incorrect responses. We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. To address underthinking, we propose a decoding…
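The abstract describes the metric only as measuring token efficiency in incorrect answers and does not give a formula here. One plausible reading, sketched below under stated assumptions, scores each incorrect response by the fraction of its tokens generated after the first correct thought, then averages over responses. The function name `underthinking_score`, the input layout, and the convention that a response with no correct thought contributes zero are all assumptions made for illustration, not the paper's definition.

```python
from typing import Optional, Sequence, Tuple


def underthinking_score(
    samples: Sequence[Tuple[int, Optional[int]]],
) -> float:
    """Average share of tokens spent after the first correct thought.

    Each sample describes one INCORRECT response as
    (total_tokens, tokens_through_first_correct_thought).
    The second field is None when no thought in the response was correct;
    by assumption such a response contributes 0 to the score.
    """
    if not samples:
        raise ValueError("need at least one incorrect response")

    total_score = 0.0
    for total_tokens, useful_tokens in samples:
        if total_tokens <= 0:
            raise ValueError("total_tokens must be positive")
        if useful_tokens is None:
            useful_tokens = total_tokens
        if not 0 <= useful_tokens <= total_tokens:
            raise ValueError("useful token count out of range")
        # Tokens after the first correct thought did not lead to a correct
        # answer, so they count as inefficient.
        total_score += 1.0 - useful_tokens / total_tokens

    return total_score / len(samples)


# Usage: one response reached a correct thought after 250 of 1000 tokens,
# another (400 tokens) never produced a correct thought.
score = underthinking_score([(1000, 250), (400, None)])
```

Under this reading, a higher score means that incorrect responses abandoned a correct line of reasoning earlier and spent more of their tokens elsewhere.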

Taxonomy

Topics: Artificial Intelligence in Law · Digital Rights Management and Security