TL;DR
Think-Anywhere introduces a flexible reasoning mechanism that lets LLMs think on demand at any point during code generation, improving performance and interpretability across multiple benchmarks.
Contribution
It proposes a novel on-demand reasoning method for LLMs in code generation that combines cold-start training with outcome-based RL rewards.
Findings
Achieves state-of-the-art results on four code benchmarks.
Enables adaptive reasoning invocation at high-entropy positions.
Demonstrates consistent generalization across diverse LLMs.
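"High-entropy positions" are token positions where the model's next-token distribution is spread out, meaning the model is uncertain about what comes next. The minimal sketch below shows how such positions could be identified. It computes the Shannon entropy of softmax(logits) at each decoding step and flags the steps above a threshold. The function names and the `threshold` parameter are hypothetical illustrations, not part of the paper's method.

```python
import math
from typing import List, Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


def high_entropy_positions(step_logits: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of decoding steps whose next-token entropy exceeds `threshold`.

    `threshold` is a hypothetical knob for illustration only.
    """
    return [i for i, logits in enumerate(step_logits) if token_entropy(logits) > threshold]


# Step 0 is confident (one dominant logit); step 1 is uniform over 4 tokens (entropy ln 4).
print(high_entropy_positions([[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]], threshold=1.0))
```

In this framing, a step with one dominant logit has near-zero entropy. A uniform step over k tokens reaches the maximum, ln k.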
Abstract
Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient because a problem's full complexity only reveals itself during implementation. Moreover, upfront thinking cannot adaptively allocate reasoning effort throughout code generation, where difficulty varies significantly. In this paper, we propose Think-Anywhere, a novel reasoning mechanism that enables LLMs to invoke thinking on demand at any token position during code generation. We achieve Think-Anywhere by first teaching LLMs to imitate these reasoning patterns through cold-start training, then leveraging outcome-based RL rewards to drive the model's autonomous exploration of when and where to invoke reasoning. Extensive experiments…
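An outcome-based reward scores only the final result, not the intermediate reasoning. The minimal sketch below shows one common way to do this for code. It strips any inline reasoning spans from a generation, runs the remaining code, and returns 1.0 only if every test case passes. The sketch rests on three assumptions that do not come from the source: the `<think>…</think>` delimiters, the binary pass/fail reward, and the test format. The paper's actual tokens and reward design may differ.

```python
import re
from typing import List, Tuple

# Hypothetical delimiters for inline reasoning spans; the paper's actual tokens may differ.
THINK_SPAN = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(generation: str) -> str:
    """Remove every inline reasoning span, leaving only the code."""
    return THINK_SPAN.sub("", generation)


def outcome_reward(generation: str, func_name: str, tests: List[Tuple[tuple, object]]) -> float:
    """Assumed binary outcome reward: 1.0 if the stripped code passes all tests, else 0.0."""
    code = strip_thinking(generation)
    namespace: dict = {}
    try:
        exec(code, namespace)  # a real training pipeline would run this in a sandbox
        fn = namespace[func_name]
        for args, expected in tests:
            if fn(*args) != expected:
                return 0.0
    except Exception:
        return 0.0  # syntax errors, missing functions, or runtime errors earn no reward
    return 1.0


# A generation with a reasoning span placed mid-function, as on-demand thinking allows.
sample = (
    "def add(a, b):\n"
    "    <think>Both inputs are ints, so plain addition suffices.</think>\n"
    "    return a + b\n"
)
print(outcome_reward(sample, "add", [((1, 2), 3), ((0, 0), 0)]))
```

Because the reward depends only on test outcomes, the placement of reasoning spans is never supervised directly. Any benefit from thinking at a particular position has to show up as more passing solutions.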
