TL;DR
This paper introduces Think-at-Hard (TaH), a selective-iteration method for reasoning LLMs. It improves accuracy by skipping unnecessary latent refinements, using a lightweight neural decider and depth-aware LoRA modules.
Contribution
The paper proposes a selective iteration approach that pairs a neural decider with depth-aware LoRA, improving reasoning performance while avoiding unnecessary latent iterations.
Findings
TaH outperforms always-iterate baselines by 3.8-4.4% across nine benchmarks.
Skipping further iterations for 93% of tokens yields substantial accuracy gains.
Additional parameters from LoRA and decider modules further improve performance by 5.3-6.8%.
Abstract
Improving reasoning abilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Looped transformers address this by performing multiple latent iterations to refine each token beyond a single forward pass. However, we identify a latent overthinking phenomenon: most token predictions are already correct after the first pass, but are sometimes revised into errors in later iterations. In this work, we ask whether selectively skipping latent iterations may improve accuracy. We reveal significant potential with an oracle iteration policy that boosts model performance by up to 7.3%. Motivated by this, we propose Think-at-Hard (TaH), a looped transformer optimized for selective iteration. TaH employs a lightweight neural decider to trigger latent iteration only at tokens that are likely incorrect after the standard forward pass.…
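The selective-iteration idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: `selective_iterate`, `toy_forward`, and `toy_decider` are hypothetical names, the "forward pass" is a simple halving function, and each token is processed independently, whereas a real looped transformer attends across tokens. The sketch only shows the control flow: every token gets one standard pass, and a decider score above a threshold triggers an extra iteration.

```python
from typing import Callable, List, Sequence, Tuple


def selective_iterate(
    tokens: Sequence[float],
    forward: Callable[[float], float],
    decider: Callable[[float], float],
    threshold: float = 0.5,
    max_extra_iters: int = 1,
) -> Tuple[List[float], List[int]]:
    """Toy selective iteration over per-token states (illustrative only).

    Every token gets one standard forward pass. A token gets further
    passes only while the decider's score is at or above `threshold`
    and the extra-iteration budget is not exhausted.
    Returns the final states and the number of passes each token used.
    """
    outputs: List[float] = []
    passes: List[int] = []
    for x in tokens:
        state = forward(x)  # standard first pass for every token
        used = 1
        while used <= max_extra_iters and decider(state) >= threshold:
            state = forward(state)  # extra iteration for a "hard" token
            used += 1
        outputs.append(state)
        passes.append(used)
    return outputs, passes


# Hypothetical stand-ins: halving plays the role of the forward pass, and
# the decider flags states still far from zero as likely incorrect.
def toy_forward(x: float) -> float:
    return x / 2.0


def toy_decider(state: float) -> float:
    return 1.0 if abs(state) > 1.0 else 0.0


states, passes = selective_iterate(
    [0.5, 8.0, 1.0, 6.0], toy_forward, toy_decider
)
print(states)  # final per-token states
print(passes)  # passes used per token
```

In this toy run only the two "hard" inputs receive a second pass. An always-iterate baseline would correspond to a decider that returns 1.0 for every token.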
