Pruning the Unsurprising: Efficient LLM Reasoning via First-Token Surprisal
Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, Xiaodong Gu

TL;DR
This paper introduces ASAP, a framework that compresses chain-of-thought reasoning in large language models by pruning less critical steps with a first-token surprisal metric, making reasoning more efficient while maintaining accuracy.
Contribution
ASAP combines anchor-guided pruning with a new first-token surprisal metric for logical step selection, improving reasoning efficiency without sacrificing accuracy.
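The surprisal-based selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `prune_steps`, the `keep_ratio` parameter, and the assumption that the log-probability of each step's first token is already available are all hypothetical, and the anchor-guided first stage is omitted. The only fixed fact used is the standard definition of surprisal as the negative log-probability of a token given its context.

```python
import math
from typing import List, Sequence


def prune_steps(
    steps: Sequence[str],
    first_token_logprobs: Sequence[float],
    keep_ratio: float = 0.5,
) -> List[str]:
    """Keep the reasoning steps whose first token is most surprising.

    `first_token_logprobs[i]` is assumed to be log p(first token of steps[i] | context),
    obtained from some language model. `keep_ratio` is a hypothetical budget knob.
    """
    if len(steps) != len(first_token_logprobs):
        raise ValueError("steps and first_token_logprobs must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not steps:
        return []

    # Surprisal is the negative log-probability of the token.
    surprisals = [-lp for lp in first_token_logprobs]

    # Number of steps to retain under the budget (at least one).
    n_keep = max(1, math.ceil(len(steps) * keep_ratio))

    # Rank step indices from most to least surprising, then keep the top ones.
    ranked = sorted(range(len(steps)), key=lambda i: surprisals[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore original order of the surviving steps

    return [steps[i] for i in kept]


if __name__ == "__main__":
    demo_steps = [
        "Let x = 3.",
        "So x + 1 = 4.",
        "Wait, recheck the sign.",
        "Thus the answer is 4.",
    ]
    demo_logprobs = [-0.2, -0.1, -3.0, -0.5]  # made-up values for illustration
    print(prune_steps(demo_steps, demo_logprobs, keep_ratio=0.5))
```

In this toy input the step opening with "Wait" has the least predictable first token, so it survives pruning, while the two most predictable steps are dropped. A fixed keep ratio is only one possible budget rule; a surprisal threshold would serve equally well in a sketch like this.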
Findings
Achieves state-of-the-art accuracy on multiple benchmarks.
Reduces training and inference costs significantly.
Maintains reasoning quality with substantial compression.
Abstract
Large Reasoning Models (LRMs) have demonstrated remarkable capabilities by scaling up the length of Chain-of-Thought (CoT). However, excessively long reasoning traces pose substantial challenges for training cost and inference latency. While various CoT compression approaches have emerged to address this challenge, they face inherent trade-offs: token-level methods often disrupt syntactic and logical coherence, while step-level methods based on perplexity fail to reliably capture the logically critical reasoning steps because of the dilution of logical information. In this paper, we propose ASAP (Anchor-guided, SurprisAl-based Pruning), a novel coarse-to-fine framework for CoT compression. ASAP first performs anchor-guided pruning to preserve the core reasoning structure, which efficiently reduces the search space for subsequent processing. Leveraging the insight that logical branching…
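For reference, the two quantities the abstract contrasts can be written with standard definitions. The notation below (a step $s_k$ made of tokens $x_{k,1}, \dots, x_{k,n_k}$, a model $p_\theta$, and the label $S_{\text{first}}$) is assumed for illustration and is not taken from the paper.

```latex
% Surprisal of a token given everything that precedes it (standard definition)
S(x) = -\log p_\theta(x \mid \text{preceding context})

% Perplexity of step s_k: the exponential of the mean surprisal over all n_k tokens
\mathrm{PPL}(s_k) = \exp\!\left( \frac{1}{n_k} \sum_{j=1}^{n_k} S(x_{k,j}) \right)

% First-token surprisal of the same step: only the opening token is scored
S_{\text{first}}(s_k) = S(x_{k,1})
```

Because the perplexity exponent averages over $n_k$ tokens, a single highly surprising token contributes only a $1/n_k$ share of it. This is one plausible reading of the "dilution" the abstract attributes to perplexity-based step selection.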
Taxonomy
Topics: Software Engineering Research
