Pruning the Unsurprising: Efficient LLM Reasoning via First-Token Surprisal

Wenhao Zeng; Yaoning Wang; Chao Hu; Yuling Shi; Chengcheng Wan; Hongyu Zhang; Xiaodong Gu

arXiv:2508.05988 · cs.LG · January 9, 2026

TL;DR

This paper introduces ASAP, a framework that compresses chain-of-thought reasoning in large language models by pruning less critical steps with a first-token surprisal metric, making reasoning more efficient while maintaining accuracy.

Contribution

ASAP combines anchor-guided pruning with a new first-token surprisal metric for logical step selection, improving reasoning efficiency without sacrificing accuracy.

Findings

01. Achieves state-of-the-art accuracy on multiple benchmarks.

02. Reduces training and inference costs significantly.

03. Maintains reasoning quality with substantial compression.

Abstract

Large Reasoning Models (LRMs) have demonstrated remarkable capabilities by scaling up the length of Chain-of-Thought (CoT). However, excessively long reasoning traces pose substantial challenges for training cost and inference latency. While various CoT compression approaches have emerged to address this challenge, they face inherent trade-offs: token-level methods often disrupt syntactic and logical coherence, while step-level methods based on perplexity fail to reliably capture the logically critical reasoning steps because of the dilution of logical information. In this paper, we propose ASAP (Anchor-guided, SurprisAl-based Pruning), a novel coarse-to-fine framework for CoT compression. ASAP first performs anchor-guided pruning to preserve the core reasoning structure, which efficiently reduces the search space for subsequent processing. Leveraging the insight that logical branching…
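The abstract is cut off before the surprisal-based stage is described, so the following is only a minimal sketch of the general idea named in the title, not the authors' implementation. It assumes the surprisal of a reasoning step is -log p of that step's first token, that the probabilities have already been obtained from some language model, and that steps are kept greedily from most to least surprising under a token budget. The function names, the greedy selection rule, and the whitespace-based token count are all hypothetical choices made for illustration.

```python
import math


def first_token_surprisal(prob):
    """Surprisal in nats of a step's first token, given its model probability."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(prob)


def prune_by_surprisal(steps, first_token_probs, max_tokens):
    """Keep the most surprising steps that fit in max_tokens, in original order.

    steps: list of reasoning-step strings.
    first_token_probs: probability a model assigned to each step's first token
        (assumed to be computed elsewhere; not derived here).
    max_tokens: budget, counted with a crude whitespace split (an assumption).
    """
    if len(steps) != len(first_token_probs):
        raise ValueError("steps and first_token_probs must have equal length")

    # Score every step, then visit steps from highest to lowest surprisal.
    scored = [(first_token_surprisal(p), i) for i, p in enumerate(first_token_probs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    kept, used = set(), 0
    for _, index in scored:
        cost = len(steps[index].split())
        if used + cost <= max_tokens:
            kept.add(index)
            used += cost

    # Restore the original ordering so the pruned trace still reads in sequence.
    return [steps[i] for i in sorted(kept)]


if __name__ == "__main__":
    example_steps = [
        "Let x be 2.",
        "So x plus 2 is 4.",
        "Wait, the sign is wrong.",
        "Therefore the answer is 4.",
    ]
    example_probs = [0.9, 0.8, 0.05, 0.3]  # made-up values for illustration
    print(prune_by_surprisal(example_steps, example_probs, max_tokens=10))
```

In this toy run, the two steps whose first tokens were least expected are kept and the two predictable ones are dropped. A real pipeline would replace the made-up probabilities with values read from a model's next-token distribution at each step boundary.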

Taxonomy

Topics: Software Engineering Research