Do LLMs Encode Functional Importance of Reasoning Tokens?
Janvijay Singh, Dilek Hakkani-Tür

TL;DR
This paper introduces greedy pruning, a diagnostic procedure for testing whether large language models encode the functional importance of individual reasoning tokens.
Contribution
It proposes a likelihood-preserving pruning technique that identifies important reasoning tokens, and it demonstrates that models encode a nontrivial functional importance structure over those tokens.
Findings
Students trained on pruned reasoning chains can outperform a frontier-model-supervised compression baseline.
Attention scores can predict token importance in reasoning chains.
Models encode a nontrivial functional importance structure over reasoning tokens.
Abstract
Large language models solve complex tasks by generating long reasoning chains, achieving higher accuracy at the expense of greater computational cost and a reduced ability to isolate functionally relevant reasoning. Prior work on compact reasoning shortens such chains through probabilistic sampling, heuristics, or supervision from frontier models, but offers limited insight into whether models internally encode token-level functional importance for answer generation. We address this gap diagnostically and propose greedy pruning, a likelihood-preserving deletion procedure that iteratively removes reasoning tokens whose removal minimally degrades model likelihood under a specified objective, yielding length-controlled reasoning chains. We evaluate pruned reasoning in a distillation framework and show that students trained on pruned chains outperform a frontier-model-supervised compression…
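The deletion loop described in the abstract can be illustrated with a minimal sketch. It assumes a generic scoring function in place of the model likelihood; the names `greedy_prune` and `toy_score`, the token weights, and the example chain are all hypothetical and are not taken from the paper, whose exact objective and implementation are not given here.

```python
from typing import Callable, List, Sequence


def greedy_prune(
    tokens: Sequence[str],
    score: Callable[[List[str]], float],
    target_len: int,
) -> List[str]:
    """Delete one token per step, keeping the deletion that leaves the highest score."""
    chain = list(tokens)
    while len(chain) > max(target_len, 0):
        # Every chain obtainable by removing exactly one token.
        candidates = [chain[:i] + chain[i + 1:] for i in range(len(chain))]
        # Keep the candidate with the highest score, i.e. the removal that
        # degrades the objective the least (ties go to the earliest position).
        chain = max(candidates, key=score)
    return chain


# Hypothetical stand-in for a model log-likelihood: tokens that matter for the
# answer carry weight, filler tokens carry none.
WEIGHTS = {"3": 2.0, "+": 1.0, "4": 2.0, "7": 3.0}


def toy_score(chain: List[str]) -> float:
    return sum(WEIGHTS.get(tok, 0.0) for tok in chain)


tokens = ["so", "3", "+", "4", "is", "clearly", "7"]
pruned_to_4 = greedy_prune(tokens, toy_score, target_len=4)
pruned_to_3 = greedy_prune(tokens, toy_score, target_len=3)
print(pruned_to_4)
print(pruned_to_3)
```

In a real setting, `score` would presumably call the model to evaluate the likelihood objective on the candidate chain, which makes each step cost one evaluation per remaining token. The `target_len` argument is what makes the output length-controlled.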
