TokenSkip: Controllable Chain-of-Thought Compression in LLMs

Heming Xia; Chak Tou Leong; Wenjie Wang; Yongqi Li; Wenjie Li

arXiv:2502.12067 · cs.CL · September 17, 2025

TL;DR

TokenSkip is a method that selectively skips less important tokens in chain-of-thought outputs of large language models, reducing inference latency and token usage while maintaining reasoning accuracy.

Contribution

The paper introduces TokenSkip, an approach for controllable compression of chain-of-thought sequences in LLMs, based on an analysis of the semantic importance of individual tokens.

Findings

1. Reduces reasoning tokens by 40% on GSM8K.
2. Maintains reasoning performance with less than 0.4% accuracy drop.
3. Effective across various models and tasks.

Abstract

Chain-of-Thought (CoT) has been proven effective in enhancing the reasoning capabilities of large language models (LLMs). Recent advancements, such as OpenAI's o1 and DeepSeek-R1, suggest that scaling up the length of CoT sequences during inference could further boost LLM reasoning performance. However, due to the autoregressive nature of LLM decoding, longer CoT outputs lead to a linear increase in inference latency, adversely affecting user experience, particularly when the CoT exceeds 10,000 tokens. To address this limitation, we analyze the semantic importance of tokens within CoT outputs and reveal that their contributions to reasoning vary. Building on this insight, we propose TokenSkip, a simple yet effective approach that enables LLMs to selectively skip less important tokens, allowing for controllable CoT compression. Extensive experiments across various models and tasks…
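The core idea in the abstract, dropping less important tokens up to a chosen compression level, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `compress_cot`, the `keep_ratio` parameter, and the importance scores below are all hypothetical, and the excerpt does not say how the paper computes token importance or how the model is trained to produce compressed output. The sketch only shows ratio-controlled pruning of an already-scored token sequence.

```python
import math
from typing import List, Sequence


def compress_cot(
    tokens: Sequence[str],
    scores: Sequence[float],
    keep_ratio: float,
) -> List[str]:
    """Keep the highest-scoring fraction of tokens, in their original order.

    `scores` are hypothetical per-token importance values (higher means
    more important); how they are obtained is outside this sketch.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return []

    # Number of tokens to retain; always keep at least one.
    n_keep = max(1, math.ceil(len(tokens) * keep_ratio))

    # Rank positions by descending score; ties go to the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:n_keep])

    # Emit the surviving tokens in their original order.
    return [tok for i, tok in enumerate(tokens) if i in keep]


if __name__ == "__main__":
    # Toy chain-of-thought step with made-up importance scores.
    cot = ["So", ",", "3", "+", "4", "=", "7", "."]
    importance = [0.1, 0.05, 0.9, 0.8, 0.9, 0.7, 0.95, 0.2]
    print(compress_cot(cot, importance, keep_ratio=0.5))
```

Lowering `keep_ratio` shortens the sequence further, which is the sense in which the compression is controllable. Because decoding is autoregressive, fewer generated tokens means proportionally fewer decoding steps.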

Code & Models

Repositories

hemingkx/tokenskip
Official

Taxonomy

Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Innovative Microfluidic and Catalytic Techniques Innovation