Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
Jiwon Song, Dongwon Jo, Yulhwa Kim, Jae-Joon Kim

TL;DR
Reasoning Path Compression (RPC) is a training-free method that enhances the efficiency of large language models by compressing reasoning paths based on semantic importance, significantly improving throughput with minimal accuracy loss.
Contribution
RPC is a training-free approach to compressing reasoning paths in LLMs: it exploits the semantic sparsity of those paths to speed up inference.
Findings
RPC improves the generation throughput of QwQ-32B by up to 1.60× over inference with a full KV cache.
RPC causes only a 1.2% accuracy drop on AIME 2024.
Semantic sparsity effectively enables reasoning path compression.
Abstract
Recent reasoning-focused language models achieve high accuracy by generating lengthy intermediate reasoning paths before producing final answers. While this approach is effective for problems that require logical thinking, long reasoning paths significantly increase memory usage and reduce token-generation throughput, limiting the practical deployment of such models. We propose Reasoning Path Compression (RPC), a training-free method that accelerates inference by leveraging the semantic sparsity of reasoning paths. RPC periodically compresses the KV cache by retaining cache entries that receive high importance scores, which are computed using a selector window composed of recently generated queries. Experiments show that RPC improves the generation throughput of QwQ-32B by up to 1.60× compared to inference with a full KV cache, with an accuracy drop of 1.2% on the AIME…
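The compression step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the scoring rule (softmax attention averaged over the selector window), the top-k retention, and the names `importance_scores`, `compress_kv_cache`, and `keep` are all assumptions chosen for illustration. A real system would operate per layer and per attention head on tensors, and would trigger this step periodically during generation.

```python
import math


def importance_scores(keys, recent_queries):
    """Average softmax attention each cached key receives from the selector window.

    Assumption: importance is the mean attention weight over the recent queries;
    the paper's exact scoring rule may differ.
    """
    dim = len(keys[0])
    scores = [0.0] * len(keys)
    for q in recent_queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(logit - peak) for logit in logits]
        total = sum(weights)
        for i, w in enumerate(weights):
            scores[i] += w / (total * len(recent_queries))
    return scores


def compress_kv_cache(keys, values, recent_queries, keep):
    """Retain the `keep` highest-scoring cache entries, preserving original order."""
    scores = importance_scores(keys, recent_queries)
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [keys[i] for i in kept], [values[i] for i in kept], kept


# Toy cache with four entries and a selector window of two recent queries.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = ["v0", "v1", "v2", "v3"]
recent_queries = [[2.0, 0.0], [1.5, 0.5]]

small_keys, small_values, kept = compress_kv_cache(keys, values, recent_queries, keep=2)
print(kept)  # [0, 2]
```

In this toy example both recent queries point mostly along the first axis, so the two keys aligned with that direction are retained and the other two are evicted. Halving the cache here corresponds to the kind of memory saving that the abstract links to higher generation throughput.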
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Formal Methods in Verification
