CoT-Valve: Length-Compressible Chain-of-Thought Tuning
Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang

TL;DR
CoT-Valve introduces a method to dynamically control and compress reasoning chain lengths in language models, reducing inference costs while maintaining high accuracy across tasks.
Contribution
The paper presents a novel approach to elastically control and compress reasoning chains in models using parameter space manipulation, improving efficiency over prompt-based methods.
Findings
Successfully reduces reasoning chain length on GSM8K from 741 to 225 tokens.
Maintains high accuracy with minimal performance drop during compression.
Demonstrates better controllability and compressibility than prompt-based control methods.
Abstract
Chain-of-Thought significantly enhances a model's reasoning capability, but it also comes with a considerable increase in inference costs due to long chains. Observing that the reasoning path can be compressed easily on easy tasks but is harder to compress on hard tasks, we explore the feasibility of elastically controlling the length of reasoning paths with only one model, thereby reducing the inference overhead of reasoning models dynamically based on task difficulty. We introduce a new tuning and inference strategy named CoT-Valve, designed to allow models to generate reasoning chains of varying lengths. To achieve this, we propose to identify a direction in the parameter space that, when manipulated, can effectively control the length of generated CoT. Moreover, we show that this property is valuable for compressing the reasoning chain. We construct datasets with chains from long…
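The idea of moving along a direction in parameter space can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the paper's implementation: the abstract does not say how the direction is found or applied, so the sketch simply treats it as a per-parameter update that is added to the base weights after scaling by a scalar alpha. The function name shift_along_direction, the toy parameter names, and the values are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for model weights: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def shift_along_direction(base: Params, direction: Params, alpha: float) -> Params:
    """Return base + alpha * direction, applied parameter by parameter.

    Assumption: the "direction in the parameter space" is represented as an
    update with the same names and shapes as the base weights. The abstract
    does not specify this; it is a choice made for this sketch only.
    """
    if base.keys() != direction.keys():
        raise ValueError("base and direction must cover the same parameter names")
    shifted: Params = {}
    for name, weights in base.items():
        delta = direction[name]
        if len(weights) != len(delta):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        shifted[name] = [w + alpha * d for w, d in zip(weights, delta)]
    return shifted


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
base = {"layer.weight": [0.5, -1.0], "layer.bias": [0.0]}
direction = {"layer.weight": [0.25, 0.5], "layer.bias": [-1.0]}

# Sweep the scalar: alpha = 0 leaves the base weights unchanged,
# alpha = 1 applies the full update, values in between interpolate.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, shift_along_direction(base, direction, alpha))
```

In this toy setup a single scalar selects a point on the line between the base weights and the fully shifted weights, which is one plausible reading of how a single model could produce chains of varying lengths; whether alpha maps to longer or shorter chains is not something the sketch can show.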
