Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression
Lingjie Zeng, Xiaofan Chen, Yanbo Wang, Xiuying Chen

TL;DR
This study empirically examines how compressing chain-of-thought reasoning traces impacts model trustworthiness, revealing trade-offs and proposing a method to balance efficiency with trustworthiness.
Contribution
The first systematic analysis of the trustworthiness effects of CoT compression. It introduces a normalized efficiency score and an alignment-aware DPO variant.
Findings
CoT compression often causes trustworthiness regressions.
Different compression methods have distinct degradation profiles.
The proposed DPO variant reduces reasoning length by 19.3% with less trustworthiness loss.
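This summary does not specify how the alignment-aware DPO variant differs from standard DPO. For orientation only, below is a minimal sketch of the standard DPO objective that such a variant builds on. It is not the authors' method, and none of the alignment-aware changes are shown. The function names `log_sigmoid` and `dpo_loss` and the default `beta=0.1` are illustrative assumptions. Each log-probability is assumed to be the summed token log-probability of a full response under the trainable policy or the frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed KL-strength hyperparameter
) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    The loss falls as the policy raises the chosen response's likelihood,
    relative to the reference model, more than the rejected response's.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


# Hypothetical numbers: a policy identical to the reference gives margin 0,
# so the loss is log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

In a length-compression setup, one plausible (assumed) choice is to treat a shorter reasoning trace as the chosen response and a longer one as the rejected response. An alignment-aware variant would additionally need some way to penalize pairs where the shorter trace is less safe or less faithful. The summary does not describe how the paper does this.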
Abstract
Long chain-of-thought (Long-CoT) reasoning models have motivated a growing body of work on compressing reasoning traces to reduce inference cost, yet existing evaluations focus almost exclusively on task accuracy and token savings. Trustworthiness properties, whether acquired or reinforced through post-training, are encoded in the same parameter space that compression modifies. This means preserving accuracy does not, a priori, guarantee preserving trustworthiness. We conduct the first systematic empirical study of how CoT compression affects model trustworthiness, evaluating multiple models of different scales along three dimensions: safety, hallucination resistance, and multilingual robustness. Under controlled comparisons, we find that CoT compression frequently introduces trustworthiness regressions and that different methods exhibit markedly different degradation profiles across…
