ExpThink: Experience-Guided Reinforcement Learning for Adaptive Chain-of-Thought Compression

Tingcheng Bian; Yuzhe Zhang; Jing Jin; Jinchang Luo; MingQuan Cheng; Haiwei Wang; Wenyuan Jiang; Miaohui Wang

arXiv:2605.07501·cs.LG·May 19, 2026


TL;DR

ExpThink introduces an experience-guided reinforcement learning framework that dynamically balances reasoning accuracy and token efficiency, significantly reducing response length while improving performance on mathematical reasoning tasks.

Contribution

The paper proposes an RL approach that combines experience-guided reward shaping with a difficulty-adaptive advantage to compress CoT reasoning without manual tuning.

Findings

01 Reduces average response length by up to 77%.

02 Achieves up to 3x higher accuracy-efficiency ratio.

03 Outperforms existing RL-based compression methods.

Abstract

Large reasoning models (LRMs) achieve strong performance via extended chain-of-thought (CoT) reasoning, yet suffer from excessive token consumption and high inference latency. Existing reinforcement learning (RL) approaches for CoT compression rely on uniform, static length penalties that neglect model capability dynamics and problem-level difficulty variation. We propose ExpThink, an RL framework that addresses both dimensions through two complementary mechanisms. First, experience-guided reward shaping tracks the shortest correct solution found so far for each problem and applies a three-tier reward: full credit for concise correct responses, discounted credit for verbose correct ones, and zero for incorrect ones. The threshold tightens automatically with model improvement, forming a self-evolving curriculum that requires no manual scheduling. Second,…
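The three-tier reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `ExperienceTracker`, the discount value of 0.5, the `slack` multiplier, the choice to treat the first correct response as concise, and the per-response update of the threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class ExperienceTracker:
    """Hypothetical tracker of the shortest correct response per problem."""

    discount: float = 0.5  # assumed credit for verbose but correct responses
    slack: float = 1.0     # assumed tolerance multiplier on the best length
    best_len: dict = field(default_factory=dict)

    def reward(self, problem_id: str, length: int, correct: bool) -> float:
        # Tier 3: incorrect responses earn nothing and never update the record.
        if not correct:
            return 0.0
        best = self.best_len.get(problem_id)
        # Assumption: with no prior correct solution, the response is concise.
        concise = best is None or length <= self.slack * best
        # Tighten the threshold whenever a shorter correct solution appears.
        if best is None or length < best:
            self.best_len[problem_id] = length
        # Tier 1: full credit if concise; Tier 2: discounted credit otherwise.
        return 1.0 if concise else self.discount


tracker = ExperienceTracker()
rewards = [
    tracker.reward("p1", 800, True),   # first correct solution
    tracker.reward("p1", 1200, True),  # correct but longer than the record
    tracker.reward("p1", 500, True),   # new shortest correct solution
    tracker.reward("p1", 800, True),   # now verbose relative to 500
    tracker.reward("p1", 100, False),  # incorrect, regardless of length
]
print(rewards)  # [1.0, 0.5, 1.0, 0.5, 0.0]
```

In this sketch, the same 800-token response earns full credit early and only discounted credit once a 500-token solution has been found, which is one way to read the "self-evolving curriculum" the abstract mentions.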
