Think Dense, Not Long: Dynamic Decoupled Conditional Advantage for Efficient Reasoning

Keqin Peng; Yuanxin Ouyang; Xuebo Liu; Zhiliang Tian; Ruijian Han; Yancheng Yuan; Liang Ding

arXiv:2602.02099·cs.CL·February 3, 2026

Open Access

TL;DR

This paper introduces DDCA, a method for reinforcement learning with verifiable rewards that computes length advantages only among correct responses and scales the length penalty with problem difficulty, reducing verbosity without sacrificing accuracy.

Contribution

The paper proposes DDCA, which decouples efficiency optimization from correctness. It targets two structural issues of naive length penalties in group-relative RLVR: incorrect responses diluting the length baseline, and a static penalty that cannot adapt to problem difficulty.

Findings

1. Reduces generated tokens by ~60% on simple tasks
2. Maintains or improves accuracy across benchmarks
3. Adapts penalty strength based on problem difficulty

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) can elicit strong multi-step reasoning, yet it often encourages overly verbose traces. Moreover, naive length penalties in group-relative optimization can severely hurt accuracy. We attribute this failure to two structural issues: (i) Dilution of Length Baseline, where incorrect responses (with zero length reward) depress the group baseline and over-penalize correct solutions; and (ii) Difficulty-Penalty Mismatch, where a static penalty cannot adapt to problem difficulty, suppressing necessary reasoning on hard instances while leaving redundancy on easy ones. We propose Dynamic Decoupled Conditional Advantage (DDCA) to decouple efficiency optimization from correctness. DDCA computes length advantages conditionally within the correct-response cluster to eliminate baseline dilution, and dynamically scales the penalty strength using the…
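The conditional baseline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear `length_reward`, the `max_len` cap, and the function names are all hypothetical, and because the abstract is cut off before it says how the penalty strength is derived, the `scale` argument is simply supplied by the caller. The sketch only contrasts a group-wide length baseline with one computed inside the correct-response cluster.

```python
from statistics import mean


def length_reward(length: int, max_len: int) -> float:
    """Hypothetical length reward in [0, 1]; shorter responses score higher."""
    return 1.0 - min(length, max_len) / max_len


def naive_length_advantages(lengths, correct, max_len):
    """Group-wide baseline: incorrect responses contribute a zero length reward."""
    rewards = [length_reward(n, max_len) if ok else 0.0
               for n, ok in zip(lengths, correct)]
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def conditional_length_advantages(lengths, correct, max_len, scale=1.0):
    """Baseline over correct responses only; incorrect ones get no length signal.

    `scale` stands in for the dynamic penalty strength. How it is derived is
    an open assumption here, so the caller supplies it.
    """
    correct_rewards = [length_reward(n, max_len)
                       for n, ok in zip(lengths, correct) if ok]
    if not correct_rewards:
        # No correct response in the group: nothing to optimize for length.
        return [0.0] * len(lengths)
    baseline = mean(correct_rewards)
    return [scale * (length_reward(n, max_len) - baseline) if ok else 0.0
            for n, ok in zip(lengths, correct)]


# Toy group of four sampled responses: two correct, two incorrect.
lengths = [100, 200, 300, 400]
correct = [True, True, False, False]
naive = naive_length_advantages(lengths, correct, max_len=400)
cond = conditional_length_advantages(lengths, correct, max_len=400, scale=0.5)
```

In this toy group the group-wide baseline is 0.3125, while the baseline over the two correct responses is 0.625, so the two schemes rank the same correct responses against different reference points. With the conditional variant the length advantages of the correct responses sum to zero, and incorrect responses receive no length signal at all.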

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications