DeCoRL: Decoupling Reasoning Chains via Parallel Sub-Step Generation and Cascaded Reinforcement for Interpretable and Scalable RLHF
Ziyuan Gao, Di Liang, Xianjie Wu, Philippe Morel, Minlong Peng

TL;DR
DeCoRL introduces a modular, parallel reasoning framework that scores each sub-step independently, improving inference speed, interpretability, and energy efficiency in reinforcement learning for complex reasoning tasks.
Contribution
The paper proposes DeCoRL, a novel framework that decouples reasoning into parallel sub-steps with modular rewards, enabling scalable, interpretable reinforcement learning suited to real-time deployment.
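To make the idea of modular rewards concrete, here is a minimal Python sketch. It assumes that each sub-step carries a role label and is scored by its own reward function. The names `SubStep`, `score_substeps`, and `attribute_error`, the role labels, and the string-matching toy scorers are all hypothetical illustrations, not DeCoRL's actual reward models. The point is that per-sub-step scores localize an error, whereas a single chain-level average does not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical scorer type: maps one sub-step's text to a reward in [0, 1].
Scorer = Callable[[str], float]


@dataclass
class SubStep:
    role: str  # hypothetical sub-step label, e.g. "parse", "compute", "verify"
    text: str


def score_substeps(steps: List[SubStep], scorers: Dict[str, Scorer]) -> Dict[str, float]:
    """Score each sub-step with its own reward function instead of one chain-level reward."""
    return {s.role: scorers[s.role](s.text) for s in steps}


def attribute_error(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the roles of sub-steps whose individual reward falls below the threshold."""
    return [role for role, r in scores.items() if r < threshold]


# Toy scorers (assumption): simple string checks standing in for learned reward models.
scorers: Dict[str, Scorer] = {
    "parse": lambda t: 1.0 if "x =" in t else 0.0,
    "compute": lambda t: 1.0 if "= 12" in t else 0.0,
    "verify": lambda t: 1.0 if "check" in t else 0.0,
}

steps = [
    SubStep("parse", "let x = 3 * 4"),
    SubStep("compute", "3 * 4 = 11"),  # the faulty sub-step
    SubStep("verify", "check: recompute 3 * 4"),
]

scores = score_substeps(steps, scorers)
# A monolithic reward (here, the mean) signals a problem but not where it is.
chain_reward = sum(scores.values()) / len(scores)

print(scores)
print(attribute_error(scores))
```

Running it flags only the `compute` sub-step, while the averaged chain reward of about 0.67 gives no indication of which step went wrong.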
Findings
Achieves 3.8x faster inference compared to sequential methods.
Improves interpretability by 22.7% through explicit reward attribution.
Reduces energy consumption by 72.4% and increases throughput by 68%.
Abstract
Existing reinforcement learning methods for Chain-of-Thought reasoning suffer from two critical limitations. First, they operate as monolithic black boxes that provide undifferentiated reward signals, which obscures the contribution of individual steps and hinders error diagnosis. Second, sequential decoding has O(n) time complexity, which makes real-time deployment impractical for complex reasoning tasks. We present DeCoRL (Decoupled Reasoning Chains via Coordinated Reinforcement Learning), a novel framework that transforms reasoning from sequential processing into collaborative modular orchestration. DeCoRL trains lightweight specialized models to generate reasoning sub-steps concurrently, eliminating sequential bottlenecks through parallel processing. To enable precise error attribution, the framework uses modular reward functions that score each sub-step independently. Cascaded DRPO…
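The latency argument can be illustrated with a small asyncio sketch. It assumes that the decomposition yields sub-steps that depend only on the shared prompt and not on one another. `generate_substep` is a hypothetical stub that sleeps to mimic a lightweight model's decoding time. It is not the paper's generator, and the cascaded DRPO stage is not modeled. Awaiting the sub-steps one by one makes wall-clock time grow linearly with the number of steps. `asyncio.gather` instead runs them concurrently, so wall-clock time stays close to a single step's latency.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple


async def generate_substep(idx: int, prompt: str, latency_s: float = 0.05) -> str:
    """Hypothetical stand-in for a lightweight sub-step model; sleeps to mimic decoding time."""
    await asyncio.sleep(latency_s)
    return f"step {idx}: derived from '{prompt}'"


async def sequential(prompt: str, n: int) -> List[str]:
    # Each sub-step starts only after the previous one finishes: wall-clock grows as O(n).
    out: List[str] = []
    for i in range(n):
        out.append(await generate_substep(i, prompt))
    return out


async def parallel(prompt: str, n: int) -> List[str]:
    # All sub-steps are scheduled at once; gather returns results in launch order.
    return list(await asyncio.gather(*(generate_substep(i, prompt) for i in range(n))))


def timed(
    fn: Callable[[str, int], Awaitable[List[str]]], prompt: str, n: int
) -> Tuple[List[str], float]:
    """Run one strategy to completion and measure its wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(fn(prompt, n))
    return result, time.perf_counter() - start


seq_out, seq_t = timed(sequential, "solve 3*4+5", 6)
par_out, par_t = timed(parallel, "solve 3*4+5", 6)
print(f"sequential: {seq_t:.2f}s  parallel: {par_t:.2f}s")
```

With six steps at 0.05 s each, the sequential run takes roughly 0.3 s and the parallel run roughly 0.05 s on an idle machine, and both return identical outputs. Real-world gains would depend on hardware, batching, and how independent the sub-steps actually are.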
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
