Adaptive Ability Decomposing for Unlocking Large Reasoning Model Effective Reinforcement Learning
Zhipeng Chen, Xiaobo Qin, Wayne Xin Zhao, Youbin Wu, Ji-Rong Wen

TL;DR
This paper introduces A$^2$D, an adaptive ability decomposing method that improves reinforcement learning with verifiable rewards (RLVR) for large language models by decomposing complex questions into simpler sub-questions that guide the reasoner during training.
Contribution
The paper proposes A$^2$D, a method that decomposes questions into sub-questions to improve the effectiveness of reinforcement learning with verifiable rewards (RLVR). It functions as a plug-and-play module that can be adapted to various algorithms.
Findings
A$^2$D outperforms baseline methods in reasoning tasks.
The decomposer effectively guides the reasoner with sub-questions.
Analysis reveals how RLVR influences decomposer performance.
Abstract
Reinforcement learning with verifiable rewards (RLVR) has shown great potential to enhance the reasoning ability of large language models (LLMs). However, due to the limited amount of information provided during the RLVR process, the model can only engage in largely blind exploration, which often results in failure on challenging problems. To provide additional information for the RLVR process without relying on a teacher model, we propose A$^2$D, an Adaptive Ability Decomposing method for enhancing the effectiveness of RLVR. Specifically, we first train a decomposer via RLVR without distillation, enabling it to decompose complex questions into a set of simpler sub-questions. Next, we use this decomposer to annotate sub-questions for each question in the training dataset, and then train the reasoner under RLVR with sub-question guidance. To better understand A$^2$D, we first compare its…
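The annotate-then-guide flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `annotate_dataset`, `build_guided_prompt`, `verifiable_reward`, and `toy_decomposer` are hypothetical, the prompt layout is an assumption, the decomposer is a stand-in for a trained model, and the exact-match reward is only one simple form a verifiable reward could take.

```python
from typing import Callable, Dict, List

# Hypothetical type: a decomposer maps a question to a list of sub-questions.
Decomposer = Callable[[str], List[str]]


def annotate_dataset(dataset: List[Dict[str, str]],
                     decomposer: Decomposer) -> List[dict]:
    """Attach decomposer-generated sub-questions to every training example."""
    return [{**ex, "sub_questions": decomposer(ex["question"])} for ex in dataset]


def build_guided_prompt(question: str, sub_questions: List[str]) -> str:
    """Build a reasoner prompt that carries the sub-questions as guidance.

    The layout is an assumption made for illustration only.
    """
    lines = [f"Question: {question}"]
    if sub_questions:
        lines.append("Sub-questions to consider:")
        lines.extend(f"{i}. {sq}" for i, sq in enumerate(sub_questions, start=1))
    lines.append("Answer:")
    return "\n".join(lines)


def verifiable_reward(predicted: str, reference: str) -> float:
    """Binary reward: 1.0 when the normalized answers match exactly, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def toy_decomposer(question: str) -> List[str]:
    """Stand-in for the trained decomposer; returns fixed generic sub-questions."""
    return ["What quantities are given?", "What operation combines them?"]


if __name__ == "__main__":
    train = [{"question": "What is 2 + 3?", "answer": "5"}]
    annotated = annotate_dataset(train, toy_decomposer)
    example = annotated[0]
    print(build_guided_prompt(example["question"], example["sub_questions"]))
    print(verifiable_reward(" 5 ", example["answer"]))
```

In an actual RLVR loop, the guided prompt would be fed to the reasoner policy and the reward would drive the policy update; both steps are omitted here because the summary does not specify them.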
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
