Adaptive Ability Decomposing for Unlocking Large Reasoning Model Effective Reinforcement Learning

Zhipeng Chen; Xiaobo Qin; Wayne Xin Zhao; Youbin Wu; Ji-Rong Wen

arXiv:2602.00759·cs.CL·February 3, 2026

Open Access

TL;DR

This paper introduces A$^2$D, an adaptive ability decomposing method that improves reinforcement learning with verifiable rewards (RLVR) for large language models. A decomposer breaks complex questions into simpler sub-questions, which then guide the reasoner during RLVR training.

Contribution

The paper proposes a novel A$^2$D method that decomposes questions to improve RLVR effectiveness, functioning as a plug-and-play module adaptable to various algorithms.

Findings

01. A$^2$D outperforms baseline methods in reasoning tasks.

02. The decomposer effectively guides the reasoner with sub-questions.

03. Analysis reveals how RLVR influences decomposer performance.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has shown great potential to enhance the reasoning ability of large language models (LLMs). However, due to the limited amount of information provided during the RLVR process, the model can only engage in largely blind exploration, which often results in failure on challenging problems. To provide additional information for the RLVR process without relying on a teacher model, we propose A$^2$D, an Adaptive Ability Decomposing method for enhancing the effectiveness of RLVR. Specifically, we first train a decomposer via RLVR without distillation, enabling it to decompose complex questions into a set of simpler sub-questions. Next, we use this decomposer to annotate sub-questions for each question in the training dataset, and then train the reasoner under RLVR with sub-question guidance. To better understand A$^2$D, we first compare its…
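
The annotation step described in the abstract (a decomposer produces sub-questions, which are then attached to each training question as guidance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `verifiable_reward`, `build_guided_prompt`, `annotate_dataset`, and `toy_decomposer`, the prompt template, and the exact-match reward are all assumptions made for illustration, and the decomposer here is a fixed stub standing in for a trained model.

```python
from typing import Callable, Dict, List


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the normalized answers match, else 0.0."""
    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so formatting differences are ignored.
        return " ".join(text.strip().lower().split())

    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


def build_guided_prompt(question: str, sub_questions: List[str]) -> str:
    """Attach sub-questions to a question (the template is an assumption)."""
    if not sub_questions:
        return question
    hints = "\n".join(f"{i}. {sq}" for i, sq in enumerate(sub_questions, start=1))
    return f"{question}\n\nSub-questions to consider:\n{hints}"


def annotate_dataset(
    dataset: List[Dict[str, str]],
    decomposer: Callable[[str], List[str]],
) -> List[Dict[str, str]]:
    """Add a sub-question-guided prompt to every example, leaving other fields intact."""
    return [
        {**example, "prompt": build_guided_prompt(example["question"], decomposer(example["question"]))}
        for example in dataset
    ]


def toy_decomposer(question: str) -> List[str]:
    """Stub standing in for a trained decomposer; returns fixed sub-questions."""
    return ["What quantities are given?", "What is being asked?"]


dataset = [{"question": "What is 12 * 7 + 5?", "answer": "89"}]
annotated = annotate_dataset(dataset, toy_decomposer)
print(annotated[0]["prompt"])
print(verifiable_reward(" 89 ", annotated[0]["answer"]))
```

In an actual RLVR loop, the reasoner would be sampled on the annotated `prompt` field and scored with a verifiable reward against `answer`; the sketch covers only the data preparation and the reward check, not the policy update.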

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications