From Atomic to Composite: Reinforcement Learning Enables Generalization in Complementary Reasoning
Sitao Cheng, Xunjian Yin, Ruiwen Zhou, Yuxuan Li, Xinyi Wang, Liangming Pan, William Yang Wang, Victor Zhong

TL;DR
This paper investigates how reinforcement learning (RL) enhances reasoning capabilities. It shows that RL can synthesize complex reasoning strategies from atomic skills learned via supervised fine-tuning (SFT), provided the model has first mastered those atomic skills.
Contribution
The paper shows that RL acts as a reasoning synthesizer rather than a mere amplifier of existing behaviors, and that mastery of atomic skills is a prerequisite for generalizing to complex reasoning.
Findings
SFT alone struggles with out-of-distribution generalization, especially in zero-shot settings.
Supervised fine-tuning achieves high in-distribution accuracy but fails out-of-distribution.
RL can synthesize complex reasoning strategies if atomic skills are mastered beforehand.
Abstract
The mechanism by which RL contributes to reasoning capabilities (whether it incentivizes the synthesis of new skills or merely amplifies existing behaviors) remains a subject of intense debate. In this work, we investigate this question through the lens of Complementary Reasoning, a complex task that requires integrating internal parametric knowledge with external contextual information. Using a controlled synthetic dataset of human biographies, we strictly decouple this ability into two atomic skills: Parametric Reasoning (relying on internal knowledge) and Contextual Reasoning (depending on external information). To rigorously assess capability boundaries, we evaluate generalization across three distinct levels of difficulty: I.I.D., Composition, and Zero-shot settings. We find that while SFT is sufficient for in-distribution performance, it struggles with O.O.D. generalization,…
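To make the distinction between the two atomic skills and the composite task concrete, here is a toy sketch in Python. It is not the paper's dataset or method: the names, relations, and the `follow_chain` helper are all hypothetical, and a dictionary lookup stands in for what a language model would have to do implicitly. `PARAMETRIC` plays the role of facts memorized during training, and `CONTEXT` plays the role of facts that appear only in the prompt.

```python
from typing import Dict, List, Optional, Tuple

# A fact maps (entity, relation) -> value. All entries below are invented.
KB = Dict[Tuple[str, str], str]

# Assumption: facts the model has memorized (internal, parametric knowledge).
PARAMETRIC: KB = {
    ("Alice Moreau", "birthplace"): "Lyon",
    ("Ben Okafor", "birthplace"): "Lagos",
}

# Assumption: facts supplied only in the prompt (external, contextual knowledge).
CONTEXT: KB = {
    ("Alice Moreau", "mentor"): "Ben Okafor",
}


def follow_chain(entity: str, relations: List[str], sources: List[KB]) -> Optional[str]:
    """Resolve a chain of relations, taking hop i from sources[i].

    Returns None as soon as a hop cannot be answered from its source.
    """
    current = entity
    for relation, source in zip(relations, sources):
        nxt = source.get((current, relation))
        if nxt is None:
            return None
        current = nxt
    return current


# Parametric Reasoning: the answer comes from internal knowledge only.
parametric_answer = follow_chain("Alice Moreau", ["birthplace"], [PARAMETRIC])

# Contextual Reasoning: the answer comes from the provided context only.
contextual_answer = follow_chain("Alice Moreau", ["mentor"], [CONTEXT])

# Complementary Reasoning: hop 1 needs the context, hop 2 needs internal knowledge.
complementary_answer = follow_chain(
    "Alice Moreau", ["mentor", "birthplace"], [CONTEXT, PARAMETRIC]
)

# Neither source is sufficient on its own for the two-hop question.
parametric_only = follow_chain(
    "Alice Moreau", ["mentor", "birthplace"], [PARAMETRIC, PARAMETRIC]
)

print(parametric_answer, contextual_answer, complementary_answer, parametric_only)
```

In this toy setup the two-hop question ("Where was Alice Moreau's mentor born?") is answerable only by combining both sources, which is the sense in which the composite task depends on both atomic skills.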
Taxonomy
Topics: Reinforcement Learning in Robotics · Child and Animal Learning Development · Domain Adaptation and Few-Shot Learning
