From Atomic to Composite: Reinforcement Learning Enables Generalization in Complementary Reasoning

Sitao Cheng; Xunjian Yin; Ruiwen Zhou; Yuxuan Li; Xinyi Wang; Liangming Pan; William Yang Wang; Victor Zhong

arXiv:2512.01970·cs.AI·December 3, 2025

TL;DR

This paper investigates how reinforcement learning (RL) enhances reasoning capabilities. It shows that RL can synthesize complex strategies from atomic skills learned via supervised fine-tuning (SFT), particularly once the model has mastered those foundational skills.

Contribution

The paper argues that RL acts as a reasoning synthesizer rather than merely an amplifier of existing behaviors, and highlights the importance of mastering atomic skills for generalizing to complex reasoning.

Findings

01

RL struggles with out-of-distribution generalization, especially in zero-shot settings.

02

Supervised fine-tuning (SFT) achieves high in-distribution accuracy but fails to generalize out of distribution.

03

RL can synthesize complex reasoning strategies if the underlying atomic skills (Parametric and Contextual Reasoning) are mastered beforehand.

Abstract

The mechanism by which RL contributes to reasoning capabilities (whether it incentivizes the synthesis of new skills or merely amplifies existing behaviors) remains a subject of intense debate. In this work, we investigate this question through the lens of Complementary Reasoning, a complex task that requires integrating internal parametric knowledge with external contextual information. Using a controlled synthetic dataset of human biographies, we strictly decouple this ability into two atomic skills: Parametric Reasoning (relying on internal knowledge) and Contextual Reasoning (depending on external information). To rigorously assess capability boundaries, we evaluate generalization across three distinct levels of difficulty: I.I.D., Composition, and Zero-shot settings. We find that while SFT is sufficient for in-distribution performance, it struggles with O.O.D. generalization,…
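
To make the task structure concrete, here is a minimal, hypothetical Python sketch. It assumes each biography fact is a (subject, relation, object) triple, that Parametric Reasoning amounts to retrieving a memorized fact and Contextual Reasoning to reading a fact from the prompt, and that a Complementary question chains at least one hop of each. The split rule in `generalization_level` is only one plausible reading of the I.I.D./Composition/Zero-shot levels, not the paper's definition. All function names, relations, and people below are invented for illustration and do not come from the released dataset.

```python
# Hypothetical sketch of the task structure; not the paper's dataset or code.
Fact = tuple[str, str, str]  # (subject, relation, object)


def lookup(entity: str, relation: str,
           parametric: dict[tuple[str, str], str],
           context: list[Fact]) -> tuple[str, str]:
    """Resolve one hop, trying memorized knowledge first, then the prompt context."""
    if (entity, relation) in parametric:
        return parametric[(entity, relation)], "parametric"
    for subj, rel, obj in context:
        if subj == entity and rel == relation:
            return obj, "contextual"
    raise KeyError(f"no fact for ({entity!r}, {relation!r})")


def resolve_chain(entity, relations, parametric, context):
    """Follow a multi-hop relation path and record which skill answered each hop."""
    sources = []
    for relation in relations:
        entity, source = lookup(entity, relation, parametric, context)
        sources.append(source)
    return entity, sources


def is_complementary(sources):
    """Here a question counts as complementary if it needs both knowledge sources."""
    return {"parametric", "contextual"} <= set(sources)


def generalization_level(pair, train_pairs):
    """Assumed rule: seen relation pair -> I.I.D.; unseen pair of seen
    relations -> Composition; any never-seen relation -> Zero-shot."""
    seen_relations = {r for p in train_pairs for r in p}
    if pair in train_pairs:
        return "I.I.D."
    if all(r in seen_relations for r in pair):
        return "Composition"
    return "Zero-shot"


# Toy biography data (all names invented).
parametric = {("Alice Moreau", "employer"): "Acme Corp"}  # memorized during training
context = [("Acme Corp", "ceo", "Ravi Patel")]            # supplied in the prompt
answer, sources = resolve_chain("Alice Moreau", ("employer", "ceo"), parametric, context)
print(answer, sources, is_complementary(sources))
# Ravi Patel ['parametric', 'contextual'] True

train_pairs = {("employer", "ceo"), ("born_in", "mayor")}
for pair in [("employer", "ceo"), ("employer", "mayor"), ("spouse", "ceo")]:
    print(pair, generalization_level(pair, train_pairs))
```

Under these assumptions, the three queried pairs fall into I.I.D., Composition, and Zero-shot respectively, mirroring the increasing difficulty the abstract describes.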

Code & Models

Datasets

sitao/From_atomic_to_composite
dataset · 31 downloads

Taxonomy

Topics: Reinforcement Learning in Robotics · Child and Animal Learning Development · Domain Adaptation and Few-Shot Learning