How Does RL Post-training Induce Skill Composition? A Case Study on Countdown
Simon Park, Simran Kaur, Sanjeev Arora

TL;DR
This paper investigates how reinforcement learning post-training enhances compositional skill transfer in language models, using the Countdown task to analyze tree-structured solutions and generalization patterns.
Contribution
It provides a detailed analysis of how RL induces skill composition, revealing structure-dependent learnability and generalization behaviors in expression tree solutions.
Findings
Models generalize out-of-distribution to larger inputs and unseen tree shapes.
Shallow balanced trees are learned before deep unbalanced ones.
Fragility persists on right-heavy tree structures despite similar depth.
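The findings above compare solutions by tree shape: balanced, deep unbalanced, and right-heavy. The sketch below shows one way such shapes could be labeled and tallied. The nested-tuple tree encoding, the `classify` rule (minimal depth counts as "balanced"; otherwise the longer of the left and right spines decides), and the `success_by_shape` helper are all hypothetical illustrations, not the paper's taxonomy or code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple, Union

# Hypothetical encoding: a leaf is an int, an internal node is (op, left, right).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def depth(t: Tree) -> int:
    """Number of operator levels on the longest root-to-leaf path."""
    if isinstance(t, int):
        return 0
    _, left, right = t
    return 1 + max(depth(left), depth(right))


def num_leaves(t: Tree) -> int:
    """Number of input numbers used by the tree."""
    if isinstance(t, int):
        return 1
    _, left, right = t
    return num_leaves(left) + num_leaves(right)


def spine(t: Tree, side: int) -> int:
    """Length of the path that always follows the left (side=1) or right (side=2) child."""
    length = 0
    while not isinstance(t, int):
        t = t[side]
        length += 1
    return length


def classify(t: Tree) -> str:
    """Coarse, illustrative shape label (an assumption, not the paper's definition)."""
    n = num_leaves(t)
    if depth(t) == (n - 1).bit_length():  # minimal possible depth for n leaves
        return "balanced"
    left, right = spine(t, 1), spine(t, 2)
    if right > left:
        return "right-heavy"
    if left > right:
        return "left-heavy"
    return "other"


def success_by_shape(records: Iterable[Tuple[Tree, bool]]) -> Dict[str, float]:
    """Map each shape label to the fraction of its records that were solved."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for tree, solved in records:
        label = classify(tree)
        totals[label] += 1
        hits[label] += int(solved)
    return {label: hits[label] / totals[label] for label in totals}


balanced = ("+", ("+", 1, 2), ("+", 3, 4))           # depth 2
left_chain = ("+", ("+", ("+", 1, 2), 3), 4)         # depth 3
right_chain = ("+", 1, ("+", 2, ("+", 3, 4)))        # depth 3
```

With four numbers, the left and right chains have the same depth but are mirror images, so a depth-only metric cannot tell them apart; a spine-based label like this one can.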
Abstract
While reinforcement learning (RL) successfully enhances reasoning in large language models, its role in fostering compositional generalization (the ability to synthesize novel skills from known components) is often conflated with mere length generalization. To separate the two, we study what RL post-training teaches about skill composition and how the structure of the composition affects skill transfer. We focus on the Countdown task (given n numbers and a target, form an expression that evaluates to the target) and analyze model solutions as expression trees, where each subtree corresponds to a reusable subtask and thus can be viewed as a "skill." Tracking tree shapes and their success rates over training, we find: (i) out-of-distribution (OOD) generalization to larger n and to unseen tree shapes, indicating compositional reuse of subtasks; (ii) a structure-dependent hierarchy of…
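To make the task and the "subtree as skill" view concrete, here is a minimal brute-force Countdown solver that returns its answer as an expression tree, plus a helper listing each subtree as a subtask ("reach value v using these numbers"). It assumes a common variant of the rules: every number is used exactly once, the operators are `+ - * /`, and exact rational intermediates are allowed. The function names are hypothetical and this is not the paper's setup or code.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple, Union

# Hypothetical encoding: a leaf is an int, an internal node is (op, left, right).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]

# Division by zero yields None so the search can skip that branch.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else None,
}


def evaluate(t: Tree) -> Optional[Fraction]:
    """Exact value of a tree, or None if it divides by zero."""
    if isinstance(t, int):
        return Fraction(t)
    op, left, right = t
    a, b = evaluate(left), evaluate(right)
    if a is None or b is None:
        return None
    return OPS[op](a, b)


def leaf_values(t: Tree) -> List[int]:
    """Input numbers used by the tree, left to right."""
    if isinstance(t, int):
        return [t]
    _, left, right = t
    return leaf_values(left) + leaf_values(right)


def subtasks(t: Tree) -> List[Tuple[List[int], Fraction]]:
    """One (numbers, value) subtask per internal node, children before parents."""
    if isinstance(t, int):
        return []
    _, left, right = t
    return subtasks(left) + subtasks(right) + [(leaf_values(t), evaluate(t))]


def solve(numbers: List[int], target: int) -> Optional[Tree]:
    """Exhaustive search for a tree that uses every number exactly once."""
    def search(pool: List[Tuple[Fraction, Tree]]) -> Optional[Tree]:
        if len(pool) == 1:
            value, tree = pool[0]
            return tree if value == target else None
        for i, j in combinations(range(len(pool)), 2):
            rest = [pool[k] for k in range(len(pool)) if k not in (i, j)]
            # Try both operand orders, since "-" and "/" are not commutative.
            for (x, tx), (y, ty) in ((pool[i], pool[j]), (pool[j], pool[i])):
                for op, fn in OPS.items():
                    value = fn(x, y)
                    if value is None:
                        continue
                    found = search(rest + [(value, (op, tx, ty))])
                    if found is not None:
                        return found
        return None

    return search([(Fraction(n), n) for n in numbers])


tree = solve([3, 5, 7, 2], 24)
```

Every internal node of the returned tree is itself a smaller Countdown instance, which is what lets a subtree be treated as a reusable skill; `subtasks(tree)` lists those instances, with the full problem last.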
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Topic Modeling
