Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Xin Xu; Clive Bai; Kai Yang; Tianhao Chen; Yangkun Chen; Weijie Liu; Hao Chen; Yang Wang; Saiyong Yang; Can Yang

arXiv:2602.12036·cs.CL·April 23, 2026

TL;DR

Composition-RL strengthens reinforcement learning of large language models by automatically composing multiple prompts into new verifiable questions, which improves reasoning and cross-domain performance.

Contribution

It introduces a novel prompt composition method that makes better use of a limited pool of verifiable prompts, improving reasoning ability and the effectiveness of cross-domain RL.
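
This page does not describe how problems are actually composed, so what follows is only a minimal sketch of one simple rule, assumed for illustration: pose several sub-problems in a single prompt and require all of their answers in order. Because each part has a known ground-truth answer, the composite stays mechanically verifiable. `VerifiablePrompt`, `compose`, and `verify` are hypothetical names, not the repository's API, and the paper's procedure may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VerifiablePrompt:
    question: str
    answer: str  # ground-truth answer checked by the verifier


def compose(prompts: List[VerifiablePrompt]) -> VerifiablePrompt:
    """Hypothetical composition rule (an assumption, not the paper's method):
    pose every sub-problem in one prompt and ask for all answers in order,
    separated by ' ; '."""
    parts = [f"Problem {i + 1}: {p.question}" for i, p in enumerate(prompts)]
    question = (
        "Solve all of the following problems. Report the answers in order, "
        "separated by ' ; '.\n" + "\n".join(parts)
    )
    answer = " ; ".join(p.answer for p in prompts)
    return VerifiablePrompt(question=question, answer=answer)


def verify(response_answer: str, target: VerifiablePrompt) -> float:
    """Binary reward: 1.0 only if every sub-answer matches exactly, in order."""
    got = [s.strip() for s in response_answer.split(";")]
    want = [s.strip() for s in target.answer.split(";")]
    return 1.0 if got == want else 0.0


a = VerifiablePrompt("What is 2 + 3?", "5")
b = VerifiablePrompt("What is 4 * 6?", "24")
c = compose([a, b])
print(verify("5 ; 24", c))  # 1.0
print(verify("5 ; 25", c))  # 0.0
```

Requiring every sub-answer to be correct is what keeps the reward binary and checkable; a real system would likely use a more robust answer-equivalence check than exact string matching.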

Findings

1. Consistent reasoning improvements across model sizes from 4B to 30B.
2. Further performance gains from a curriculum that gradually increases compositional depth.
3. Effective cross-domain RL by composing prompts drawn from different domains.
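
The findings mention a curriculum over compositional depth but not its schedule. A minimal sketch, assuming a hypothetical stage-wise schedule in which the number of composed prompts grows from 1 to `max_depth` over training. `depth_for_step` and `sample_composite` are illustrative names, not the repository's API; the pool may mix prompts from several domains, which is one assumed reading of the cross-domain finding.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def depth_for_step(step: int, total_steps: int, max_depth: int) -> int:
    """Hypothetical linear curriculum: split training into max_depth equal
    stages; stage s (0-indexed) composes s + 1 prompts."""
    stage_len = total_steps / max_depth
    return min(max_depth, int(step // stage_len) + 1)


def sample_composite(pool: Sequence[T], depth: int, rng: random.Random) -> List[T]:
    """Draw `depth` distinct prompts (possibly from different domains)
    to be composed into one question."""
    return rng.sample(list(pool), depth)


rng = random.Random(0)
pool = ["math_1", "math_2", "physics_1", "physics_2"]
for step in (0, 150, 299):
    depth = depth_for_step(step, total_steps=300, max_depth=3)
    print(step, depth, sample_composite(pool, depth, rng))
```

A threshold-based schedule (raise the depth once the current depth's pass rate is high) would be an equally plausible alternative; the fixed stages here are chosen only for simplicity.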

Abstract

Large-scale verifiable prompts underpin the success of Reinforcement Learning with Verifiable Rewards (RLVR), but they contain many uninformative examples and are costly to expand further. Recent studies focus on better exploiting limited training data by prioritizing hard prompts whose rollout pass rate is 0. However, easy prompts with a pass rate of 1 also become increasingly prevalent as training progresses, thereby reducing the effective data size. To mitigate this, we propose Composition-RL, a simple yet useful approach that makes better use of limited verifiable prompts by targeting pass-rate-1 prompts. More specifically, Composition-RL automatically composes multiple problems into a new verifiable question and uses these compositional prompts for RL training. Extensive experiments across model sizes from 4B to 30B show that Composition-RL consistently improves reasoning capability over…
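
The abstract does not name the policy-optimization algorithm. Assuming a GRPO-style estimator with group-normalized advantages (an assumption made here for illustration), a prompt whose rollouts all receive the same reward gets zero advantage on every rollout and so contributes no reward-driven gradient. This is why pass-rate-0 and pass-rate-1 prompts shrink the effective data size. The helper names below are hypothetical.

```python
import statistics
from typing import Dict, List


def pass_rate(rewards: List[float]) -> float:
    """Fraction of one prompt's rollouts that the verifier accepted (reward 1)."""
    return sum(rewards) / len(rewards)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-normalized advantages (assumed for illustration):
    subtract the group mean, divide by the group standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def informative_prompts(rollouts: Dict[str, List[float]]) -> List[str]:
    """Keep prompts whose pass rate is strictly between 0 and 1."""
    return [pid for pid, rs in rollouts.items() if 0.0 < pass_rate(rs) < 1.0]


rollouts = {"easy": [1, 1, 1, 1], "hard": [0, 0, 0, 0], "mixed": [1, 0, 0, 1]}
print(informative_prompts(rollouts))       # ['mixed']
print(group_advantages(rollouts["easy"]))  # [0.0, 0.0, 0.0, 0.0]
```

Only the "mixed" prompt yields non-zero advantages; composing several pass-rate-1 prompts into one question is a way to turn such saturated examples back into prompts the model does not always solve.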

Code & Models

Repositories

XinXU-USTC/Composition-RL (GitHub)
