Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
Xin Xu, Clive Bai, Kai Yang, Tianhao Chen, Yangkun Chen, Weijie Liu, Hao Chen, Yang Wang, Saiyong Yang, Can Yang

TL;DR
Composition-RL strengthens reinforcement learning of large language models by automatically composing multiple prompts into new verifiable questions, improving reasoning and cross-domain performance.
Contribution
It introduces a prompt composition method that makes better use of limited verifiable prompts, improving reasoning ability and the effectiveness of cross-domain RL.
Findings
- Reasoning improves consistently across model sizes from 4B to 30B.
- A curriculum that gradually increases compositional depth yields further gains.
- Composing prompts drawn from different domains enables effective cross-domain RL.
Abstract
Large-scale verifiable prompts underpin the success of Reinforcement Learning with Verifiable Rewards (RLVR), but they contain many uninformative examples and are costly to expand further. Recent studies focus on better exploiting limited training data by prioritizing hard prompts whose rollout pass rate is 0. However, easy prompts with a pass rate of 1 also become increasingly prevalent as training progresses, thereby reducing the effective data size. To mitigate this, we propose Composition-RL, a simple yet useful approach for better utilizing limited verifiable prompts targeting pass-rate-1 prompts. More specifically, Composition-RL automatically composes multiple problems into a new verifiable question and uses these compositional prompts for RL training. Extensive experiments across model sizes from 4B to 30B show that Composition-RL consistently improves reasoning capability over…
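To make the abstract's two ideas concrete, the sketch below computes a rollout pass rate, flags prompts whose pass rate is 0 or 1, and joins several solved prompts into one question that can still be checked automatically. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Prompt` class, the `compose` and `verify` helpers, and the comma-separated answer format are all hypothetical, and the paper's actual composition procedure may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Prompt:
    question: str
    answer: str  # ground-truth answer used by the verifier


def pass_rate(rollout_correct: Sequence[bool]) -> float:
    """Fraction of sampled rollouts that the verifier marked correct."""
    return sum(rollout_correct) / len(rollout_correct)


def is_informative(rate: float) -> bool:
    """With a binary reward, a pass rate of 0 or 1 means every rollout
    received the same reward, so the prompt adds no contrast between rollouts."""
    return 0.0 < rate < 1.0


def compose(prompts: Sequence[Prompt]) -> Prompt:
    """Hypothetical composition rule (an assumption): ask all sub-problems
    together and require every sub-answer, in order."""
    parts = [f"Problem {i}: {p.question}" for i, p in enumerate(prompts, start=1)]
    question = "\n".join(parts) + "\nGive the answers in order, separated by commas."
    answer = ", ".join(p.answer for p in prompts)
    return Prompt(question, answer)


def _split(text: str) -> List[str]:
    """Split a comma-separated answer string into stripped parts."""
    return [part.strip() for part in text.split(",")]


def verify(prompt: Prompt, response: str) -> bool:
    """Exact-match check on the comma-separated answers (an assumption;
    it would mis-handle sub-answers that themselves contain commas)."""
    return _split(response) == _split(prompt.answer)


# Two prompts the model already solves every time (pass rate 1).
easy = [Prompt("What is 2 + 3?", "5"), Prompt("What is 7 * 6?", "42")]
composed = compose(easy)
```

Under this rule the composed prompt is correct only if every sub-answer is correct, so a model that solves each part reliably but not perfectly will tend to have a pass rate below 1 on the composed prompt, which is the sense in which composition can turn easy prompts back into informative ones.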
Code & Models
- 🤗 xx18/Composition-RL-8B (model, 16 downloads, 1 like)
- 🤗 xx18/Composition-RL-14B (model, 18 downloads)
- 🤗 xx18/Composition-RL-30B-A3B (model, 13 downloads)
- 🤗 xx18/Composition-RL-4B (model, 139 downloads)
- 🤗 xx18/Composition-RL-4B-Depth1_2 (model, 11 downloads)
- 🤗 xx18/Composition-RL-4B-Depth1_2_3 (model, 12 downloads)
- 🤗 xx18/Composition-RL-4B-Physics_Math (model, 15 downloads)
- 🤗 xx18/Baseline-4B-MATH12K (model, 267 downloads)
