Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
Yu Li, Zhuoshi Pan, Honglin Lin, Mengyuan Sun, Conghui He, Lijun Wu

TL;DR
This paper systematically investigates how reinforcement learning can enhance multi-domain reasoning in large language models, focusing on mathematical, coding, and logical reasoning, and analyzing domain interactions and training strategies.
Contribution
It provides a comprehensive analysis of multi-domain reasoning under RLVR, examining per-domain model gains, cross-domain interactions, and the training strategies that shape generalization and reasoning ability.
Findings
Cross-domain training can produce both mutual enhancement and conflicts between domains.
Curriculum learning and reward design significantly impact performance.
Base and instruct models show different behaviors under RL training.
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of LLMs. Existing research has predominantly concentrated on isolated reasoning domains such as mathematical problem-solving, coding tasks, or logical reasoning. However, real-world reasoning scenarios inherently demand an integrated application of multiple cognitive skills. Despite this, the interplay among these reasoning skills under reinforcement learning remains poorly understood. To bridge this gap, we present a systematic investigation of multi-domain reasoning within the RLVR framework, explicitly focusing on three primary domains: mathematical reasoning, code generation, and logical puzzle solving. We conduct a comprehensive study comprising four key components: (1) Leveraging the GRPO algorithm and the Qwen-2.5-7B model family, our study thoroughly…
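For readers new to RLVR and GRPO, the core reward-to-advantage step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary exact-match reward, the hypothetical "Answer:" output format, the helper names (`extract_answer`, `verifiable_reward`, `group_relative_advantages`), and the use of population standard deviation with a small epsilon are all choices made here for illustration. GRPO samples a group of completions per prompt and scores each one relative to the group's mean and spread instead of relying on a learned value model.

```python
# Minimal sketch of an RLVR-style verifiable reward plus GRPO-style
# group-relative advantages. Assumptions (hypothetical, not from the paper):
# answers appear after a final "Answer:" marker, rewards are binary exact
# matches, and normalization uses population std with a small epsilon.
from statistics import mean, pstdev
from typing import List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    if idx == -1:
        return None
    return completion[idx + len(marker):].strip()


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer equals the reference, else 0.0."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one prompt whose reference answer is "42".
    group = ["... Answer: 42", "... Answer: 41", "no final answer", "Answer: 42"]
    rewards = [verifiable_reward(c, "42") for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Correct completions receive positive advantages and incorrect ones negative advantages. If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason problem difficulty and data mix matter in RLVR. Real implementations differ in details such as sample versus population standard deviation.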
Taxonomy
Topics: Reinforcement Learning in Robotics
