Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

Yu Li; Zhuoshi Pan; Honglin Lin; Mengyuan Sun; Conghui He; Lijun Wu

arXiv:2507.17512 · cs.AI · July 24, 2025

TL;DR

This paper systematically investigates how reinforcement learning can enhance multi-domain reasoning in large language models, focusing on mathematical, coding, and logical reasoning, and analyzing domain interactions and training strategies.

Contribution

The paper provides a comprehensive analysis of multi-domain reasoning under RLVR, examining model improvements, domain interactions, and training strategies with the aim of better generalization and reasoning.

Findings

01. Cross-domain training can lead to mutual enhancements and conflicts.

02. Curriculum learning and reward design significantly impact performance.

03. Base and instruct models show different behaviors under RL training.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of LLMs. Existing research has predominantly concentrated on isolated reasoning domains such as mathematical problem-solving, coding tasks, or logical reasoning. However, real-world reasoning scenarios inherently demand an integrated application of multiple cognitive skills. Despite this, the interplay among these reasoning skills under reinforcement learning remains poorly understood. To bridge this gap, we present a systematic investigation of multi-domain reasoning within the RLVR framework, explicitly focusing on three primary domains: mathematical reasoning, code generation, and logical puzzle solving. We conduct a comprehensive study comprising four key components: (1) Leveraging the GRPO algorithm and the Qwen-2.5-7B model family, our study thoroughly…
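
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of the two ideas it names: a verifiable (rule-based) reward and the group-relative advantage used by GRPO, where each sampled response is scored against the mean and standard deviation of its own group. It is an illustration only, not the authors' training code. The function names, the exact-match verifier, the use of the population standard deviation, and the `eps` constant are all assumptions made for this example.

```python
from statistics import mean, pstdev


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical rule-based verifier: 1.0 for an exact match
    (ignoring surrounding whitespace), otherwise 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group of samples.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)

    eps (an assumed constant) avoids division by zero when every
    sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt whose reference answer is "42".
samples = ["42", "41", " 42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)

for s, r, a in zip(samples, rewards, advantages):
    print(f"{s!r:8} reward={r:.1f} advantage={a:+.3f}")
```

Correct answers in the group receive a positive advantage and incorrect ones a negative advantage. When all samples in a group score the same, every advantage is zero and that prompt contributes no learning signal.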

Taxonomy

Topics: Reinforcement Learning in Robotics