To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models

Haoqing Wang; Xiang Long; Ziheng Li; Yilong Xu; Tingguang Li; Yehui Tang

arXiv:2602.12566 · cs.AI · March 12, 2026

Open Access

TL;DR

This paper compares two multi-domain reinforcement learning paradigms for large language models, analyzing their effects on performance across various tasks and revealing insights into their internal mechanisms.

Contribution

It provides a comprehensive comparison and analysis of mixed multi-task RLVR and separate RLVR followed by model merging for large language models across multiple domains.

Findings

01 RLVR across domains shows minimal mutual interference.

02 Reasoning-intensive domains have mutually synergistic effects.

03 Internal mechanisms analyzed include weight space geometry and self-verification.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit reasoning capability of Large Language Models (LLMs). RLVR can yield expert-level performance in specific domains such as coding or math. When a general, multi-domain expert-level model is required, how RLVR is combined across domains must be considered carefully. Current state-of-the-art models mainly employ two training paradigms for multi-domain RLVR: mixed multi-task RLVR, and separate RLVR followed by model merging. However, most of these works do not provide a detailed comparison or analysis of the two paradigms. To this end, we choose multiple commonly used high-level tasks (e.g., math, coding, science, instruction following, and agent) as our target domains and design extensive qualitative and quantitative experiments using open-source…
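The two paradigms named in the abstract can be contrasted with a toy sketch. This is an illustration under stated assumptions, not the paper's procedure: the excerpt does not say how domains are sampled or which merging method is used, so uniform domain sampling and a plain average of task vectors (expert weights minus base weights) are stand-ins, and the names `sample_mixed_batch` and `merge_by_task_vector_average` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Toy stand-in for model weights: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def sample_mixed_batch(
    prompts_by_domain: Dict[str, List[str]],
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[str, str]]:
    """Paradigm 1 (mix): build one RLVR batch that spans all domains.

    Assumption: domains are sampled uniformly; the source does not
    specify a sampling scheme.
    """
    domains = sorted(prompts_by_domain)
    batch = []
    for _ in range(batch_size):
        domain = rng.choice(domains)
        batch.append((domain, rng.choice(prompts_by_domain[domain])))
    return batch


def merge_by_task_vector_average(base: Weights, experts: List[Weights]) -> Weights:
    """Paradigm 2 (merge): combine separately trained domain experts.

    Assumption: merged = base + mean(expert - base) per parameter.
    This is one simple merging rule, not necessarily the paper's.
    """
    merged: Weights = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + sum(e[name][i] - b for e in experts) / len(experts)
            for i, b in enumerate(base_vals)
        ]
    return merged


if __name__ == "__main__":
    prompts = {"math": ["2+2=?"], "coding": ["Reverse a string."]}
    print(sample_mixed_batch(prompts, batch_size=4, rng=random.Random(0)))

    base = {"w": [1.0, 2.0]}
    math_expert = {"w": [2.0, 2.0]}
    code_expert = {"w": [1.0, 4.0]}
    print(merge_by_task_vector_average(base, [math_expert, code_expert]))
```

In the mixed paradigm a single policy is updated on batches like the one returned by `sample_mixed_batch`; in the merge paradigm each expert is trained on its own domain first and only the resulting weights are combined.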

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education