Behavior Knowledge Merge in Reinforced Agentic Models

Xiangchi Yuan; Dachuan Shi; Chunhui Zhang; Zheyuan Liu; Shenglong Yao; Soroush Vosoughi; Wenke Lee

arXiv:2601.13572 · cs.LG · January 21, 2026

TL;DR

This paper introduces Reinforced Agent Merging (RAM), a method for merging RL-trained agentic models that addresses the task-vector mismatch between RL and SFT and improves on existing merging methods.

Contribution

The paper proposes RAM, a distribution-aware merging framework that preserves task-specific behaviors in RL-trained models, outperforming traditional merging approaches.

Findings

01 RAM outperforms baseline merging methods across multiple domains.

02 RAM enables synergistic effects, surpassing individual specialized agents.

03 The method effectively preserves task-specific capabilities during merging.

Abstract

Reinforcement learning (RL) is central to post-training, particularly for agentic models that require specialized reasoning behaviors. In this setting, model merging offers a practical mechanism for integrating multiple RL-trained agents from different tasks into a single generalist model. However, existing merging methods are designed for supervised fine-tuning (SFT) and are suboptimal at preserving task-specific capabilities in RL-trained agentic models. The root cause is a task-vector mismatch between RL and SFT: on-policy RL induces task vectors that are highly sparse and heterogeneous, whereas SFT-style merging implicitly assumes dense and globally comparable task vectors. When standard global averaging is applied under this mismatch, RL's non-overlapping task vectors, which encode critical task-specific behaviors, are shrunk and their parameter updates are diluted. To address this issue, we…
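The dilution effect described in the abstract can be illustrated with a toy example. The sketch below does not implement RAM; it only shows, under simplifying assumptions, what global averaging does to sparse, non-overlapping task vectors. The helper names (task_vector, average_merge, overlap_count) and the six-parameter "models" are hypothetical and chosen purely for illustration.

```python
# Toy illustration (assumed setup, not the paper's method): two agents are
# fine-tuned from the same base and update disjoint sets of parameters.

def task_vector(finetuned, base):
    """Task vector: fine-tuned weights minus the shared base weights."""
    return [f - b for f, b in zip(finetuned, base)]

def average_merge(base, task_vectors):
    """Global averaging: add the mean of all task vectors to the base."""
    n = len(task_vectors)
    return [b + sum(col) / n for b, col in zip(base, zip(*task_vectors))]

def overlap_count(task_vectors):
    """Per-parameter count of task vectors with a nonzero update."""
    return [sum(1 for v in col if v != 0.0) for col in zip(*task_vectors)]

base = [0.0] * 6
agent_a = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical agent: updates params 0-1
agent_b = [0.0, 0.0, 0.0, 0.0, 2.0, 2.0]  # hypothetical agent: updates params 4-5

task_vectors = [task_vector(agent_a, base), task_vector(agent_b, base)]
merged = average_merge(base, task_vectors)
counts = overlap_count(task_vectors)

print(merged)  # [0.5, 0.5, 0.0, 0.0, 1.0, 1.0]
print(counts)  # [1, 1, 0, 0, 1, 1]
```

Every updated parameter here is touched by exactly one agent, yet averaging divides each update by the number of agents, so each agent's task-specific change is halved in the merged model. With dense, overlapping task vectors (the SFT-style assumption), the same division would instead average competing updates to shared parameters.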

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning