Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models

Xiaoze Liu; Dhananjay Ram; Yuting Zhang; Zhaoyang Zhang; Wei Xia; Stefano Soatto

arXiv:2605.07244·cs.LG·May 11, 2026

TL;DR

The paper proposes Mutual Reinforcement Learning, a framework for concurrent RL post-training of heterogeneous language models. Policies from different model families exchange experience across incompatible vocabularies while keeping their own parameters, objectives, and tokenizers.

Contribution

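The framework combines three modules: a Shared Experience Exchange (SEE), Multi-Worker Resource Allocation (MWRA), and a Tokenizer Heterogeneity Layer (THL). Three sharing mechanisms built on GRPO exercise it: Peer Rollout Pooling (PRP), Cross-Policy GRPO Advantage Sharing (XGRPO), and Success-Gated Transfer (SGT).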

Findings

01. Outcome-level sharing offers the best stability-support trade-off.

02. The framework effectively aligns experiences across incompatible vocabularies.

03. Different sharing strategies impact model stability and success transfer.
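To make the vocabulary-alignment finding concrete, here is a minimal sketch of one way per-token values could be carried from one tokenization to another. It is an illustrative assumption, not the paper's THL: the function names `char_spans` and `align_trace`, the toy token splits, and the overlap-weighted averaging rule are all hypothetical. The only idea it relies on is that two tokenizations of the same text share character offsets.

```python
def char_spans(tokens):
    """Return (start, end) character offsets for tokens that concatenate to the text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def align_trace(src_tokens, src_values, tgt_tokens):
    """Project per-token values onto another tokenization of the same text.

    Hypothetical rule: each target token receives the average of the source
    values, weighted by how many characters it shares with each source token.
    Assumes every token is a non-empty string.
    """
    if "".join(src_tokens) != "".join(tgt_tokens):
        raise ValueError("both tokenizations must cover the same text")
    src_spans = char_spans(src_tokens)
    aligned = []
    for t_start, t_end in char_spans(tgt_tokens):
        total = 0.0
        for (s_start, s_end), value in zip(src_spans, src_values):
            overlap = min(t_end, s_end) - max(t_start, s_start)
            if overlap > 0:
                total += overlap * value
        aligned.append(total / (t_end - t_start))
    return aligned


# Toy example: two made-up splits of the word "unbelievable".
src_tokens = ["un", "believ", "able"]
src_values = [1.0, 2.0, 4.0]          # e.g. per-token scores from model A
tgt_tokens = ["unbe", "liev", "able"]  # model B's split of the same text

aligned_values = align_trace(src_tokens, src_values, tgt_tokens)
```

Character offsets are used as the common coordinate system because they stay valid whichever vocabulary produced the tokens. A real tokenizer would need extra handling for special tokens and byte-level encodings, which this sketch ignores.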

Abstract

We introduce Mutual Reinforcement Learning, a framework for concurrent RL post-training in which heterogeneous LLM policies exchange typed experience while keeping separate parameters, objectives, and tokenizers. The framework combines a Shared Experience Exchange (SEE), Multi-Worker Resource Allocation (MWRA), and a Tokenizer Heterogeneity Layer (THL) that retokenizes text and aligns token-level traces across incompatible vocabularies. This substrate makes the experience-sharing design question operational across model families. We instantiate three controlled probes on top of GRPO: data-level rollout sharing via Peer Rollout Pooling (PRP), value-level advantage sharing via Cross-Policy GRPO Advantage Sharing (XGRPO), and outcome-level success transfer via Success-Gated Transfer (SGT). A contextual-bandit analysis characterizes their structural positions on a stability-support…
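As background for the GRPO-based probes, the sketch below shows the usual group-relative advantage (each rollout's reward standardized against the other rollouts for the same prompt) and a hypothetical pooled variant whose baseline also uses a peer policy's rewards. The pooled variant only illustrates why cross-policy statistics can matter. It is an assumption, not the paper's definition of PRP or XGRPO, and the function names and toy rewards are invented.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pooled_advantages(own_rewards, peer_rewards, eps=1e-6):
    """Hypothetical variant: baseline statistics come from own and peer rollouts.

    Returns advantages for the policy's own rollouts only.
    """
    pooled = list(own_rewards) + list(peer_rewards)
    mu = mean(pooled)
    sigma = pstdev(pooled)
    return [(r - mu) / (sigma + eps) for r in own_rewards]


# Toy rewards for one prompt: this policy fails every rollout, the peer mostly succeeds.
own = [0.0, 0.0, 0.0, 0.0]
peer = [1.0, 0.0, 1.0, 1.0]

solo = group_relative_advantages(own)   # all zeros: no learning signal
shared = pooled_advantages(own, peer)   # all negative: failures are now penalized
```

When every rollout in a group gets the same reward, the within-group advantage is zero and the prompt contributes no gradient. Bringing in a peer's outcomes changes the baseline, which is one intuition for why sharing experience can add signal. It can also destabilize training when the two policies differ substantially.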
