MIR: Efficient Exploration in Episodic Multi-Agent Reinforcement Learning via Mutual Intrinsic Reward

Kesheng Chen; Wenjian Luo; Bang Zhang; Zeping Yin; Zipeng Ye

arXiv:2511.17165 · cs.AI · November 24, 2025

TL;DR

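This paper introduces Mutual Intrinsic Reward (MIR), a method for improving exploration in multi-agent reinforcement learning (MARL) under sparse episodic rewards. It targets two challenges: the exponential sparsity of rewarding joint actions, and the failure of existing methods to account for joint actions that influence team states. MIR outperforms state-of-the-art methods in the reported experiments.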

Contribution

The paper proposes MIR, a simple enhancement strategy that incentivizes agents to explore actions affecting teammates, significantly improving exploration and performance in MARL with sparse rewards.
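The source does not give MIR's exact formula. As a rough illustration of "rewarding an agent for actions that affect teammates," the sketch below assumes a counterfactual measure: agent i's mutual term is the fraction of teammates whose next observation differs from what it would have been had agent i taken a no-op. The function names, the no-op counterfactual, and the weight `beta` are all hypothetical and are not the authors' definitions.

```python
from typing import Sequence, Tuple

# Hypothetical observation type: a discrete, hashable tuple (e.g. a grid view).
Observation = Tuple[int, ...]


def mutual_intrinsic_reward(
    agent: int,
    next_obs: Sequence[Observation],
    counterfactual_next_obs: Sequence[Observation],
) -> float:
    """Hypothetical mutual term for `agent` (not the paper's formula).

    next_obs[j]: teammate j's observation after the real joint action.
    counterfactual_next_obs[j]: teammate j's observation had `agent` taken a
    no-op instead (assumed to come from a resettable simulator).
    Returns the fraction of teammates whose observation `agent` changed.
    """
    if len(next_obs) != len(counterfactual_next_obs):
        raise ValueError("both observation lists must cover the same agents")
    teammates = [j for j in range(len(next_obs)) if j != agent]
    if not teammates:
        return 0.0  # a lone agent has no teammates to influence
    changed = sum(
        1 for j in teammates if next_obs[j] != counterfactual_next_obs[j]
    )
    return changed / len(teammates)


def shaped_reward(
    extrinsic: float, base_intrinsic: float, mutual: float, beta: float = 0.1
) -> float:
    """Add the mutual term (weighted by the hypothetical `beta`) on top of the
    episodic extrinsic reward and an existing intrinsic bonus."""
    return extrinsic + base_intrinsic + beta * mutual


if __name__ == "__main__":
    # Three agents: agent 0's action changes agent 1's view but not agent 2's.
    actual = [(1, 0), (2, 1), (3, 0)]
    no_op = [(1, 0), (2, 0), (3, 0)]
    m = mutual_intrinsic_reward(0, actual, no_op)
    print(m)  # 0.5
    print(shaped_reward(extrinsic=0.0, base_intrinsic=0.2, mutual=m))
```

Keeping the mutual term additive mirrors the stated idea that MIR is combined with original strategies rather than replacing them. The counterfactual step assumes the environment can be re-simulated; a learned dynamics model could stand in where it cannot.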

Findings

1. MIR outperforms state-of-the-art methods in MiniGrid-MA environments.
2. MIR effectively stimulates team exploration in sparse-reward settings.
3. Experimental results show improved learning efficiency and success rates.

Abstract

Episodic rewards present a significant challenge in reinforcement learning. While intrinsic reward methods have demonstrated effectiveness in single-agent reinforcement learning scenarios, their application to multi-agent reinforcement learning (MARL) remains problematic. The primary difficulties stem from two factors: (1) the exponential sparsity of joint action trajectories that lead to rewards as the exploration space expands, and (2) existing methods often fail to account for joint actions that can influence team states. To address these challenges, this paper introduces Mutual Intrinsic Reward (MIR), a simple yet effective enhancement strategy for MARL with extremely sparse rewards like episodic rewards. MIR incentivizes individual agents to explore actions that affect their teammates, and when combined with original strategies, effectively stimulates team exploration and…
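Factor (1) can be made concrete with simple counting. Under simplifying assumptions that are ours rather than the paper's (a deterministic environment, N agents with |A| actions each, horizon T, and uniform random exploration), there are |A|^(N·T) joint action sequences, so the chance of reproducing a single rewarding sequence shrinks exponentially in both N and T. The helper names below are hypothetical.

```python
def joint_trajectory_count(num_actions: int, num_agents: int, horizon: int) -> int:
    """Distinct joint action sequences when each of `num_agents` agents picks
    one of `num_actions` actions at each of `horizon` steps: |A|^(N*T)."""
    return num_actions ** (num_agents * horizon)


def random_hit_probability(
    num_actions: int, num_agents: int, horizon: int, num_rewarding: int = 1
) -> float:
    """Probability that uniformly random joint actions reproduce one of
    `num_rewarding` distinct rewarding sequences (deterministic env assumed)."""
    total = joint_trajectory_count(num_actions, num_agents, horizon)
    return min(1.0, num_rewarding / total)


if __name__ == "__main__":
    # Illustrative numbers only: 5 actions per agent, 10-step episodes.
    for n in (1, 2, 3):
        count = joint_trajectory_count(5, n, 10)
        p = random_hit_probability(5, n, 10)
        print(f"{n} agent(s): {count:.3e} sequences, hit chance {p:.3e}")
```

Each added agent multiplies the joint space by |A|^T, which is why undirected exploration rarely stumbles on an episodic reward as the team grows.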

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Stochastic Gradient Optimization Techniques