Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning

Dahyun Oh; Minhyuk Yoon; H.Jin Kim

arXiv:2605.01865·cs.MA·May 5, 2026


TL;DR

This paper introduces a framework for adaptive exploration in cooperative multi-agent reinforcement learning. It adapts the global exploration intensity over training and allocates the exploration budget across agents according to how reliable each agent's intrinsic reward signal is.

Contribution

It combines a return-conditioned sigmoid schedule (RCB), which controls the global exploration intensity, with a per-agent Reward Signal Quality (RSQ) metric, which automatically allocates the exploration budget among agents.

Findings

01

Achieves top-tier returns on seven cooperative benchmarks.

02

Effectively balances exploration intensity to prevent coordination collapse.

03

Automatically allocates exploration based on signal-to-noise ratio.

Abstract

Cooperative multi-agent reinforcement learning (MARL) requires agents to discover joint strategies in a combinatorially large state-action space, yet effective coordination configurations are exceedingly rare. Intrinsic motivation, which augments task rewards with novelty bonuses, is a popular approach for driving exploration, but its effectiveness hinges on the exploration intensity $β$: too large a value overwhelms the task signal and causes coordination collapse, while too small a value prevents discovery of rare strategies. We address two complementary challenges: adapting $β$ globally over training, and allocating the exploration budget across agents whose intrinsic reward signals vary in reliability. Our framework combines a return-conditioned sigmoid schedule (RCB) for global intensity control with a per-agent Reward Signal Quality (RSQ) metric that concentrates the…
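
The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and does not give the paper's formulas, so everything below is an assumption made for illustration, not the authors' method: the sigmoid form in `rcb_beta`, the `target_return` and `sharpness` parameters, the use of |mean|/std as a signal-to-noise proxy for signal quality, and the softmax allocation are all hypothetical.

```python
import math


def rcb_beta(mean_return, target_return, beta_max=1.0, sharpness=1.0):
    """Hypothetical return-conditioned sigmoid schedule.

    Assumption: the global intensity beta shrinks smoothly from beta_max
    toward 0 as the recent mean return rises past target_return.
    """
    z = sharpness * (mean_return - target_return)
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return beta_max / (1.0 + math.exp(z))


def snr(samples, eps=1e-8):
    """Signal-to-noise proxy |mean| / std for one agent's intrinsic rewards."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return abs(mean) / (math.sqrt(var) + eps)


def allocate_budget(beta, per_agent_intrinsic, temperature=1.0):
    """Hypothetical allocation: softmax over per-agent SNR scores.

    Returns one coefficient per agent. The coefficients sum to
    beta * n_agents, so their average equals the global beta.
    """
    scores = [snr(s) / temperature for s in per_agent_intrinsic]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    n_agents = len(scores)
    return [beta * n_agents * e / total for e in exps]


# Usage: two agents, the first with a much steadier intrinsic signal.
beta = rcb_beta(mean_return=0.0, target_return=0.0)
betas = allocate_budget(beta, [[1.0, 1.1, 0.9], [1.0, -1.0, 0.5]])
```

In this sketch the agent with the steadier intrinsic signal receives almost all of the budget. A larger `temperature` would flatten the allocation toward an equal split.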
