Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning
Dahyun Oh, Minhyuk Yoon, H. Jin Kim

TL;DR
This paper introduces a novel framework for adaptive exploration in cooperative multi-agent reinforcement learning, balancing global exploration intensity and per-agent reward signal quality to improve coordination.
Contribution
It combines a return-conditioned sigmoid schedule (RCB), which controls global exploration intensity, with a per-agent Reward Signal Quality (RSQ) metric, which automatically allocates the exploration budget among agents.
Findings
Achieves top-tier returns on seven cooperative benchmarks.
Effectively balances exploration intensity to prevent coordination collapse.
Automatically allocates exploration based on signal-to-noise ratio.
Abstract
Cooperative multi-agent reinforcement learning (MARL) requires agents to discover joint strategies in a combinatorially large state-action space, yet effective coordination configurations are exceedingly rare. Intrinsic motivation, which augments task rewards with novelty bonuses, is a popular approach for driving exploration, but its effectiveness hinges on the exploration intensity: too large a value overwhelms the task signal and causes coordination collapse, while too small a value prevents discovery of rare strategies. We address two complementary challenges: adapting this intensity globally over training, and allocating the exploration budget across agents whose intrinsic reward signals vary in reliability. Our framework combines a return-conditioned sigmoid schedule (RCB) for global intensity control with a per-agent Reward Signal Quality (RSQ) metric that concentrates the…
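The two components named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract is truncated here and gives neither the functional form of RCB nor the definition of RSQ. The sketch assumes that the global intensity follows a sigmoid that decreases as the team return rises, that signal quality is a simple signal-to-noise ratio (absolute mean over standard deviation) of each agent's recent intrinsic rewards, and that higher-quality agents receive a larger share of the budget. All function names, parameters, and default values below are hypothetical.

```python
import math
import statistics


def rcb_intensity(team_return, beta_max=1.0, midpoint=0.5, steepness=10.0):
    """Hypothetical return-conditioned sigmoid schedule.

    Assumption: intensity is near beta_max at low return and decays
    toward 0 as the return passes `midpoint`.
    """
    z = steepness * (team_return - midpoint)
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return beta_max / (1.0 + math.exp(z))


def rsq_weights(intrinsic_rewards, eps=1e-8):
    """Hypothetical signal-quality weights, one per agent.

    Assumption: quality = |mean| / (std + eps) of an agent's recent
    intrinsic rewards, normalized so the weights sum to 1.
    """
    snr = [
        abs(statistics.fmean(r)) / (statistics.pstdev(r) + eps)
        for r in intrinsic_rewards
    ]
    total = sum(snr)
    if total == 0.0:
        # No usable signal from any agent: fall back to a uniform split.
        return [1.0 / len(snr)] * len(snr)
    return [s / total for s in snr]


def allocate_budget(team_return, intrinsic_rewards):
    """Split the global intensity across agents by signal quality."""
    beta = rcb_intensity(team_return)
    return [beta * w for w in rsq_weights(intrinsic_rewards)]


# Two agents: the first has a steady intrinsic signal, the second a noisy one.
history = [[1.0, 1.0, 1.1, 0.9], [0.0, 2.0, 0.0, 2.0]]
per_agent_beta = allocate_budget(team_return=0.2, intrinsic_rewards=history)
```

Under these assumptions the per-agent coefficients always sum to the global intensity, so the schedule fixes the total amount of exploration and the quality weights only decide how it is divided.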
