Enhancing Sample Efficiency in Multi-Agent RL with Uncertainty Quantification and Selective Exploration

Tom Danino; Nahum Shimkin

arXiv:2506.02841 · eess.SY · March 17, 2026

Open Access

TL;DR

This paper introduces a multi-agent reinforcement learning algorithm that improves sample efficiency by combining ensemble-based uncertainty quantification, selective exploration, and mixed on-policy/off-policy training, and reports gains over state-of-the-art baselines on SMAC II benchmarks.

Contribution

The paper proposes a MARL method that integrates ensemble kurtosis for exploration, a truncated TD($\lambda$) target for efficient critic training, and a mixed on-policy/off-policy sampling strategy for actor updates, with the aim of improving sample efficiency and stability.
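To make the truncated TD($\lambda$) idea concrete, here is a minimal sketch of the textbook truncated $\lambda$-return computed over a fixed-length segment. It is not taken from the paper: the function name `truncated_lambda_return`, its arguments, and the default `gamma` and `lam` values are illustrative assumptions, and the paper's exact target may differ.

```python
def truncated_lambda_return(rewards, next_values, gamma=0.99, lam=0.8):
    """Textbook truncated lambda-return for the first step of an N-step segment.

    Hypothetical helper for illustration only (not the paper's code).
    rewards[k]     : reward r_{t+k}, for k = 0..N-1
    next_values[k] : critic estimate V(s_{t+k+1}), for k = 0..N-1
    """
    n_steps = len(rewards)
    if n_steps == 0 or len(next_values) != n_steps:
        raise ValueError("rewards and next_values must be non-empty and equal length")

    # The last step of the segment bootstraps fully from the critic.
    g = rewards[-1] + gamma * next_values[-1]

    # Backward recursion:
    #   G_k = r_k + gamma * ((1 - lam) * V_{k+1} + lam * G_{k+1})
    # lam = 0 gives the one-step TD target; lam = 1 gives the N-step return.
    for k in range(n_steps - 2, -1, -1):
        g = rewards[k] + gamma * ((1.0 - lam) * next_values[k] + lam * g)
    return g


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [10.0, 10.0, 10.0]
    print(truncated_lambda_return(rewards, next_values, gamma=0.5, lam=0.0))
    print(truncated_lambda_return(rewards, next_values, gamma=0.5, lam=1.0))
```

Truncating at a fixed horizon bounds how far sampled rewards are accumulated before bootstrapping, which is the usual way such a target trades bias against variance.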

Findings

01

Outperforms state-of-the-art MARL baselines on SMAC II benchmarks.

02

Effectively guides exploration using ensemble kurtosis.

03

Reduces variance in critic training with truncated TD($\lambda$).

Abstract

Multi-agent reinforcement learning (MARL) methods have achieved state-of-the-art results on a range of multi-agent tasks. Yet, MARL algorithms typically require significantly more environment interactions than their single-agent counterparts to converge, a problem exacerbated by the difficulty in exploring over a large joint action space and the high variance intrinsic to MARL environments. To tackle these issues, we propose a novel algorithm that combines a decomposed centralized critic with decentralized ensemble learning, incorporating several key contributions. The main component in our scheme is a selective exploration method that leverages ensemble kurtosis. We extend the global decomposed critic with a diversity-regularized ensemble of individual critics and utilize its excess kurtosis to guide exploration toward high-uncertainty states and actions. To improve sample efficiency,…
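As an illustration of the ensemble statistic named in the abstract, the sketch below computes the population excess kurtosis of an ensemble's Q-estimates for each action. The helper names `excess_kurtosis` and `per_action_kurtosis`, the `ensemble_q[member][action]` layout, and the zero-variance convention are assumptions made for this example; how the paper turns the score into an exploration rule is not described in this summary and is not modeled here.

```python
def excess_kurtosis(samples):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a Gaussian)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two ensemble members")
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    if m2 == 0.0:
        # Assumed convention: identical estimates carry no tail information.
        return 0.0
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2) - 3.0


def per_action_kurtosis(ensemble_q):
    """ensemble_q[m][a] is the Q-estimate of ensemble member m for action a.

    Hypothetical layout chosen for this sketch.
    """
    n_actions = len(ensemble_q[0])
    return [
        excess_kurtosis([member[a] for member in ensemble_q])
        for a in range(n_actions)
    ]


if __name__ == "__main__":
    # Five critics, two actions: action 0 has one outlying estimate,
    # action 1 has estimates split evenly around a common mean.
    ensemble_q = [
        [1.0, -1.0],
        [1.0, 1.0],
        [1.0, -1.0],
        [1.0, 1.0],
        [5.0, 0.0],
    ]
    print(per_action_kurtosis(ensemble_q))
```

A positive score indicates heavier tails than a Gaussian across ensemble members (a few critics disagreeing strongly), while a negative score indicates a flatter or bimodal spread.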

Taxonomy

Topics: Data Stream Mining Techniques · Neural Networks and Applications · Machine Learning and Data Classification