The Horizon Threshold in Cooperative Multi-Agent Reward-Free Exploration
Idan Barnea, Orin Levy, Yishay Mansour

TL;DR
This paper investigates the tradeoff between the number of learning phases and agents in reward-free multi-agent reinforcement learning, revealing a critical horizon-dependent regime change.
Contribution
It characterizes the minimal number of phases needed for efficient learning and provides an algorithm and lower bounds demonstrating the importance of the horizon $H$.
Findings
Efficient algorithm with $\tilde{O}(S^6 H^6 A / \epsilon^2)$ agents for $H$ phases.
Lower bound shows fewer than $H$ phases require exponentially many agents.
$\Theta(H)$ phases are necessary and sufficient for polynomial agent count.
Abstract
We study cooperative multi-agent reinforcement learning in the setting of reward-free exploration, where multiple agents jointly explore an unknown MDP in order to learn its dynamics (without observing rewards). We focus on a tabular finite-horizon MDP and adopt a phased learning framework. In each learning phase, multiple agents independently interact with the environment. More specifically, in each learning phase, each agent is assigned a policy, executes it, and observes the resulting trajectory. Our primary goal is to characterize the tradeoff between the number of learning phases and the number of agents, especially when the number of learning phases is small. Our results identify a regime change governed by the horizon $H$. When the number of learning phases equals $H$, we present a computationally efficient algorithm that uses only $\tilde{O}(S^6 H^6 A / \epsilon^2)$ agents to…
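To make the phased interaction protocol concrete, here is a minimal Python sketch of the setup the abstract describes: in each phase every agent receives a policy fixed before the phase starts, executes it for $H$ steps, and the resulting trajectories are pooled into an empirical model of the dynamics. This is not the paper's algorithm. The uniformly random policy assignment is a placeholder, and the function names, the fixed initial state 0, and the time-homogeneous transition tensor `P` are assumptions made for illustration only.

```python
import numpy as np

def run_phased_exploration(P, H, num_phases, num_agents, seed=0):
    """Simulate phased, reward-free data collection on a tabular MDP.

    P[s, a] is a distribution over next states (assumed time-homogeneous).
    Returns transition counts indexed by (step h, state, action, next state).
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    counts = np.zeros((H, S, A, S), dtype=int)
    for _ in range(num_phases):
        # Policies are fixed before the phase begins. Placeholder choice:
        # one uniformly random deterministic policy per agent.
        policies = rng.integers(A, size=(num_agents, H, S))
        for agent in range(num_agents):
            s = 0  # assumed fixed initial state
            for h in range(H):
                a = policies[agent, h, s]
                s_next = rng.choice(S, p=P[s, a])
                counts[h, s, a, s_next] += 1  # no reward is observed
                s = s_next
    return counts

def estimate_dynamics(counts):
    """Empirical transition model; unvisited (h, s, a) default to uniform."""
    totals = counts.sum(axis=-1, keepdims=True)
    S = counts.shape[-1]
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / S)
```

In this sketch the only coordination between agents happens between phases, when the pooled counts could inform the next batch of policies. A real algorithm would replace the random policy assignment with policies chosen from the data gathered so far.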
