One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning
Pedro Cisneros-Velarde, Boxiang Lyu, Sanmi Koyejo, Mladen Kolar

TL;DR
This paper shows that guiding all parallel agents with a single exploration policy in reward-free RL yields an almost-linear speedup over fully sequential exploration in linear MDPs and two-player zero-sum Markov games, and is near-minimax optimal for linear MDPs.
Contribution
It proves that a single policy suffices for near-optimal parallel exploration in reward-free RL, in contrast to prior work that encourages agents to explore a diverse set of policies.
Findings
A single policy shared by all agents achieves an almost-linear speedup over fully sequential exploration.
The single-policy procedure is near-minimax optimal in the reward-free setting for linear MDPs.
Parallel exploration with one policy simplifies implementation without sacrificing performance.
Abstract
Although parallelism has been extensively used in reinforcement learning (RL), the quantitative effects of parallel exploration are not well understood theoretically. We study the benefits of simple parallel exploration for reward-free RL in linear Markov decision processes (MDPs) and two-player zero-sum Markov games (MGs). In contrast to the existing literature, which focuses on approaches that encourage agents to explore a diverse set of policies, we show that using a single policy to guide exploration across all agents is sufficient to obtain an almost-linear speedup in all cases compared to their fully sequential counterpart. Furthermore, we demonstrate that this simple procedure is near-minimax optimal in the reward-free setting for linear MDPs. From a practical perspective, our paper shows that a single policy is sufficient and provably near-optimal for incorporating parallelism…
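The idea of several agents exploring under one shared policy can be illustrated with a toy sketch. The code below is not the paper's algorithm. It assumes a standard linear-feature setup in which every agent's feature vectors are summed into one shared Gram matrix, and it uses an elliptical bonus beta * sqrt(phi^T Lambda^{-1} phi) as a stand-in measure of remaining uncertainty. All function names, the Gaussian feature distribution, and the parameters (d, steps, threshold, beta) are hypothetical choices made for illustration.

```python
import numpy as np

def parallel_gram_update(gram, features_per_agent):
    """Add every agent's feature outer products to one shared Gram matrix."""
    for feats in features_per_agent:      # one array of shape (steps, d) per agent
        gram = gram + feats.T @ feats     # sum of phi phi^T over that agent's steps
    return gram

def bonus(gram, phi, beta=1.0):
    """Elliptical uncertainty bonus beta * sqrt(phi^T gram^{-1} phi)."""
    return beta * float(np.sqrt(phi @ np.linalg.solve(gram, phi)))

def rounds_until_explored(num_agents, d=4, steps=5, threshold=0.1,
                          seed=0, max_rounds=10_000):
    """Count rounds until the largest bonus over the basis directions is below threshold."""
    rng = np.random.default_rng(seed)
    gram = np.eye(d)                      # ridge regularizer with lambda = 1 (assumption)
    for r in range(1, max_rounds + 1):
        # Every agent samples features from the SAME distribution:
        # a stand-in for all agents following one shared policy.
        batch = [rng.normal(size=(steps, d)) / np.sqrt(d)
                 for _ in range(num_agents)]
        gram = parallel_gram_update(gram, batch)
        worst = max(bonus(gram, e) for e in np.eye(d))
        if worst < threshold:
            return r
    return max_rounds

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "agents:", rounds_until_explored(n), "rounds")
```

In this toy setting the number of rounds needed should shrink roughly in proportion to the number of agents, which mirrors the almost-linear speedup described in the abstract, although the sketch says nothing about the paper's actual guarantees or constants.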
Taxonomy
Topics: Reinforcement Learning in Robotics
