Efficient Exploration via State Marginal Matching
Lisa Lee, Benjamin Eysenbach, Emilio Parisotto, Eric Xing, Sergey Levine, Ruslan Salakhutdinov

TL;DR
This paper introduces State Marginal Matching (SMM), a mathematically grounded exploration method for reinforcement learning. SMM trains a policy so that its state marginal distribution matches a target state distribution, which improves exploration efficiency and adaptation to new tasks.
Contribution
The paper formalizes exploration as a State Marginal Matching problem and derives an algorithm by treating the objective as a two-player, zero-sum game between a state density model and a parametric policy, giving a principled approach to exploration.
Findings
Agents using SMM explore faster in simulations.
SMM-based agents adapt more quickly to new tasks.
Several prior exploration methods can be interpreted as approximately maximizing the SMM objective.
Abstract
Exploration is critical to a reinforcement learning agent's performance in its given environment. Prior exploration methods are often based on using heuristic auxiliary predictions to guide policy behavior, lacking a mathematically-grounded objective with clear properties. In contrast, we recast exploration as a problem of State Marginal Matching (SMM), where we aim to learn a policy for which the state marginal distribution matches a given target state distribution. The target distribution is a uniform distribution in most cases, but can incorporate prior knowledge if available. In effect, SMM amortizes the cost of learning to explore in a given environment. The SMM objective can be viewed as a two-player, zero-sum game between a state density model and a parametric policy, an idea that we use to build an algorithm for optimizing the SMM objective. Using this formalism, we further…
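The matching objective described in the abstract can be illustrated with a small tabular sketch. It is not the authors' implementation. It assumes a discrete state space, measures the mismatch with a KL divergence between the policy's empirical state marginal and a uniform target, and uses a pseudo-reward of the form log p*(s) - log q(s), where q stands in for the state density model. The function names, the four-state example, and the smoothing constant `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def empirical_state_marginal(states, num_states):
    """Estimate the state marginal from a batch of visited state indices."""
    counts = np.bincount(states, minlength=num_states).astype(float)
    return counts / counts.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; terms with p(s) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask] + eps))))


def smm_pseudo_reward(target, density, eps=1e-12):
    """Per-state reward log p*(s) - log q(s); eps is an assumed smoothing term."""
    target = np.asarray(target, dtype=float)
    density = np.asarray(density, dtype=float)
    return np.log(target + eps) - np.log(density + eps)


# Hypothetical rollout: the policy visits state 0 three times and state 1 once.
visited = np.array([0, 0, 0, 1])
num_states = 4

rho = empirical_state_marginal(visited, num_states)  # policy's state marginal
target = np.full(num_states, 1.0 / num_states)       # uniform target distribution

gap = kl_divergence(rho, target)          # mismatch the policy tries to shrink
reward = smm_pseudo_reward(target, rho)   # here the density model is the exact marginal

print("state marginal:", rho)
print("KL to target:", gap)
print("pseudo-reward per state:", reward)
```

In this toy setting the unvisited states receive the largest pseudo-reward and the over-visited state receives a negative one, which is the pressure that pushes the state marginal toward the target. In a two-player view, the density model would be refit to the policy's visited states while the policy is updated to maximize this reward.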
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
