Efficient Exploration via State Marginal Matching

Lisa Lee; Benjamin Eysenbach; Emilio Parisotto; Eric Xing; Sergey Levine; Ruslan Salakhutdinov

arXiv:1906.05274 · cs.LG · March 2, 2020 · 96 cites

Open Access · 1 Repo

TL;DR

This paper introduces State Marginal Matching (SMM), a mathematically grounded exploration objective for reinforcement learning: the agent learns a policy whose state marginal distribution matches a given target state distribution, which the authors report leads to faster exploration and quicker adaptation to new tasks.

Contribution

The paper formalizes exploration as a State Marginal Matching problem and, by viewing the objective as a two-player, zero-sum game between a state density model and a parametric policy, derives an algorithm for optimizing it.
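
One way to write the formulation down is sketched here. The notation is an assumption made for illustration and does not appear on this page: \rho_\pi(s) denotes the state marginal distribution of policy \pi, p^*(s) the target state distribution, and q(s) the state density model.

```latex
% SMM objective: make the policy's state marginal match the target.
\begin{align}
\mathcal{F}(\pi)
  &= -D_{\mathrm{KL}}\big(\rho_\pi(s)\,\|\,p^*(s)\big)
   = \mathbb{E}_{s \sim \rho_\pi}\big[\log p^*(s) - \log \rho_\pi(s)\big]
\end{align}

% Game form: the cross-entropy bound
%   E_{\rho_\pi}[-\log q(s)] = H[\rho_\pi] + D_KL(\rho_\pi || q) >= H[\rho_\pi]
% is tight exactly when q = \rho_\pi, so the density player's best
% response recovers the objective above.
\begin{align}
\max_{\pi}\, \mathcal{F}(\pi)
  &= \max_{\pi}\, \min_{q}\;
     \mathbb{E}_{s \sim \rho_\pi}\big[\log p^*(s) - \log q(s)\big]
\end{align}
```

Read this way, the policy is rewarded for visiting states that the target favors and that the density model currently rates as unlikely, while the density model tries to track where the policy actually goes.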

Findings

01. Agents that optimize the SMM objective explore faster than prior exploration methods in simulation.

02. SMM-based agents adapt more quickly to new tasks.

03. Prior exploration methods can be viewed as approximately maximizing the SMM objective.

Abstract

Exploration is critical to a reinforcement learning agent's performance in its given environment. Prior exploration methods are often based on using heuristic auxiliary predictions to guide policy behavior, lacking a mathematically-grounded objective with clear properties. In contrast, we recast exploration as a problem of State Marginal Matching (SMM), where we aim to learn a policy for which the state marginal distribution matches a given target state distribution. The target distribution is a uniform distribution in most cases, but can incorporate prior knowledge if available. In effect, SMM amortizes the cost of learning to explore in a given environment. The SMM objective can be viewed as a two-player, zero-sum game between a state density model and a parametric policy, an idea that we use to build an algorithm for optimizing the SMM objective. Using this formalism, we further…
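
The interplay between the density model and the policy can be illustrated with a minimal Python sketch. It is a toy under strong assumptions, not the paper's algorithm: the "environment" is a one-step problem in which the policy picks the next state directly (so its state marginal is simply its softmax output), the density player's best response is computed exactly instead of being learned from samples, and all names (`smm_toy`, `kl_divergence`, `target`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def smm_toy(target, logits, steps=2000, lr=0.5):
    """Alternate density and policy updates on a one-step toy problem.

    Assumption: the policy chooses the next state directly, so its state
    marginal equals softmax(logits).
    """
    logits = list(logits)
    for _ in range(steps):
        marginal = softmax(logits)
        # Density player: its best response to the current policy is the
        # policy's own state marginal (computed exactly here, not learned).
        density = marginal
        # Policy player: pseudo-reward r(s) = log p*(s) - log q(s).
        reward = [math.log(t) - math.log(d) for t, d in zip(target, density)]
        baseline = sum(m * r for m, r in zip(marginal, reward))
        # Exact gradient of E_pi[r(s)] with respect to the softmax logits,
        # holding the density model fixed.
        logits = [x + lr * m * (r - baseline)
                  for x, m, r in zip(logits, marginal, reward)]
    return softmax(logits)


target = [0.2] * 5                          # uniform target over 5 states
initial_logits = [3.0, 0.0, 0.0, 0.0, 0.0]  # starts concentrated on state 0
before = kl_divergence(softmax(initial_logits), target)
learned = smm_toy(target, initial_logits)
after = kl_divergence(learned, target)
print(f"KL before: {before:.4f}, KL after: {after:.6f}")
```

In this toy the policy's marginal drifts from a distribution concentrated on one state toward the uniform target, because over-visited states receive a low pseudo-reward and under-visited ones a high pseudo-reward. A realistic setting would need a learned density model and a reinforcement learning algorithm in place of the exact updates used here.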

Code & Models

Repositories

RLAgent/state-marginal-matching
pytorch

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms