Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards

Susan Amin (1, 2), Maziar Gomrokchi (1, 2), Hossein Aboutalebi (3), Harsh Satija (1, 2), Doina Precup (1, 2) ((1) McGill University, (2) Mila - Quebec Artificial Intelligence Institute, (3) University of Waterloo)

arXiv:2012.13658 · cs.LG · June 15, 2021 · 6 cites

TL;DR

This paper introduces an exploration strategy for reinforcement learning in sparse-reward environments. Exploratory actions depend on the agent's trajectory so far, and concepts from statistical physics are used to generate persistent, locally self-avoiding trajectories that cover the state space more efficiently.

Contribution

The paper proposes a new exploration method based on locally self-avoiding trajectories that depend on the agent's history, enhancing exploration in continuous control tasks with sparse rewards.

Findings

01 Effective in 2D navigation tasks
02 Improves exploration in MuJoCo locomotion tasks
03 Provides theoretical insights into trajectory properties

Abstract

A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces. Intuitively, if the reinforcement signal is very scarce, the agent should rely on some form of short-term memory in order to cover its environment efficiently. We propose a new exploration method, based on two intuitions: (1) the choice of the next exploratory action should depend not only on the (Markovian) state of the environment, but also on the agent's trajectory so far, and (2) the agent should utilize a measure of spread in the state space to avoid getting stuck in a small region. Our method leverages concepts often used in statistical physics to provide explanations for the behavior of simplified (polymer) chains in order to generate persistent (locally self-avoiding) trajectories in state space. We…
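The two intuitions in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it is a plain 2D walk in which each step's heading stays close to the previous one (a stand-in for trajectory-dependent, locally persistent exploration), compared against an uncorrelated random walk. The radius of gyration is used here as one possible "measure of spread"; that choice, along with the hypothetical names `persistent_walk`, `radius_of_gyration`, `mean_spread`, and the parameter `max_turn`, is an assumption made for illustration only.

```python
import math
import random


def persistent_walk(n_steps, max_turn, step_size=1.0, seed=0):
    """2D walk whose heading changes by at most `max_turn` radians per step.

    max_turn = pi gives an uncorrelated random walk; a small max_turn gives a
    locally persistent walk that tends to keep moving away from recent points.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    heading = rng.uniform(-math.pi, math.pi)
    path = [(x, y)]
    for _ in range(n_steps):
        # The next direction depends on the previous one (trajectory memory).
        heading += rng.uniform(-max_turn, max_turn)
        x += step_size * math.cos(heading)
        y += step_size * math.sin(heading)
        path.append((x, y))
    return path


def radius_of_gyration(path):
    """Root-mean-square distance of the visited points from their centroid."""
    n = len(path)
    cx = sum(p[0] for p in path) / n
    cy = sum(p[1] for p in path) / n
    return math.sqrt(sum((px - cx) ** 2 + (py - cy) ** 2 for px, py in path) / n)


def mean_spread(max_turn, n_steps=200, n_trials=200):
    """Average radius of gyration over several seeded walks."""
    total = sum(
        radius_of_gyration(persistent_walk(n_steps, max_turn, seed=s))
        for s in range(n_trials)
    )
    return total / n_trials


if __name__ == "__main__":
    print("uncorrelated walk:", round(mean_spread(math.pi), 2))
    print("persistent walk:  ", round(mean_spread(0.2), 2))
```

With the same number of steps, the persistent walk should report a noticeably larger average spread than the uncorrelated one, which is the sense in which short-term memory helps an agent avoid lingering in a small region. A real continuous-control setting would apply this idea in the task's state space rather than on a flat 2D plane.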

Code & Models

Repositories

h-aboutalebi/SparseBaseline (PyTorch, official)

Videos

Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards · SlidesLive

Taxonomy

Topics: Protein Structure and Dynamics · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference