Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards
Susan Amin (1, 2), Maziar Gomrokchi (1, 2), Hossein Aboutalebi (3), Harsh Satija (1, 2), Doina Precup (1, 2) ((1) McGill University, (2) Mila - Quebec Artificial Intelligence Institute, (3) University of Waterloo)

TL;DR
This paper introduces an exploration strategy for reinforcement learning in sparse-reward environments. It makes exploratory actions depend on the agent's trajectory and uses concepts from statistical physics to generate persistent, locally self-avoiding trajectories that improve exploration efficiency.
Contribution
The paper proposes a new exploration method based on locally self-avoiding trajectories that depend on the agent's history, enhancing exploration in continuous control tasks with sparse rewards.
Findings
Effective in 2D navigation tasks
Improves exploration in MuJoCo locomotion tasks
Provides theoretical insights into trajectory properties
Abstract
A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces. Intuitively, if the reinforcement signal is very scarce, the agent should rely on some form of short-term memory in order to cover its environment efficiently. We propose a new exploration method, based on two intuitions: (1) the choice of the next exploratory action should depend not only on the (Markovian) state of the environment, but also on the agent's trajectory so far, and (2) the agent should utilize a measure of spread in the state space to avoid getting stuck in a small region. Our method leverages concepts often used in statistical physics to provide explanations for the behavior of simplified (polymer) chains in order to generate persistent (locally self-avoiding) trajectories in state space. We…
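The idea of a persistent trajectory combined with a measure of spread can be illustrated with a toy example. The sketch below is not the authors' algorithm: it is a minimal 2D walk, written under the assumption that "persistence" means each step's heading differs from the previous one by at most a bounded angle, and that the radius of gyration (a quantity commonly used for polymer chains) serves as the spread measure. The names `persistent_walk`, `radius_of_gyration`, and `max_turn` are hypothetical and chosen only for illustration.

```python
import math
import random


def persistent_walk(n_steps, max_turn, step_size=1.0, seed=0):
    """Illustrative 2D walk: the heading turns by at most max_turn radians per step.

    A small max_turn gives a persistent (slowly bending) path;
    max_turn = pi gives an uncorrelated random walk.
    """
    rng = random.Random(seed)
    heading = rng.uniform(-math.pi, math.pi)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        # The next direction depends on the previous one (short-term memory).
        heading += rng.uniform(-max_turn, max_turn)
        x += step_size * math.cos(heading)
        y += step_size * math.sin(heading)
        path.append((x, y))
    return path


def radius_of_gyration(path):
    """Root-mean-square distance of the visited points from their centroid."""
    n = len(path)
    cx = sum(px for px, _ in path) / n
    cy = sum(py for _, py in path) / n
    return math.sqrt(sum((px - cx) ** 2 + (py - cy) ** 2 for px, py in path) / n)


if __name__ == "__main__":
    persistent = persistent_walk(200, max_turn=0.2)
    uncorrelated = persistent_walk(200, max_turn=math.pi)
    print("persistent spread:  ", radius_of_gyration(persistent))
    print("uncorrelated spread:", radius_of_gyration(uncorrelated))
```

With a small `max_turn`, the walk would typically be expected to cover a much larger region in the same number of steps than the uncorrelated walk, which is the intuition behind preferring persistent trajectories when rewards are sparse. Note that this toy walk carries only one step of memory and does not enforce self-avoidance.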
Taxonomy
Topics: Protein Structure and Dynamics · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
