Landmark Guided Active Exploration with State-specific Balance Coefficient
Fei Cui, Jiaojiao Fang, Mengke Yang, Guizhong Liu

TL;DR
This paper introduces a landmark-guided exploration strategy for goal-conditioned hierarchical reinforcement learning (GCHRL) that dynamically balances prospect and novelty to improve exploration and sample efficiency.
Contribution
It proposes an exploration method for GCHRL that combines measures of prospect and novelty through a state-specific balance coefficient.
Findings
Significantly outperforms baseline methods in multiple tasks.
Improves sample efficiency in hierarchical reinforcement learning.
Dynamically balances prospect and novelty during exploration.
Abstract
Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes long-horizon tasks into sub-tasks through a hierarchical framework and has demonstrated promising results across a variety of domains. However, the high-level policy's action space is often excessively large, which makes effective exploration difficult and can lead to inefficient training. In this paper, we design a measure of prospect for sub-goals by planning in the goal space based on the goal-conditioned value function. Building on this measure, we propose a landmark-guided exploration strategy that integrates the measures of prospect and novelty, with the aim of guiding the agent to explore efficiently and improving sample efficiency. To weigh the impact of prospect and novelty on exploration dynamically, we introduce a state-specific balance coefficient to…
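As a rough illustration of the idea of weighing prospect against novelty with a per-state coefficient, the sketch below scores candidate sub-goals with a convex combination of the two measures and picks the best one. The linear form, the function names `combined_scores` and `select_subgoal`, and the assumption that the coefficient lies in [0, 1] are all hypothetical choices made for illustration; the abstract does not state how the paper actually combines the measures or computes the coefficient.

```python
from typing import List, Sequence


def combined_scores(
    prospect: Sequence[float],
    novelty: Sequence[float],
    balance: float,
) -> List[float]:
    """Blend prospect and novelty for each candidate sub-goal.

    `balance` is a hypothetical state-specific coefficient in [0, 1]:
    1.0 means only prospect matters, 0.0 means only novelty matters.
    The convex combination is an assumption, not the paper's formula.
    """
    if len(prospect) != len(novelty):
        raise ValueError("prospect and novelty must have the same length")
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must lie in [0, 1]")
    return [balance * p + (1.0 - balance) * n for p, n in zip(prospect, novelty)]


def select_subgoal(
    prospect: Sequence[float],
    novelty: Sequence[float],
    balance: float,
) -> int:
    """Return the index of the candidate with the highest blended score."""
    scores = combined_scores(prospect, novelty, balance)
    if not scores:
        raise ValueError("at least one candidate sub-goal is required")
    # Ties resolve to the lowest index because max() keeps the first maximum.
    return max(range(len(scores)), key=scores.__getitem__)
```

With a coefficient near 1 the selection follows the prospect scores, and with a coefficient near 0 it follows the novelty scores, so a coefficient that varies by state shifts the emphasis between the two.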
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · Distributed and Parallel Computing Systems · Inertial Sensor and Navigation
