Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs
Qingyang Zhang, Yiming Yang, Jingqing Ruan, Xuantang Xiong, Dengpeng Xing, Bo Xu

TL;DR
This paper introduces HILL, a hierarchical reinforcement learning method that learns temporally coherent latent subgoal representations and dynamically builds landmark graphs to effectively balance exploration and exploitation in complex tasks.
Contribution
HILL learns latent subgoal representations that satisfy temporal coherence and dynamically constructs a landmark graph over them to improve subgoal selection.
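To make the landmark-graph idea concrete, here is a minimal sketch under stated assumptions; it is not HILL's implementation. It assumes landmarks are already embedded as latent vectors, connects two landmarks when their Euclidean latent distance is at most a hypothetical threshold `max_edge`, and picks the next subgoal as the first hop on a Dijkstra shortest path toward the goal landmark. The function names, the threshold edge rule, and the use of Dijkstra are illustrative choices, not details from the paper.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Graph = Dict[int, List[Tuple[int, float]]]


def build_landmark_graph(landmarks: List[Vector], max_edge: float) -> Graph:
    """Hypothetical rule: link landmarks whose latent distance is <= max_edge."""
    graph: Graph = {i: [] for i in range(len(landmarks))}
    for i in range(len(landmarks)):
        for j in range(i + 1, len(landmarks)):
            d = math.dist(landmarks[i], landmarks[j])
            if d <= max_edge:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def shortest_path(graph: Graph, start: int, goal: int) -> List[int]:
    """Dijkstra's algorithm; returns landmark indices start..goal, or [] if unreachable."""
    dist = {start: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def next_subgoal(landmarks: List[Vector], max_edge: float, start: int, goal: int) -> Optional[int]:
    """Return the next landmark to target, or None if the goal is unreachable in the graph."""
    path = shortest_path(build_landmark_graph(landmarks, max_edge), start, goal)
    if not path:
        return None
    return path[1] if len(path) > 1 else goal


# Usage: four landmarks on a bent line, plus one isolated landmark.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (10.0, 10.0)]
print(next_subgoal(points, 1.1, start=0, goal=3))
```

Following the first hop rather than the goal itself keeps each subgoal within the range the graph considers reachable, which is the usual motivation for planning over landmarks.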
Findings
HILL outperforms state-of-the-art methods on continuous control tasks.
HILL improves sample efficiency and asymptotic performance.
The approach effectively balances exploration and exploitation.
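One way to picture such a balance is a landmark-selection rule that adds an exploration bonus to an exploitation score. The sketch below is hypothetical and not taken from the paper: it assumes a utility equal to the negative estimated cost-to-goal of each candidate landmark and a count-based novelty bonus `beta / sqrt(1 + visits)`, where `beta` is an assumed weight controlling how strongly rarely visited landmarks are favored.

```python
import math
from typing import Dict, List


def select_landmark(
    candidates: List[int],
    cost_to_goal: Dict[int, float],
    visit_counts: Dict[int, int],
    beta: float = 1.0,
) -> int:
    """Hypothetical rule: pick the candidate maximizing utility + novelty.

    utility = -cost_to_goal[c]          (exploitation: prefer landmarks nearer the goal)
    novelty = beta / sqrt(1 + visits)   (exploration: prefer rarely visited landmarks)
    """
    def score(c: int) -> float:
        utility = -cost_to_goal[c]
        novelty = beta / math.sqrt(1 + visit_counts.get(c, 0))
        return utility + novelty

    return max(candidates, key=score)


# Landmark 0 is closer to the goal but heavily visited; landmark 1 is unvisited.
costs = {0: 1.0, 1: 1.5}
visits = {0: 99, 1: 0}
print(select_landmark([0, 1], costs, visits, beta=0.0))  # pure exploitation
print(select_landmark([0, 1], costs, visits, beta=1.0))  # novelty tips the choice
```

With `beta=0.0` the rule always exploits; raising `beta` shifts the choice toward unexplored landmarks, which is the trade-off the finding above refers to.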
Abstract
Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm to address the exploration-exploitation dilemma in reinforcement learning. It decomposes the source task into subgoal-conditioned subtasks and conducts exploration and exploitation in the subgoal space. The effectiveness of GCHRL heavily relies on the subgoal representation function and the subgoal selection strategy. However, existing works often overlook the temporal coherence in GCHRL when learning latent subgoal representations and lack an efficient subgoal selection strategy that balances exploration and exploitation. This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL) to overcome these limitations. HILL learns latent subgoal representations that satisfy temporal coherence using a contrastive representation learning objective. Based on these…
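The abstract states that HILL enforces temporal coherence with a contrastive objective but does not give the loss in this excerpt. As an illustration only, the sketch below evaluates a generic hinge-style contrastive loss over one trajectory of latent embeddings: states at most `k` steps apart (positives) are pulled together, and states further apart (negatives) are pushed at least `margin` apart. The window `k`, the `margin`, and the squared-hinge form are assumptions, not the paper's formulation.

```python
import math
from typing import List, Sequence


def temporal_contrastive_loss(latents: List[Sequence[float]], k: int, margin: float) -> float:
    """Generic (assumed) hinge-style contrastive loss over one latent trajectory.

    Positive pairs (<= k steps apart) contribute their squared distance.
    Negative pairs (> k steps apart) contribute max(0, margin - distance) ** 2.
    """
    pos, neg = [], []
    n = len(latents)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(latents[i], latents[j])
            if j - i <= k:
                pos.append(d ** 2)
            else:
                neg.append(max(0.0, margin - d) ** 2)
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


# A trajectory laid out along a line vs. one collapsed to a single point.
spread = [(0.0,), (1.0,), (2.0,), (3.0,)]
collapsed = [(0.0,), (0.0,), (0.0,), (0.0,)]
print(temporal_contrastive_loss(spread, k=1, margin=2.0))
print(temporal_contrastive_loss(collapsed, k=1, margin=2.0))
```

The collapsed embedding scores worse because the negative term penalizes temporally distant states that land on top of each other; in a real system the latents would come from a learned encoder and this loss would be minimized by gradient descent rather than merely evaluated.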
Taxonomy
Topics: Reinforcement Learning in Robotics
