Adjacency constraint for efficient hierarchical reinforcement learning

Tianren Zhang; Shangqi Guo; Tian Tan; Xiaolin Hu; Feng Chen

arXiv:2111.00213 · cs.LG · August 23, 2022

TL;DR

This paper introduces an adjacency constraint for goal-conditioned hierarchical reinforcement learning that restricts the high-level action space (the goal space) to a $k$-step adjacent region of the current state, improving training efficiency and performance on complex control tasks.

Contribution

It proposes a novel adjacency constraint method that preserves optimal policies in deterministic MDPs and bounds suboptimality in stochastic MDPs, with practical implementation via an adjacency network.

Findings

1. Significant performance improvements on robot locomotion tasks.

2. Effective reduction of goal space complexity.

3. Theoretical guarantees for policy optimality and suboptimality bounds.

Abstract

Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is large. Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure.…
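
As a rough illustration of the adjacency constraint described in the abstract, the Python sketch below restricts candidate subgoals to the $k$-step adjacent region of the current state in a toy deterministic grid world. It is not the paper's implementation: it assumes a small tabular state space in which the minimum number of steps between states can be computed exactly by breadth-first search, whereas the paper's practical method relies on a learned adjacency network. The helper names (grid_neighbors, k_step_adjacent_region, constrain_subgoals) and the grid environment are hypothetical and exist only for this example.

```python
from collections import deque


def grid_neighbors(width, height, walls=frozenset()):
    """Return a successor function for a hypothetical deterministic grid world."""
    def neighbors(state):
        x, y = state
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and (nx, ny) not in walls:
                yield (nx, ny)
    return neighbors


def k_step_adjacent_region(state, neighbors, k):
    """States reachable from `state` in at most k steps (BFS, deterministic case)."""
    dist = {state: 0}
    queue = deque([state])
    while queue:
        s = queue.popleft()
        if dist[s] == k:
            continue  # do not expand beyond the k-step frontier
        for nxt in neighbors(s):
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                queue.append(nxt)
    return set(dist)


def constrain_subgoals(state, candidates, neighbors, k):
    """Keep only the candidate subgoals that lie in the k-step adjacent region."""
    region = k_step_adjacent_region(state, neighbors, k)
    return [g for g in candidates if g in region]


if __name__ == "__main__":
    # Assumed toy setup: 5x5 grid with a single wall cell at (1, 0).
    step = grid_neighbors(5, 5, walls=frozenset({(1, 0)}))
    candidates = [(2, 0), (1, 1), (4, 4)]
    print(constrain_subgoals((0, 0), candidates, step, k=2))
```

In this toy setup the subgoal (2, 0) is geometrically close to (0, 0) but takes four steps to reach around the wall, so it is rejected for k = 2, while (1, 1) is kept. This illustrates why adjacency is defined by transition distance rather than by Euclidean distance in the goal space.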

Taxonomy

Topics: Reinforcement Learning in Robotics