SHIRO: Soft Hierarchical Reinforcement Learning

Kandai Watanabe; Mathew Strong; Omer Eldar

arXiv:2212.12786 · cs.RO · December 27, 2022

TL;DR

SHIRO introduces an entropy-maximizing hierarchical reinforcement learning algorithm that enhances exploration and learning efficiency in robotic control tasks by adding entropy at both hierarchy levels, supported by theoretical and empirical evidence.

Contribution

This work is the first to theoretically motivate and empirically validate the addition of entropy at both levels of hierarchical reinforcement learning.

Findings

01

Adding entropy to high-level policies improves exploration.

02

SHIRO outperforms state-of-the-art methods on robotic benchmarks.

03

High temperature in low-level policies increases stochasticity and learning difficulty.
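The link between temperature and stochasticity can be illustrated with a small sketch. It assumes a discrete action set and a one-step maximum-entropy objective, for which the optimal policy is a Boltzmann distribution over action values divided by the temperature. The Q-values and the function names `soft_policy` and `entropy` are hypothetical and chosen only for illustration; SHIRO itself targets continuous control, so this is not the paper's implementation.

```python
import math


def soft_policy(q_values, temperature):
    """Boltzmann policy: pi(a) is proportional to exp(Q(a) / temperature)."""
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical action values, not taken from the paper.
q_values = [1.0, 0.5, 0.0]

for temperature in (0.1, 1.0, 10.0):
    pi = soft_policy(q_values, temperature)
    print(temperature, [round(p, 3) for p in pi], round(entropy(pi), 3))
```

As the temperature grows, the policy moves toward uniform and its entropy approaches log(3), the maximum for three actions. In a hierarchy, a highly stochastic low-level policy would make it harder for the high level to predict the effect of its subgoals, which is one plausible reading of the third finding.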

Abstract

Hierarchical Reinforcement Learning (HRL) algorithms have been demonstrated to perform well on high-dimensional decision-making and robotic control tasks. However, because they optimize solely for reward, the agent tends to search the same space redundantly, which slows learning and lowers the achieved reward. In this work, we present an off-policy HRL algorithm that maximizes entropy for efficient exploration. The algorithm learns a temporally abstracted low-level policy and is able to explore broadly through the addition of entropy to the high-level policy. The novelty of this work is the theoretical motivation of adding entropy to the RL objective in the HRL setting. We empirically show that the entropy can be added to both levels if the Kullback-Leibler (KL) divergence between consecutive updates of the low-level policy is sufficiently small. We performed an ablative study to…
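The KL condition on consecutive low-level updates can be made concrete with a minimal sketch. It assumes the low-level policy outputs a diagonal Gaussian over actions at a given state, which the abstract does not state, and uses the standard closed-form KL divergence between two such Gaussians. The function names and the `threshold` value are hypothetical; the abstract does not specify how small "sufficiently small" is or how the paper measures it.

```python
import math


def kl_diag_gaussian(mu_old, std_old, mu_new, std_new):
    """Closed-form KL(old || new) between two diagonal Gaussians."""
    kl = 0.0
    for m0, s0, m1, s1 in zip(mu_old, std_old, mu_new, std_new):
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl


def update_is_small(mu_old, std_old, mu_new, std_new, threshold=0.01):
    """Hypothetical check: is the policy change below an assumed threshold?"""
    return kl_diag_gaussian(mu_old, std_old, mu_new, std_new) < threshold


# Hypothetical action distributions at one state, before and after an update.
old = ([0.00, 0.50], [0.30, 0.30])
new = ([0.01, 0.49], [0.31, 0.30])

print(kl_diag_gaussian(*old, *new))
print(update_is_small(*old, *new))
```

In practice such a check would be averaged over a batch of states rather than evaluated at a single one; the single-state form is used here only to keep the sketch short.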

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings