SHIRO: Soft Hierarchical Reinforcement Learning
Kandai Watanabe, Mathew Strong, Omer Eldar

TL;DR
SHIRO is an entropy-maximizing hierarchical reinforcement learning algorithm. By adding entropy at both levels of the hierarchy, it improves exploration and learning efficiency on robotic control tasks, a design supported by theoretical and empirical evidence.
Contribution
This work is the first to theoretically motivate and empirically validate the addition of entropy at both levels of hierarchical reinforcement learning.
Findings
Adding entropy to high-level policies improves exploration.
SHIRO outperforms state-of-the-art methods on robotic benchmarks.
A high entropy temperature in the low-level policy makes that policy more stochastic, which makes learning harder.
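The link between temperature and stochasticity can be illustrated with a minimal sketch. It assumes a discrete action set and a Boltzmann policy of the form pi(a) proportional to exp(Q(a) / temperature), which is the standard form of an entropy-regularized policy. The paper's own continuous-control setup is not reproduced here, and the action values and function names are hypothetical.

```python
import math

def boltzmann_policy(q_values, temperature):
    """Return pi(a) proportional to exp(Q(a) / temperature)."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical action values, chosen only for illustration.
q_values = [1.0, 0.5, 0.0]

for temperature in (0.1, 1.0, 10.0):
    pi = boltzmann_policy(q_values, temperature)
    print(temperature, [round(p, 3) for p in pi], round(entropy(pi), 3))
```

As the temperature grows, the printed distribution moves toward uniform and its entropy approaches log(3), so the action choice carries less and less information about the action values.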
Abstract
Hierarchical Reinforcement Learning (HRL) algorithms have been demonstrated to perform well on high-dimensional decision-making and robotic control tasks. However, because they optimize solely for reward, the agent tends to search the same space redundantly, which slows learning and lowers the achieved reward. In this work, we present an off-policy HRL algorithm that maximizes entropy for efficient exploration. The algorithm learns a temporally abstracted low-level policy and explores broadly through the addition of entropy to the high-level policy. The novelty of this work is the theoretical motivation for adding entropy to the RL objective in the HRL setting. We empirically show that entropy can be added to both levels if the Kullback-Leibler (KL) divergence between consecutive updates of the low-level policy is sufficiently small. We performed an ablative study to…
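The KL condition mentioned in the abstract can be made concrete with a small sketch. It assumes the low-level policy outputs a diagonal Gaussian over actions for a given state, and it uses the closed-form KL divergence between two such Gaussians. The direction of the divergence, the threshold value, and the function names are assumptions for illustration, not details taken from the paper.

```python
import math

def gaussian_kl(mu_old, std_old, mu_new, std_new):
    """KL(old || new) between two diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for m0, s0, m1, s1 in zip(mu_old, std_old, mu_new, std_new):
        total += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return total

def low_level_changed_little(kl, threshold=0.01):
    """Hypothetical check: has the low-level policy moved only slightly since the last update?"""
    return kl < threshold

# Hypothetical policy outputs for one state, before and after an update.
kl = gaussian_kl([0.0, 0.2], [1.0, 0.5], [0.01, 0.21], [1.0, 0.5])
print(kl, low_level_changed_little(kl))
```

In practice such a quantity would be averaged over a batch of states; the single-state version above only shows the computation.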
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
