Algorithms for Batch Hierarchical Reinforcement Learning
Tiancheng Zhao, Mohammad Gowayyed

TL;DR
This paper introduces Hierarchical Q-value Iteration (HQI), an off-policy HRL algorithm that learns recursively optimal policies for hierarchical decompositions of an MDP from a fixed dataset. On the Taxi domain it converges faster than flat Q-value iteration, and the same dataset can be reused to compare different hierarchical structures.
Contribution
The paper presents HQI, a novel off-policy HRL algorithm whose convergence is proven for tabular MDPs, and which can learn recursively optimal policies for any valid hierarchical decomposition of the original MDP from a single fixed dataset.
Findings
HQI converges faster than flat Q-value iteration on the Taxi domain.
HQI learns optimal policies for different hierarchical structures from the same fixed dataset, so structures can be compared without recollecting data.
HQI makes state abstraction easy to apply.
Abstract
Hierarchical Reinforcement Learning (HRL) exploits temporal abstraction to solve large Markov Decision Processes (MDPs) and provide transferable subtask policies. In this paper, we introduce an off-policy HRL algorithm: Hierarchical Q-value Iteration (HQI). We show that it is possible to effectively learn recursively optimal policies for any valid hierarchical decomposition of the original MDP, given a fixed dataset collected from a flat stochastic behavioral policy. We first formally prove the convergence of the algorithm for tabular MDPs. Then our experiments on the Taxi domain show that HQI converges faster than flat Q-value Iteration and enjoys easy state abstraction. We also demonstrate that our algorithm is able to learn optimal policies for different hierarchical structures from the same fixed dataset, which enables model comparison without recollecting data.
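To make the batch setting concrete, the sketch below shows flat, tabular Q-value iteration run over a fixed dataset of transitions gathered by a uniformly random behavioral policy. This is the kind of flat baseline the abstract compares against, not the authors' HQI. The four-state chain environment, the reward values, and every name in the code (step, collect_dataset, batch_q_iteration, greedy_policy) are assumptions made up for illustration and do not come from the paper.

```python
import random
from collections import defaultdict

N_STATES = 4        # hypothetical chain: states 0..3, state 3 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor (assumed value)


def step(state, action):
    """Deterministic toy dynamics: -1 per move, +10 on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    reward = 10.0 if done else -1.0
    return nxt, reward, done


def collect_dataset(n_episodes=200, max_steps=20, seed=0):
    """Gather (s, a, r, s', done) tuples with a uniformly random policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_episodes):
        s = rng.randrange(N_STATES - 1)  # random non-terminal start state
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)
            s2, r, done = step(s, a)
            data.append((s, a, r, s2, done))
            if done:
                break
            s = s2
    return data


def batch_q_iteration(data, n_sweeps=100):
    """Repeatedly back up Q over the fixed dataset; no new data is collected."""
    q = defaultdict(float)
    for _ in range(n_sweeps):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s2, done in data:
            # One-step Bellman target, bootstrapping from the previous sweep.
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # Average the targets per (state, action) pair seen in the data.
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


if __name__ == "__main__":
    dataset = collect_dataset()
    q_values = batch_q_iteration(dataset)
    print(greedy_policy(q_values))
```

Because the learner only ever reads the stored transitions, the same dataset could in principle be replayed under a different problem formulation, which is the property the abstract relies on when it compares hierarchical structures without recollecting data.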
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Explainable Artificial Intelligence (XAI)
