Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes
Guillermo Infante, Anders Jonsson, Vicenç Gómez

TL;DR
This paper introduces a hierarchical reinforcement learning method for linearly-solvable Markov decision processes that efficiently learns globally optimal policies by leveraging state space partitioning and value function compositionality.
Contribution
It presents a novel hierarchical approach that estimates optimal values at multiple abstraction levels, enabling globally optimal policy learning without non-stationarity issues.
Findings
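Can learn the globally optimal policy without suffering from the non-stationarity of high-level decisions
Can significantly reduce sample complexity when the set of boundary states is smaller than the entire state space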
Validated through multiple experiments
Abstract
In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity…
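The compositionality that the abstract relies on can be illustrated with a minimal sketch. It is not the authors' learning algorithm: it assumes a first-exit linearly-solvable MDP in the cost formulation with temperature 1, where the desirability z(s) = exp(-v(s)) of interior states solves a linear system, and it uses a hypothetical 5-state chain whose two end states play the role of boundary states. The function name `solve_first_exit_lmdp` and all numbers are illustrative assumptions.

```python
import numpy as np

def solve_first_exit_lmdp(P, q, interior, terminal, z_terminal):
    """Solve z_I = G (P_II z_I + P_IT z_T) for the interior desirabilities.

    P is the passive dynamics, q the state costs, and z_terminal the
    desirabilities exp(-cost) fixed at the terminal (boundary) states.
    """
    G = np.diag(np.exp(-q[interior]))
    P_II = P[np.ix_(interior, interior)]
    P_IT = P[np.ix_(interior, terminal)]
    A = np.eye(len(interior)) - G @ P_II
    b = G @ P_IT @ z_terminal
    return np.linalg.solve(A, b)

# Hypothetical toy partition: a chain 0-1-2-3-4 with a random-walk passive
# dynamics; states 0 and 4 are the boundary (terminal) states.
n = 5
P = np.zeros((n, n))
for s in range(1, n - 1):
    P[s, s - 1] = 0.5
    P[s, s + 1] = 0.5
interior = np.array([1, 2, 3])
terminal = np.array([0, 4])
q = np.full(n, 0.1)  # assumed uniform state cost

# One base subtask per boundary state: desirability 1 at that state, 0 elsewhere.
z_base = np.column_stack([
    solve_first_exit_lmdp(P, q, interior, terminal, e)
    for e in np.eye(len(terminal))
])

# Any task on this partition is a weighted sum of the base subtasks, with
# weights given by the desirabilities of the boundary states.
z_T = np.array([np.exp(-2.0), np.exp(-0.5)])
z_composed = z_base @ z_T
z_direct = solve_first_exit_lmdp(P, q, interior, terminal, z_T)

# The policy is defined implicitly by the values: u*(s'|s) is proportional
# to P(s'|s) z(s').
z = np.zeros(n)
z[interior] = z_direct
z[terminal] = z_T
policy = P * z
policy[interior] /= policy[interior].sum(axis=1, keepdims=True)
```

Because the system is linear in the boundary desirabilities, `z_composed` and `z_direct` agree. Under these assumptions, only the values at the boundary states need to be estimated at the higher level, while the base subtasks can be reused by any partition with the same dynamics.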
Taxonomy
Topics: Reinforcement Learning in Robotics
