Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
Thomas G. Dietterich

TL;DR
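This paper introduces the MAXQ hierarchical reinforcement learning framework, which decomposes a target MDP into a hierarchy of smaller MDPs and its value function into an additive combination of their value functions. Experiments show that MAXQ-Q converges faster than flat Q-learning, and that the MAXQ representation supports efficient computation of improved policies.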
Contribution
The paper defines the MAXQ hierarchy, proves formal results on its representational power, and introduces MAXQ-Q, an online model-free learning algorithm proven to converge with probability 1 to a recursively optimal policy.
Findings
MAXQ-Q converges faster than flat Q-learning in experiments.
MAXQ representation enables efficient computation of improved policies.
The approach handles state abstractions while maintaining convergence guarantees.
Abstract
This paper presents the MAXQ approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a…
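The additive decomposition described in the abstract can be illustrated with a small sketch. It assumes the usual MAXQ form, in which the value of invoking child a from parent i in state s is Q(i, s, a) = V(a, s) + C(i, s, a), where C is the "completion" value of finishing i after a returns, and V of a composite task is the maximum of Q over its children. The task names, states, and numbers below are hypothetical and chosen only for illustration; termination predicates and state abstraction are omitted.

```python
# Minimal sketch of evaluating a MAXQ-style decomposed value function.
# All task names, states, and numeric values are hypothetical.

# Task hierarchy: composite tasks list their children; primitives list none.
CHILDREN = {
    "Root": ["Navigate", "Pickup"],
    "Navigate": ["Left", "Right"],
    "Left": [],
    "Right": [],
    "Pickup": [],
}

# V_PRIM[(primitive, state)]: expected immediate reward of a primitive action.
V_PRIM = {
    ("Left", "s0"): -1.0,
    ("Right", "s0"): -1.0,
    ("Pickup", "s0"): -10.0,
}

# C[(parent, state, child)]: expected reward for completing `parent`
# after `child` has finished, starting from `state`.
C = {
    ("Navigate", "s0", "Left"): -2.0,
    ("Navigate", "s0", "Right"): 0.0,
    ("Root", "s0", "Navigate"): 5.0,
    ("Root", "s0", "Pickup"): 0.0,
}


def value(node, state):
    """V(node, state): stored reward for primitives, max over Q for composites."""
    children = CHILDREN[node]
    if not children:
        return V_PRIM.get((node, state), 0.0)
    return max(q_value(node, state, child) for child in children)


def q_value(parent, state, child):
    """Q(parent, state, child) = V(child, state) + C(parent, state, child)."""
    return value(child, state) + C.get((parent, state, child), 0.0)


def greedy_child(parent, state):
    """Child of `parent` with the highest decomposed Q value in `state`."""
    return max(CHILDREN[parent], key=lambda child: q_value(parent, state, child))


if __name__ == "__main__":
    print(value("Root", "s0"))
    print(greedy_child("Root", "s0"))
```

Because each table is indexed only by a task and its immediate children, the root value is assembled recursively from small local tables rather than stored as one flat Q-table, which is the property the decomposition relies on.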
Taxonomy
Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Software Reliability and Analysis Research
