Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

Thomas G. Dietterich

arXiv:cs/9905014·cs.LG·May 23, 2007·21 cites

TL;DR

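This paper introduces MAXQ, a hierarchical reinforcement learning framework that decomposes a target MDP into a hierarchy of smaller MDPs and its value function into an additive combination of their value functions. Experiments show that MAXQ-Q converges faster than flat Q-learning, and the MAXQ representation supports computing improved policies.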

Contribution

The paper defines the MAXQ hierarchy, proves formal results on its representational power, and introduces MAXQ-Q, an online model-free learning algorithm proven to converge with probability 1 to a recursively optimal policy.

Findings

01

MAXQ-Q converges faster than flat Q-learning in experiments.

02

MAXQ representation enables efficient computation of improved policies.

03

MAXQ-Q keeps its convergence guarantee in the presence of the five kinds of state abstraction the paper defines.

Abstract

This paper presents the MAXQ approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a…
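The additive decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the usual MAXQ form, in which the value of invoking child a from parent task i in state s is Q(i, s, a) = V(a, s) + C(i, s, a), where C is the completion value and V of a primitive action is its expected one-step reward. The task names, the single state "s0", and every number below are hypothetical and chosen only for illustration; this is not the paper's experimental setup, and it shows evaluation of a fixed table rather than the MAXQ-Q learning update.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level task hierarchy (names are illustrative only).
# Tasks that do not appear as keys are treated as primitive actions.
CHILDREN: Dict[str, List[str]] = {
    "Root": ["GoLeft", "GoRight"],
    "GoLeft": ["left"],
    "GoRight": ["right"],
}

# Assumed expected one-step reward V(a, s) for each primitive action.
V_PRIMITIVE: Dict[Tuple[str, str], float] = {
    ("left", "s0"): -1.0,
    ("right", "s0"): -1.0,
}

# Assumed completion values C(i, s, a): expected reward for finishing
# parent task i after child a terminates, starting from state s.
C: Dict[Tuple[str, str, str], float] = {
    ("Root", "s0", "GoLeft"): -3.0,
    ("Root", "s0", "GoRight"): 0.0,
    ("GoLeft", "s0", "left"): 0.0,
    ("GoRight", "s0", "right"): 0.0,
}


def value(task: str, state: str) -> float:
    """V(i, s): stored reward for primitives, max over children otherwise."""
    if task not in CHILDREN:
        return V_PRIMITIVE[(task, state)]
    return max(q_value(task, state, child) for child in CHILDREN[task])


def q_value(task: str, state: str, child: str) -> float:
    """Q(i, s, a) = V(a, s) + C(i, s, a), the additive decomposition."""
    return value(child, state) + C[(task, state, child)]


def greedy_child(task: str, state: str) -> str:
    """Child of a composite task with the highest decomposed Q value."""
    return max(CHILDREN[task], key=lambda child: q_value(task, state, child))
```

In this toy table, the value of "Root" is never stored directly; it is reassembled recursively from the primitive rewards and the completion values, which is the sense in which the target value function is an additive combination of the subtask value functions.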

Taxonomy

Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Software Reliability and Analysis Research