A compact, hierarchical Q-function decomposition

Bhaskara Marthi; Stuart Russell; David Andre

arXiv:1206.6851·cs.LG·July 2, 2012



TL;DR

This paper introduces a hierarchical Q-function decomposition that accounts for the values of a subroutine's possible exit states without representing them explicitly, giving a more compact representation while avoiding the suboptimal behavior that results from ignoring exit values.

Contribution

The paper proposes recursively decomposing the exit value function in terms of Q-functions at higher levels of the hierarchy, which reduces representation cost in hierarchical reinforcement learning.

Findings

1. Effective in complex environments
2. Reduces representation complexity
3. Improves hierarchical decision-making

Abstract

Previous work in hierarchical reinforcement learning has faced a dilemma: either ignore the values of different possible exit states from a subroutine, thereby risking suboptimal behavior, or represent those values explicitly, thereby incurring a possibly large representation cost because exit values refer to nonlocal aspects of the world (i.e., all subsequent rewards). This paper shows that, in many cases, one can avoid both of these problems. The solution is based on recursively decomposing the exit value function in terms of Q-functions at higher levels of the hierarchy. This leads to an intuitively appealing runtime architecture in which a parent subroutine passes to its child a value function on the exit states and the child reasons about how its choices affect the exit value. We also identify structural conditions on the value function and transition distributions that allow much…
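The runtime architecture described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a single child subroutine with tabular estimates, and every name in it (`child_q`, `choose_action`, `local_q`, `exit_dist`, the toy states and actions) is hypothetical. The child scores each action as its expected reward inside the subroutine plus the expected value of the exit state it leads to, where the exit value function is supplied by the parent.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable
ExitState = Hashable

# Hypothetical tabular quantities for one child subroutine (illustrative only).
LocalQ = Dict[Tuple[State, Action], float]                    # expected reward inside the child
ExitDist = Dict[Tuple[State, Action], Dict[ExitState, float]]  # P(exit state | state, action)


def child_q(
    local_q: LocalQ,
    exit_dist: ExitDist,
    exit_value: Callable[[ExitState], float],
    state: State,
    action: Action,
) -> float:
    """Local expected reward plus the expected parent-supplied exit value."""
    expected_exit = sum(
        prob * exit_value(exit_state)
        for exit_state, prob in exit_dist[(state, action)].items()
    )
    return local_q[(state, action)] + expected_exit


def choose_action(
    local_q: LocalQ,
    exit_dist: ExitDist,
    exit_value: Callable[[ExitState], float],
    state: State,
    actions: Iterable[Action],
) -> Action:
    """Pick the action that maximizes the combined local and exit value."""
    return max(
        actions,
        key=lambda a: child_q(local_q, exit_dist, exit_value, state, a),
    )


# Toy data (assumed): leaving a room by the near door is cheap locally,
# but the parent values the far door's exit state much more highly.
local_q = {("room", "near_door"): -1.0, ("room", "far_door"): -3.0}
exit_dist = {
    ("room", "near_door"): {"west_exit": 1.0},
    ("room", "far_door"): {"east_exit": 1.0},
}
parent_exit_values = {"west_exit": -10.0, "east_exit": -2.0}
actions = ["near_door", "far_door"]

# Child that ignores exit values (the first horn of the dilemma).
greedy_choice = choose_action(local_q, exit_dist, lambda e: 0.0, "room", actions)

# Child that receives the exit value function from its parent.
informed_choice = choose_action(
    local_q, exit_dist, lambda e: parent_exit_values[e], "room", actions
)
```

In this toy setting the child that ignores exit values takes the locally cheaper near door, while the child that receives the parent's exit values takes the far door. Passing the exit value function as a callable at runtime, rather than storing a value per exit state inside the child, is one way to mirror the parent-to-child hand-off the abstract describes.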


Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Advanced Control Systems Optimization