Hierarchical model-based policy optimization: from actions to action sequences and back

Daniel McNamee

arXiv:1912.01448 · cs.LG · January 3, 2020

Open Access

TL;DR

This paper introduces a hierarchical model-based policy optimization framework that leverages second-order methods and natural path gradients to improve policy updates by considering long-range state-action correlations, demonstrated through toy problems.

Contribution

It presents a novel hierarchical approach using second-order methods and natural path gradients for policy optimization, incorporating long-range dependencies in the state-action space.

Findings

01

The natural path gradient can be computed exactly given an environment dynamics model

02

Policy updates reflect state-space hierarchy in toy problems

03

The natural path gradient depends on expressions akin to higher-order successor representations

Abstract

We develop a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths. The resulting natural path gradient performs policy updates in a manner which is sensitive to the long-range correlational structure of the induced stationary state-action densities. We demonstrate that the natural path gradient can be computed exactly given an environment dynamics model and depends on expressions akin to higher-order successor representations. In simulation, we show that the prioritization of local policy updates in the resulting policy flow indeed reflects the intuitive state-space hierarchy in several toy problems.
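
For readers unfamiliar with the successor representation mentioned in the abstract, the following is a minimal sketch of the standard first-order version only, M = sum over t of gamma^t P^t, which equals (I - gamma P)^{-1} for a row-stochastic transition matrix P under a fixed policy. It is not the paper's higher-order expression or its natural path gradient. The function name `successor_representation`, the 3-state chain `P`, the discount `gamma`, and the truncation length `n_terms` are all hypothetical choices made for illustration.

```python
def successor_representation(P, gamma, n_terms=200):
    """Approximate M = sum_{t>=0} gamma^t P^t by truncating the Neumann series.

    P is a row-stochastic transition matrix (list of lists) induced by a
    fixed policy; gamma is a discount factor in [0, 1).
    """
    n = len(P)
    # The t = 0 term of the series is the identity matrix.
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in M]  # running value of gamma^t P^t
    for _ in range(n_terms):
        # Advance one step: term <- gamma * (term @ P).
        term = [
            [gamma * sum(term[i][k] * P[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Accumulate the new term into the running sum.
        M = [[M[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return M


# Hypothetical 3-state chain under a fixed policy (each row sums to 1).
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
gamma = 0.9
M = successor_representation(P, gamma)

# Entry M[i][j] is the expected discounted number of visits to state j
# when starting from state i.
for row in M:
    print([round(x, 3) for x in row])
```

Because P is row-stochastic, each row of M sums to 1 / (1 - gamma), which gives a quick sanity check on the truncation. A truncated series is used here to keep the sketch dependency-free; with a linear algebra library one would more likely solve the linear system (I - gamma P) M = I directly.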

Taxonomy

Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Formal Methods in Verification