Hierarchical model-based policy optimization: from actions to action sequences and back
Daniel McNamee

TL;DR
This paper introduces a hierarchical model-based policy optimization framework that applies second-order methods in the space of state-action paths. The resulting natural path gradient makes policy updates sensitive to long-range state-action correlations, as demonstrated on several toy problems.
Contribution
It presents a novel hierarchical approach using second-order methods and natural path gradients for policy optimization, incorporating long-range dependencies in the state-action space.
Findings
Natural path gradient can be computed exactly with environment models
Policy updates reflect state-space hierarchy in toy problems
Prioritization of local updates improves policy performance
Abstract
We develop a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths. The resulting natural path gradient performs policy updates in a manner that is sensitive to the long-range correlational structure of the induced stationary state-action densities. We demonstrate that the natural path gradient can be computed exactly given an environment dynamics model and depends on expressions akin to higher-order successor representations. In simulation, we show that the prioritization of local policy updates in the resulting policy flow indeed reflects the intuitive state-space hierarchy in several toy problems.
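For readers unfamiliar with the successor representation mentioned in the abstract, the sketch below computes the standard first-order version, M = (I - γ P_π)^{-1}, exactly from a known dynamics model. It is not the paper's natural path gradient or its higher-order expressions; it only illustrates the kind of model-based quantity those expressions are said to resemble. The function name `successor_representation`, the four-state ring environment, the uniform policy, and the discount factor are all assumptions chosen for illustration.

```python
import numpy as np

def successor_representation(P, policy, gamma=0.9):
    """Standard (first-order) successor representation for a tabular MDP.

    P:      (S, A, S) array, P[s, a, t] = probability of moving s -> t under action a
    policy: (S, A) array, each row a distribution over actions
    Returns M = sum_k gamma^k P_pi^k = (I - gamma * P_pi)^{-1}.
    """
    # Marginalize actions to get the state-to-state transition matrix under the policy.
    P_pi = np.einsum("sa,sat->st", policy, P)
    n = P_pi.shape[0]
    # Solve (I - gamma * P_pi) M = I rather than forming an explicit inverse.
    return np.linalg.solve(np.eye(n) - gamma * P_pi, np.eye(n))

# Hypothetical toy environment: a 4-state ring with deterministic left/right moves.
n_states, n_actions = 4, 2
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, (s - 1) % n_states] = 1.0  # action 0: step left
    P[s, 1, (s + 1) % n_states] = 1.0  # action 1: step right

policy = np.full((n_states, n_actions), 0.5)  # uniform random policy (assumed)
gamma = 0.9
M = successor_representation(P, policy, gamma)
```

Each entry `M[s, t]` is the expected discounted number of future visits to state `t` when starting from `s`, so every row sums to 1 / (1 - γ). Because the model is known, the quantity is obtained by a linear solve rather than by sampling.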
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Formal Methods in Verification
