Provable Hierarchy-Based Meta-Reinforcement Learning

Kurtland Chua; Qi Lei; Jason D. Lee

arXiv:2110.09507·cs.LG·October 19, 2021

Open Access

TL;DR

This paper analyzes hierarchical reinforcement learning in the meta-RL setting. In a tabular setting, it shows that diversity conditions combined with a tractable optimism-based algorithm allow sample-efficient recovery of latent hierarchical structure during meta-training, in contrast to prior heuristic approaches that carry no provable guarantees.

Contribution

It provides the first theoretical analysis of hierarchy learning in meta-RL, with guarantees for recovering the natural hierarchy embedded in the transition dynamics and regret bounds on meta-test task performance.

Findings

01 Guarantees for hierarchy recovery under diversity conditions

02 Sample-efficient meta-training for hierarchical structures

03 Regret bounds for downstream task performance

Abstract

Hierarchical reinforcement learning (HRL) has seen widespread interest as an approach to tractable learning of complex modular behaviors. However, existing work either assumes access to expert-constructed hierarchies or uses hierarchy-learning heuristics with no provable guarantees. To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We consider a tabular setting where natural hierarchical structure is embedded in the transition dynamics. Analogous to supervised meta-learning theory, we provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Furthermore, we provide regret bounds on a learner using the recovered hierarchy to solve a meta-test task. Our bounds incorporate common…
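The abstract refers to an optimism-based algorithm in a tabular setting but does not spell it out here. As a rough illustration of what optimism means in tabular RL in general, the sketch below runs finite-horizon value iteration on an empirical model with a count-based exploration bonus. This is a generic textbook-style construction, not the paper's algorithm; the function name `optimistic_q_values`, the bonus form `bonus_scale * horizon / sqrt(N(s, a))`, and the assumption that rewards lie in [0, 1] are all choices made for this example.

```python
import numpy as np


def optimistic_q_values(counts, reward_sums, horizon, bonus_scale=1.0):
    """Optimistic finite-horizon Q-values from transition counts (illustrative only).

    counts[s, a, s2]  : number of observed transitions s --a--> s2
    reward_sums[s, a] : sum of rewards observed at (s, a); rewards assumed in [0, 1]
    Returns an array q of shape (horizon, n_states, n_actions).
    """
    visits = counts.sum(axis=2)                  # N(s, a)
    safe_visits = np.maximum(visits, 1)          # avoid division by zero
    p_hat = counts / safe_visits[:, :, None]     # empirical transition probabilities
    r_hat = reward_sums / safe_visits            # empirical mean rewards
    bonus = bonus_scale * horizon / np.sqrt(safe_visits)  # shrinks as data accumulates

    n_states, n_actions = visits.shape
    q = np.zeros((horizon + 1, n_states, n_actions))
    for h in range(horizon - 1, -1, -1):
        v_next = q[h + 1].max(axis=1)            # optimistic value at step h + 1
        q[h] = r_hat + bonus + p_hat @ v_next    # Bellman backup plus bonus
        q[h] = np.minimum(q[h], horizon - h)     # value cannot exceed remaining steps
        q[h][visits == 0] = horizon - h          # unvisited pairs get the maximal value
    return q[:horizon]


# Usage: state 0 / action 0 was tried 1000 times with zero reward; nothing else was tried.
counts = np.zeros((2, 2, 2))
counts[0, 0, 0] = 1000
reward_sums = np.zeros((2, 2))
q = optimistic_q_values(counts, reward_sums, horizon=3)
greedy_action_in_state_0 = int(q[0, 0].argmax())  # the untried action looks best
```

Acting greedily with respect to these inflated values steers the learner toward state-action pairs it has rarely tried, which is the mechanism that regret analyses of optimistic algorithms typically exploit.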

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Autonomous Vehicle Technology and Safety