Hierarchical Imitation and Reinforcement Learning

Hoang M. Le; Nan Jiang; Alekh Agarwal; Miroslav Dudík; Yisong Yue; Hal Daumé III

arXiv:1803.00590 · cs.LG · June 12, 2018 · 27 cites

Open Access

TL;DR

This paper introduces hierarchical guidance, a framework that combines imitation learning and reinforcement learning at different levels of a task hierarchy to reduce expert effort and exploration cost in long-horizon, sparse-reward decision tasks.

Contribution

The paper presents hierarchical guidance, an algorithmic framework that applies imitation learning (IL) and reinforcement learning (RL) at different levels of the task hierarchy, speeding up learning and reducing the number of expert labels needed in long-horizon tasks.

Findings

01 Learns significantly faster than hierarchical RL on long-horizon benchmarks, including Montezuma's Revenge

02 Substantially reduces expert labeling effort

03 Significantly more label-efficient than standard imitation learning

Abstract

We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction. Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration. Using long-horizon benchmarks, including Montezuma's Revenge, we demonstrate that our approach can learn significantly faster than hierarchical RL, and be significantly more label-efficient than standard IL. We also theoretically analyze labeling cost for certain…
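The abstract mentions combining IL and RL at different levels of a hierarchy. As a rough illustration of one such combination (imitation at the high level, Q-learning at the low level), here is a minimal sketch on a toy problem. It is not the paper's algorithm or code: the 1-D corridor, the subgoal list `SUBGOALS`, the lookup-table `expert_subgoal`, the pseudo-reward for reaching a subgoal, and all hyperparameters are assumptions made up for this example.

```python
import random

# Toy 1-D corridor (an assumption of this sketch): states 0..N-1,
# start at 0, and the only environment goal is the last state.
N = 12
SUBGOALS = [3, 7, 11]   # hypothetical subgoal states
ACTIONS = [-1, +1]      # low-level primitive actions: step left / right


def expert_subgoal(state):
    """Hypothetical high-level expert: label the next subgoal to the right."""
    return next(g for g in SUBGOALS if g > state)


def step(state, action):
    """Deterministic transition, clipped to the corridor."""
    return min(max(state + action, 0), N - 1)


def train(episodes=300, horizon=20, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    hi_policy = {}    # state -> subgoal, learned by imitating expert labels
    q = {}            # (state, subgoal, action) -> value, learned by Q-learning
    labels_used = 0   # number of expert queries made at the high level
    for _ in range(episodes):
        state = 0
        while state != N - 1:
            # High level (IL): query the expert only at unlabeled decision states.
            if state not in hi_policy:
                hi_policy[state] = expert_subgoal(state)
                labels_used += 1
            goal = hi_policy[state]
            # Low level (RL): epsilon-greedy Q-learning with a pseudo-reward
            # of 1 for reaching the current subgoal, 0 otherwise.
            for _ in range(horizon):
                if rng.random() < eps:
                    a = rng.choice(ACTIONS)
                else:
                    # Shuffle first so ties between actions are broken randomly.
                    a = max(rng.sample(ACTIONS, 2),
                            key=lambda b: q.get((state, goal, b), 0.0))
                nxt = step(state, a)
                done = nxt == goal
                reward = 1.0 if done else 0.0
                best_next = 0.0 if done else max(
                    q.get((nxt, goal, b), 0.0) for b in ACTIONS)
                old = q.get((state, goal, a), 0.0)
                q[(state, goal, a)] = old + alpha * (reward + gamma * best_next - old)
                state = nxt
                if done:
                    break
            # If the horizon ran out, the high level simply decides again
            # from wherever the low level ended up.
    return hi_policy, q, labels_used


def rollout(hi_policy, q, horizon=20, max_len=100):
    """Greedy hierarchical rollout: pick a subgoal, then act greedily toward it."""
    state, path = 0, [0]
    while state != N - 1 and len(path) < max_len:
        goal = hi_policy[state]
        for _ in range(horizon):
            a = max(ACTIONS, key=lambda b: q.get((state, goal, b), 0.0))
            state = step(state, a)
            path.append(state)
            if state == goal:
                break
    return path


if __name__ == "__main__":
    hi_policy, q, labels_used = train()
    print("greedy path:", rollout(hi_policy, q))
    print("expert labels requested:", labels_used)
```

In this sketch the expert is consulted only when the high level must choose a subgoal, at most once per state, while primitive actions are never labeled and are learned from the subgoal pseudo-reward alone. That split is meant only to illustrate the kind of division of labor the abstract describes.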

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing