Reinforcement and Imitation Learning via Interactive No-Regret Learning
Stephane Ross, J. Andrew Bagnell

TL;DR
This paper introduces a unified interactive no-regret learning framework for imitation and reinforcement learning that leverages cost information to improve policy learning and provides theoretical support for online approximate policy iteration.
Contribution
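It extends existing interactive imitation learning methods, which neither require nor benefit from the cost of actions, to incorporate cost information. It then applies the same approach to reinforcement learning, unifying existing imitation and reinforcement learning techniques under a common no-regret framework.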
Findings
Develops an interactive imitation learning method that uses cost information.
Extends the framework to reinforcement learning scenarios.
Provides theoretical support for online approximate policy iteration.
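The third finding concerns online approximate policy iteration. The sketch below is a minimal illustration under loudly stated assumptions, not the authors' algorithm. It uses a hypothetical five-state deterministic chain with horizon 10, tabular policies, every (state, time) pair as the exploration distribution, and a follow-the-leader learner that picks, per state, the action with the lowest cost-to-go summed over all data aggregated so far. Each iteration labels actions with the current learner's own cost-to-go (no expert), retrains on the aggregate, and repeats. All names (`step`, `q_value`, `train`, `online_policy_iteration`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy problem (assumption): states 0..4 on a chain, state 4 is an
# absorbing, cost-free goal; every other step costs 1. Finite horizon T = 10.
N_STATES = 5
HORIZON = 10
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(s, a):
    """Deterministic chain dynamics; returns (next_state, cost)."""
    if s == N_STATES - 1:
        return s, 0.0
    return (max(0, s - 1) if a == 0 else s + 1), 1.0


def q_value(s, a, t, policy):
    """Cost-to-go of taking a in s at time t, then following `policy` to the horizon.
    Dynamics are deterministic, so a single rollout gives the exact value."""
    s, total = step(s, a)
    for _ in range(t + 1, HORIZON):
        s, cost = step(s, policy(s))
        total += cost
    return total


def train(data):
    """Follow-the-leader on aggregated data: per state, the action with the lowest
    summed cost-to-go over all iterations so far (ties go to action 0)."""
    table = {s: min(ACTIONS, key=costs.__getitem__) for s, costs in data.items()}
    return lambda s: table.get(s, 0)  # unseen states default to 'left'


def online_policy_iteration(n_iters=6):
    data = defaultdict(lambda: [0.0] * len(ACTIONS))
    policy = lambda s: 0  # initial policy: always move left
    for _iteration in range(n_iters):
        # Exploration distribution (assumption): enumerate every (state, time) pair.
        for s in range(N_STATES):
            for t in range(HORIZON):
                for a in ACTIONS:
                    # Label with the CURRENT learner's own cost-to-go.
                    data[s][a] += q_value(s, a, t, policy)
        policy = train(data)  # retrain on everything collected so far
    return policy


policy = online_policy_iteration()
print([policy(s) for s in range(N_STATES - 1)])  # [1, 1, 1, 1]
```

With `n_iters=1` the returned policy moves right only in state 3; each further iteration propagates the correct action one state farther from the goal, which is the policy-iteration-like behavior the finding refers to.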
Abstract
Recent work has demonstrated that problems (particularly imitation learning and structured prediction) where a learner's predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of actions. We extend existing results in two directions: first, we develop an interactive imitation learning approach that leverages cost information; second, we extend the technique to address reinforcement learning. The results provide theoretical support to the commonly observed successes of online approximate policy iteration. Our approach suggests a broad new family of algorithms and provides a unifying view of existing techniques for imitation and reinforcement learning.
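To make the cost-aware interactive imitation idea concrete, here is a toy sketch of one way to realize it, not the paper's exact procedure. It rolls in with the learner's current policy (so training states come from the learner's own distribution), labels every action at the reached state with a hypothetical expert's cost-to-go, aggregates these cost-sensitive examples across iterations, and retrains. The chain MDP, the always-right `expert`, and all function names are hypothetical; enumerating every roll-in length and querying every action, instead of sampling, are simplifications.

```python
from collections import defaultdict

# Hypothetical toy problem (assumption): states 0..4 on a chain, state 4 is an
# absorbing, cost-free goal; every other step costs 1. Finite horizon T = 10.
N_STATES = 5
HORIZON = 10
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(s, a):
    """Deterministic chain dynamics; returns (next_state, cost)."""
    if s == N_STATES - 1:
        return s, 0.0
    return (max(0, s - 1) if a == 0 else s + 1), 1.0


def expert(s):
    """Hypothetical expert: always move right, toward the goal."""
    return 1


def expert_cost_to_go(s, a, t):
    """Take a in s at time t, then follow the expert until the horizon."""
    s, total = step(s, a)
    for _ in range(t + 1, HORIZON):
        s, cost = step(s, expert(s))
        total += cost
    return total


def train(data):
    """Per state, pick the action with the lowest summed cost over all aggregated
    data (ties go to action 0); unseen states default to 'left'."""
    table = {s: min(ACTIONS, key=costs.__getitem__) for s, costs in data.items()}
    return lambda s: table.get(s, 0)


def interactive_imitation(n_iters=6):
    data = defaultdict(lambda: [0.0] * len(ACTIONS))
    policy = lambda s: 0  # initial learner: always move left
    for _iteration in range(n_iters):
        for t in range(HORIZON):  # every roll-in length, instead of sampling one
            s = 0
            for _ in range(t):  # roll in with the CURRENT learner, not the expert
                s = step(s, policy(s))[0]
            for a in ACTIONS:  # cost-sensitive labels from the expert's cost-to-go
                data[s][a] += expert_cost_to_go(s, a, t)
        policy = train(data)  # retrain on the aggregated dataset
    return policy


policy = interactive_imitation()
print([policy(s) for s in range(N_STATES - 1)])  # [1, 1, 1, 1]
```

Here the correct action spreads forward from the start state, because the learner only collects labels in states its own policy reaches: after one iteration it moves right only in state 0. Unlike plain imitation, each label carries how much worse the alternative action is, not just which action the expert took.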
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
