Proximal Point Imitation Learning
Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, and Volkan Cevher

TL;DR
This paper introduces new algorithms for infinite horizon imitation learning with linear function approximation, leveraging proximal-point methods to improve efficiency and provide theoretical guarantees for both online and offline settings.
Contribution
The paper develops a unified proximal-point-based framework for imitation learning that avoids nested policy evaluations and offers rigorous efficiency guarantees.
Findings
Achieves theoretical efficiency guarantees for online IL without nested evaluations.
Provides an offline IL algorithm with guarantees, using dual smoothing and expert trajectories.
Demonstrates strong empirical performance with linear and neural network function approximation.
Abstract
This work develops new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning (IL) with linear function approximation without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing, for online and offline IL, respectively. Thanks to PPM, we avoid nested policy evaluation and cost updates for online IL appearing in the prior literature. In particular, we do away with the conventional alternating updates by the optimization of a single convex and smooth objective over both cost and Q-functions. When solved inexactly, we relate the optimization errors to the suboptimality of the recovered policy. As an added bonus, by re-interpreting PPM as dual smoothing with the expert policy as a center…
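As a generic illustration of the proximal-point method named in the abstract, the sketch below applies PPM to a toy bilinear saddle-point problem, min over x and max over y of x^T A y. This is not the paper's algorithm: the problem, the function name `ppm_bilinear`, and the parameters `eta` and `steps` are assumptions chosen only to show what an implicit (proximal) update looks like in a minimax setting.

```python
import numpy as np

def ppm_bilinear(A, x0, y0, eta=0.5, steps=200):
    """Proximal-point iterations for the toy problem min_x max_y x^T A y.

    Hypothetical illustration only; not the method from the paper.
    """
    n, m = A.shape
    # The proximal step is implicit: the new iterate appears on both sides.
    #   x_next = x - eta * A   @ y_next
    #   y_next = y + eta * A.T @ x_next
    # Stacking z = (x, y) turns this into the linear system M @ z_next = z.
    M = np.block([
        [np.eye(n), eta * A],
        [-eta * A.T, np.eye(m)],
    ])
    z = np.concatenate([x0, y0])
    for _ in range(steps):
        z = np.linalg.solve(M, z)
    return z[:n], z[n:]

# Toy usage: with an invertible A the unique saddle point is (0, 0).
A = np.eye(2)
x, y = ppm_bilinear(A, np.array([1.0, -2.0]), np.array([0.5, 3.0]))
```

On this toy problem the implicit update contracts toward the saddle point at the origin, whereas the explicit simultaneous gradient descent-ascent update with the same step size moves away from it. That contrast is the usual motivation for proximal-point schemes in minimax optimization.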
Taxonomy
Topics: Advanced Bandit Algorithms Research · Model Reduction and Neural Networks · Reinforcement Learning in Robotics
