Provably Efficient Off-Policy Adversarial Imitation Learning with Convergence Guarantees

Yilei Chen; Vittorio Giammarino; James Queeney; Ioannis Ch. Paschalidis

arXiv:2405.16668·cs.LG·May 28, 2024

TL;DR

This paper provides the first theoretical guarantees for off-policy Adversarial Imitation Learning (AIL), showing that reusing samples from recent policies, without importance sampling corrections, preserves convergence and improves sample efficiency.

Contribution

The paper establishes convergence guarantees and a sample complexity analysis for off-policy AIL algorithms that reuse samples from the $o(K)$ most recent policies without importance sampling correction.

Findings

01

Reusing recent policy samples maintains convergence guarantees.

02

The distribution shift error induced by off-policy updates is dominated by the benefit of having more data available.

03

First theoretical analysis of off-policy AIL algorithms.

Abstract

Adversarial Imitation Learning (AIL) faces challenges with sample inefficiency because of its reliance on sufficient on-policy data to evaluate the performance of the current policy during reward function updates. In this work, we study the convergence properties and sample complexity of off-policy AIL algorithms. We show that, even in the absence of importance sampling correction, reusing samples generated by the $o(K)$ most recent policies, where $K$ is the number of iterations of policy updates and reward updates, does not undermine the convergence guarantees of this class of algorithms. Furthermore, our results indicate that the distribution shift error induced by off-policy updates is dominated by the benefits of having more data available. This result provides theoretical support for the sample efficiency of off-policy AIL algorithms. To the best of our knowledge, this is…
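
The sample-reuse idea in the abstract can be illustrated with a sliding-window buffer that keeps data from only the most recent policies and draws from it uniformly, with no importance weights. This is a minimal sketch under assumptions, not the paper's algorithm: the class `RecentPolicyBuffer`, the helper `window_size`, and the choice of a window of about $\sqrt{K}$ policies (one example of a size that grows as $o(K)$) are all hypothetical and introduced here only for illustration.

```python
import math
import random
from collections import deque


def window_size(K):
    """Illustrative o(K) window: floor(sqrt(K)), at least 1 (an assumption)."""
    return max(1, math.isqrt(K))


class RecentPolicyBuffer:
    """Holds one batch of transitions per policy, for the N most recent policies."""

    def __init__(self, num_policies):
        # deque with maxlen silently evicts the oldest policy's batch.
        self.batches = deque(maxlen=num_policies)

    def add_policy_batch(self, transitions):
        self.batches.append(list(transitions))

    def sample(self, batch_size, rng=random):
        # Uniform draw over all retained transitions: no importance weights.
        pool = [t for batch in self.batches for t in batch]
        if not pool:
            raise ValueError("buffer is empty")
        return [rng.choice(pool) for _ in range(batch_size)]


K = 100                      # total number of policy/reward update iterations
buffer = RecentPolicyBuffer(window_size(K))
for policy_index in range(15):
    # Placeholder transitions tagged with the policy that generated them.
    buffer.add_policy_batch([(policy_index, step) for step in range(3)])

retained = sorted({p for batch in buffer.batches for (p, _) in batch})
minibatch = buffer.sample(8)  # would feed a reward (discriminator) update
```

With `K = 100` the window holds 10 policies, so after 15 iterations only the batches from policies 5 through 14 remain available for reward updates.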

Taxonomy

Topics: Adversarial Robustness in Machine Learning