Provably Efficient Off-Policy Adversarial Imitation Learning with Convergence Guarantees
Yilei Chen, Vittorio Giammarino, James Queeney, Ioannis Ch. Paschalidis

TL;DR
This paper provides the first theoretical guarantees for off-policy Adversarial Imitation Learning, demonstrating its convergence and sample efficiency by reusing recent policy samples without importance sampling corrections.
Contribution
It establishes convergence guarantees and analyzes sample complexity for off-policy AIL algorithms, showing their theoretical soundness and efficiency.
Findings
Reusing recent policy samples maintains convergence guarantees.
The distribution shift error introduced by off-policy updates is dominated by the benefit of having more data available.
First theoretical analysis of off-policy AIL algorithms.
Abstract
Adversarial Imitation Learning (AIL) faces challenges with sample inefficiency because of its reliance on sufficient on-policy data to evaluate the performance of the current policy during reward function updates. In this work, we study the convergence properties and sample complexity of off-policy AIL algorithms. We show that, even in the absence of importance sampling correction, reusing samples generated by the o(√K) most recent policies, where K is the number of iterations of policy updates and reward updates, does not undermine the convergence guarantees of this class of algorithms. Furthermore, our results indicate that the distribution shift error induced by off-policy updates is dominated by the benefits of having more data available. This result provides theoretical support for the sample efficiency of off-policy AIL algorithms. To the best of our knowledge, this is…
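The sample-reuse idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `RecentPolicyBuffer`, the `window` parameter, and the placeholder transitions are all hypothetical. The sketch only shows the data-handling pattern of keeping rollouts from the most recent policies and drawing from them uniformly, without importance weights, as a reward (discriminator) update might do. How large the window may be is the question the paper's analysis addresses; here it is left as a free parameter.

```python
import random
from collections import deque


class RecentPolicyBuffer:
    """Hypothetical helper: keeps rollouts from only the `window` most recent policies."""

    def __init__(self, window):
        # deque(maxlen=...) evicts the oldest policy's batch once the window is full.
        self.batches = deque(maxlen=window)

    def add_policy_batch(self, transitions):
        # One entry per policy iteration: all transitions collected by that policy.
        self.batches.append(list(transitions))

    def sample(self, batch_size, rng=random):
        # Pool the retained batches and draw uniformly, with no importance weights.
        pooled = [t for batch in self.batches for t in batch]
        return [rng.choice(pooled) for _ in range(batch_size)]

    def __len__(self):
        return sum(len(batch) for batch in self.batches)


buffer = RecentPolicyBuffer(window=3)
for k in range(5):
    # Placeholder transitions, tagged with the index k of the policy that produced them.
    buffer.add_policy_batch([(k, step) for step in range(4)])

# Only the three most recent policies remain in the buffer.
policies_kept = sorted({k for batch in buffer.batches for (k, _) in batch})
minibatch = buffer.sample(8, rng=random.Random(0))
```

In a full training loop, `minibatch` would be contrasted with expert samples in the reward update; that part is omitted because the page does not describe it.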
Taxonomy
Topics: Adversarial Robustness in Machine Learning
