TL;DR
This paper demonstrates that enforcing local Lipschitz continuity on the reward function is crucial for effective off-policy generative adversarial imitation learning, providing theoretical guarantees and empirical evidence of improved performance and robustness.
Contribution
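It argues that local Lipschitz continuity of the learned reward function is necessary for off-policy generative adversarial imitation learning to perform well, provides theoretical results on the local Lipschitzness of the state-value function, and proposes reward preconditioning methods to improve robustness.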
Findings
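Local Lipschitz continuity of the learned reward function is essential for good imitation performance.
Consistently satisfying the Lipschitzness constraint on the reward improves imitation performance and robustness, and is backed by theoretical results on the state-value function.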
Reward preconditioning enhances method stability across various settings.
Abstract
Despite the recent success of reinforcement learning in various domains, these approaches remain, for the most part, deterringly sensitive to hyper-parameters and are often riddled with essential engineering feats allowing their success. We consider the case of off-policy generative adversarial imitation learning, and perform an in-depth review, qualitative and quantitative, of the method. We show that forcing the learned reward function to be locally Lipschitz-continuous is a sine qua non condition for the method to perform well. We then study the effects of this necessary condition and provide several theoretical results involving the local Lipschitzness of the state-value function. We complement these guarantees with empirical evidence attesting to the strong positive effect that the consistent satisfaction of the Lipschitzness constraint on the reward has on imitation performance.…
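To make the constraint concrete: a reward function r is locally k-Lipschitz around an input x if |r(x) - r(y)| <= k * ||x - y|| for every y in some neighborhood of x. For a differentiable reward, this amounts to keeping the norm of its input gradient at most k in that neighborhood. The summary above does not say how the constraint is enforced, so the sketch below assumes one common mechanism, a one-sided gradient penalty evaluated at points interpolated between expert and policy samples. The tiny tanh reward model, the function names, and the target constant k are all hypothetical and chosen only for illustration; this is not presented as the paper's implementation.

```python
import numpy as np

def reward(x, W1, b1, w2):
    """Toy reward model (hypothetical): r(x) = w2 . tanh(W1 x + b1), one value per row of x."""
    return np.tanh(x @ W1.T + b1) @ w2

def reward_input_grad(x, W1, b1, w2):
    """Analytic gradient of r with respect to each input row of x, shape (n, dim)."""
    h = np.tanh(x @ W1.T + b1)          # hidden activations, shape (n, hidden)
    return ((1.0 - h ** 2) * w2) @ W1   # chain rule through tanh

def gradient_penalty(expert, policy, params, k=1.0, rng=None):
    """Mean squared excess of the input-gradient norm over k (assumed penalty form).

    Gradients are evaluated at random convex combinations of paired
    expert and policy samples; norms at or below k incur no penalty.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.uniform(size=(expert.shape[0], 1))
    mix = eps * expert + (1.0 - eps) * policy
    norms = np.linalg.norm(reward_input_grad(mix, *params), axis=1)
    return float(np.mean(np.maximum(0.0, norms - k) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    dim, hidden, n = 3, 8, 16
    params = (rng.normal(size=(hidden, dim)), rng.normal(size=hidden), rng.normal(size=hidden))
    expert = rng.normal(size=(n, dim))   # stand-in for expert state-action vectors
    policy = rng.normal(size=(n, dim))   # stand-in for policy state-action vectors
    print("gradient penalty:", gradient_penalty(expert, policy, params, k=1.0, rng=rng))
```

In a training loop, a term like this would typically be added to the reward (discriminator) loss with a weighting coefficient, so that the optimizer trades off classification accuracy against the smoothness of the learned reward.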
