Support-weighted Adversarial Imitation Learning
Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris

TL;DR
Support-weighted Adversarial Imitation Learning (SAIL) enhances existing AIL methods by incorporating support estimation to improve reward quality, leading to better performance and stability in imitation learning tasks.
Contribution
SAIL introduces a support estimation-based weighting scheme to improve adversarial reward signals in AIL, enhancing training stability and performance.
Findings
SAIL outperforms baseline methods on benchmark control tasks.
SAIL trains more stably than the AIL baselines it extends.
SAIL is at least as efficient as the underlying AIL algorithm.
Abstract
Adversarial Imitation Learning (AIL) is a broad family of imitation learning methods designed to mimic expert behaviors from demonstrations. While AIL has shown state-of-the-art performance on imitation learning with only a small number of demonstrations, it faces several practical challenges such as potential training instability and implicit reward bias. To address these challenges, we propose Support-weighted Adversarial Imitation Learning (SAIL), a general framework that extends a given AIL algorithm with information derived from support estimation of the expert policy. SAIL improves the quality of the reinforcement signals by weighting the adversarial reward with a confidence score from support estimation of the expert policy. We also show that SAIL is always at least as efficient as the underlying AIL algorithm that SAIL uses for learning the adversarial reward. Empirically, we show…
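The weighting idea described in the abstract can be illustrated with a small sketch. The abstract states only that the adversarial reward is weighted by a confidence score from support estimation; everything else here is an assumption made for illustration. In particular, the nearest-neighbour Gaussian score, the plain product of score and reward, the `bandwidth` parameter, and the function names `support_score` and `support_weighted_reward` are hypothetical stand-ins, not the estimator or reward used in the paper. The sketch also assumes the adversarial reward is non-negative, so that a lower confidence always lowers the reward.

```python
import math


def support_score(query, expert_points, bandwidth=1.0):
    """Hypothetical confidence in [0, 1] that `query` lies on the expert support.

    Assumption: a Gaussian kernel on the squared distance to the nearest
    expert sample, used here only as a stand-in support estimator.
    """
    nearest_sq = min(
        sum((q - e) ** 2 for q, e in zip(query, point))
        for point in expert_points
    )
    return math.exp(-nearest_sq / (2.0 * bandwidth ** 2))


def support_weighted_reward(adversarial_reward, query, expert_points, bandwidth=1.0):
    """Scale a non-negative adversarial reward by the support confidence."""
    return support_score(query, expert_points, bandwidth) * adversarial_reward


# Toy state-action features standing in for expert demonstrations.
expert_points = [(0.0, 0.0), (1.0, 1.0)]

# The same adversarial reward is kept near the expert data and suppressed far from it.
on_support = support_weighted_reward(0.8, (0.0, 0.0), expert_points)
off_support = support_weighted_reward(0.8, (5.0, 5.0), expert_points)
```

In this toy setting the reward for a point that coincides with an expert sample is left unchanged, while the reward for a point far from every expert sample shrinks toward zero. That is the intended effect of a confidence weight: the policy gets little credit for state-action pairs the expert data does not cover, even if the discriminator happens to score them highly.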
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
Methods: Generative Adversarial Imitation Learning
