Rethinking ValueDice: Does It Really Improve Performance?
Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo

TL;DR
This paper critically examines ValueDice and argues that its offline success is explained by two facts: ValueDice can reduce to Behavioral Cloning under the offline setting, and regularization counters overfitting in the low-data regime. It also shows that ValueDice fails with subsampled expert trajectories, challenging earlier assumptions about its performance.
Contribution
The paper shows that ValueDice can reduce to Behavioral Cloning under the offline setting and that its offline gains are largely explained by regularization, which counters overfitting in the low-data regime.
Findings
ValueDice can reduce to Behavioral Cloning (BC) under the offline setting.
Overfitting occurs in the low-data regime, and regularization matters: with weight decay, BC also nearly matches expert performance, as ValueDice does.
ValueDice fails with subsampled expert trajectories, limiting its practical effectiveness.
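To make the weight-decay finding concrete, here is a minimal sketch of Behavioral Cloning as supervised regression from expert states to expert actions, with an optional L2 penalty on the policy weights. Everything in it is an illustrative assumption rather than the paper's setup: the function name fit_bc_policy, the linear policy, the squared-error loss, the toy data, and the hyperparameters are all hypothetical. For plain gradient descent, the L2 penalty used here is equivalent to weight decay.

```python
def fit_bc_policy(states, actions, weight_decay=0.0, lr=0.1, steps=500):
    """Fit a linear policy a = w . s to expert (state, action) pairs.

    Hypothetical helper for illustration only. Minimizes
        mean((w . s - a)^2) + weight_decay * ||w||^2
    by full-batch gradient descent.
    """
    dim = len(states[0])
    n = len(states)
    w = [0.0] * dim
    for _ in range(steps):
        # Gradient of the mean squared imitation error.
        grad = [0.0] * dim
        for s, a in zip(states, actions):
            err = sum(wi * si for wi, si in zip(w, s)) - a
            for j in range(dim):
                grad[j] += 2.0 * err * s[j] / n
        # The extra term 2 * weight_decay * w is the gradient of the L2 penalty;
        # it pulls every weight toward zero at each step.
        w = [wi - lr * (g + 2.0 * weight_decay * wi) for wi, g in zip(w, grad)]
    return w


# Toy expert data generated by the linear rule a = 2*s0 - 1*s1 (an assumption).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
actions = [2.0, -1.0, 1.0, 3.0]

w_plain = fit_bc_policy(states, actions, weight_decay=0.0)
w_reg = fit_bc_policy(states, actions, weight_decay=0.25)
print("BC without weight decay:", w_plain)
print("BC with weight decay:   ", w_reg)
```

On this noise-free toy data the unregularized fit recovers the generating weights, while the regularized fit returns a shrunken copy of them. The sketch only shows where the penalty enters the objective; it does not reproduce the overfitting behavior the paper studies, which would require a higher-capacity policy and a small expert dataset.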
Abstract
Since the introduction of GAIL, adversarial imitation learning (AIL) methods have attracted considerable research interest. Among these methods, ValueDice has achieved significant improvements: it beats the classical approach Behavioral Cloning (BC) under the offline setting, and it requires fewer interactions than GAIL under the online setting. Do these improvements stem from more advanced algorithm designs? We answer this question with the following conclusions. First, we show that ValueDice could reduce to BC under the offline setting. Second, we verify that overfitting exists and regularization matters in the low-data regime. Specifically, we demonstrate that with weight decay, BC also nearly matches the expert performance, as ValueDice does. The first two claims explain the superior offline performance of ValueDice. Third, we establish that ValueDice does not work when the expert…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)
Methods: Generative Adversarial Imitation Learning
