Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective
Wanying Wang, Yichen Zhu, Yirui Zhou, Chaomin Shen, Jian Tang, Zhiyuan Xu, Yaxin Peng, Yangchun Zhang

TL;DR
This paper analyzes the causes of gradient explosion in GAIL, revealing that deterministic policies are prone to instability due to policy disparity, and proposes a reward clipping strategy to improve training stability and efficiency.
Contribution
It provides a probabilistic explanation for gradient explosion in DE-GAIL and introduces CREDO, a reward clipping method to enhance stability and data efficiency in GAIL training.
Findings
Gradient explosion is inevitable in DE-GAIL due to policy disparity.
ST-GAIL maintains stable training trajectories.
Reward clipping via CREDO mitigates gradient explosion and improves data efficiency.
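To make the reward clipping idea concrete, here is a minimal sketch. It does not reproduce CREDO as the authors define it, since this summary does not give its exact form. It assumes the commonly used GAIL-style reward r = -log(1 - D(s, a)), where D is the discriminator output in (0, 1), and a hypothetical threshold `r_max`. Both function names are illustrative.

```python
import math


def gail_reward(d: float) -> float:
    """Assumed GAIL-style reward r = -log(1 - D); grows without bound as D -> 1."""
    return -math.log(1.0 - d)


def clipped_reward(d: float, r_max: float = 5.0) -> float:
    """Cap the reward at r_max (hypothetical threshold, not taken from the paper)."""
    return min(gail_reward(d), r_max)


if __name__ == "__main__":
    # Compare raw and clipped rewards as the discriminator output approaches 1.
    for d in (0.5, 0.99, 1.0 - 1e-12):
        print(f"D={d!r}: raw={gail_reward(d):.3f}, clipped={clipped_reward(d):.3f}")
```

Under these assumptions the raw reward grows without bound as the discriminator output approaches 1, while the clipped reward never exceeds `r_max`, which keeps the reward term feeding the policy update bounded.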
Abstract
Generative Adversarial Imitation Learning (GAIL) stands as a cornerstone approach in imitation learning. This paper investigates gradient explosion in two types of GAIL: GAIL with a deterministic policy (DE-GAIL) and GAIL with a stochastic policy (ST-GAIL). We begin with the observation that DE-GAIL training can be highly unstable early in the training phase and end in divergence. Conversely, the ST-GAIL training trajectory remains consistent and reliably converges. To shed light on these disparities, we provide an explanation from a theoretical perspective. By establishing a probabilistic lower bound for GAIL, we demonstrate that gradient explosion is an inevitable outcome for DE-GAIL due to occasionally large expert-imitator policy disparity, whereas ST-GAIL does not suffer from this issue. To substantiate our assertion, we illustrate how modifications in the reward…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning
Methods: Generative Adversarial Imitation Learning
