Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective

Wanying Wang, Yichen Zhu, Yirui Zhou, Chaomin Shen, Jian Tang, Zhiyuan Xu, Yaxin Peng, Yangchun Zhang

arXiv:2312.11214·cs.LG·December 19, 2023·2 cites

TL;DR

This paper analyzes the causes of gradient explosion in GAIL, showing that deterministic policies are prone to unstable training because of occasionally large expert-imitator policy disparity, and proposes a reward clipping strategy to improve training stability and data efficiency.

Contribution

It provides a probabilistic explanation for gradient explosion in GAIL with a deterministic policy (DE-GAIL) and introduces CREDO, a reward clipping method that improves stability and data efficiency in GAIL training.

Findings

01

Gradient explosion is inevitable in DE-GAIL due to occasionally large expert-imitator policy disparity.

02

GAIL with a stochastic policy (ST-GAIL) maintains a stable training trajectory and converges reliably.

03

Reward clipping via CREDO mitigates gradient explosion and improves data efficiency.
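
As an illustration of why reward clipping can help, the sketch below uses one common GAIL reward form, r = -log(1 - D), which grows without bound as the discriminator output D approaches 1. This is a minimal sketch, not the paper's CREDO procedure: the reward form, the function names, and the threshold `r_max` are all assumptions chosen for illustration.

```python
import math


def gail_reward(d: float) -> float:
    """Unclipped reward r = -log(1 - D) for a discriminator output D in [0, 1].

    Assumed reward form for illustration; it is unbounded as D approaches 1.
    """
    if d >= 1.0:
        return math.inf
    return -math.log(1.0 - d)


def clipped_gail_reward(d: float, r_max: float = 5.0) -> float:
    """Cap the reward at r_max (a hypothetical threshold) so it stays bounded."""
    return min(gail_reward(d), r_max)


if __name__ == "__main__":
    # Compare unclipped and clipped rewards as D moves toward 1.
    for d in (0.5, 0.9, 0.999, 1.0 - 1e-12):
        print(f"D={d}: unclipped={gail_reward(d):.3f}, clipped={clipped_gail_reward(d):.3f}")
```

With a cap in place, the reward signal passed to the policy update stays bounded however confident the discriminator becomes, which is the intuition behind clipping as a stabilizer.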

Abstract

Generative Adversarial Imitation Learning (GAIL) stands as a cornerstone approach in imitation learning. This paper investigates gradient explosion in two types of GAIL: GAIL with a deterministic policy (DE-GAIL) and GAIL with a stochastic policy (ST-GAIL). We begin with the observation that DE-GAIL training can be highly unstable in its early phase and end in divergence. Conversely, the ST-GAIL training trajectory remains consistent and converges reliably. To shed light on this difference, we provide a theoretical explanation. By establishing a probabilistic lower bound for GAIL, we demonstrate that gradient explosion is an inevitable outcome for DE-GAIL due to occasionally large expert-imitator policy disparity, whereas ST-GAIL does not suffer from this issue. To substantiate our assertion, we illustrate how modifications in the reward…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning

Methods: Generative Adversarial Imitation Learning