TL;DR
This paper introduces WDAIL, a novel adversarial imitation learning algorithm that uses the Wasserstein distance and reward shape exploration to improve training stability and performance on complex continuous control tasks.
Contribution
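The paper proposes a new imitation learning (IL) method that combines the Wasserstein distance, proximal policy optimization (PPO), and reward shape exploration, addressing the limitations of the Jensen-Shannon (JS) divergence-based logarithmic rewards used in GAIL.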
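A minimal sketch of how a Wasserstein critic can stand in for GAIL's logarithmic discriminator reward, using the Kantorovich-Rubinstein dual W1(P, Q) = sup over 1-Lipschitz f of E_P[f] - E_Q[f]. Everything here is an illustrative assumption, not the authors' implementation: the 1-D samples, the linear critic f(x) = w * x, the helper names, and the use of weight clipping (from the original WGAN) to enforce the Lipschitz constraint. WDAIL uses learned critics over state-action pairs and may enforce the constraint differently. The critic score on policy samples then plays the role of the reward handed to the RL step (PPO in the paper).

```python
def train_linear_critic(expert, policy, clip=1.0, lr=0.1, steps=100):
    """Fit a weight-clipped linear critic f(x) = w * x on 1-D samples.

    Gradient ascent on E_expert[f] - E_policy[f]; clipping |w| <= clip keeps
    f clip-Lipschitz, as in the original WGAN (an assumption for this sketch).
    """
    mean_e = sum(expert) / len(expert)
    mean_p = sum(policy) / len(policy)
    w = 0.0
    for _ in range(steps):
        grad_w = mean_e - mean_p  # derivative of the objective w.r.t. w
        w = max(-clip, min(clip, w + lr * grad_w))  # ascent step, then clip
    return w


def critic_gap(w, expert, policy):
    """E_expert[f] - E_policy[f]: the critic's estimate of W1 when clip=1."""
    return w * (sum(expert) / len(expert) - sum(policy) / len(policy))


def critic_reward(w, x):
    """Hypothetical reward for a policy sample: the critic score f(x)."""
    return w * x


expert = [0.0, 1.0, 2.0]
policy = [3.0, 4.0, 5.0]  # expert samples shifted by +3
w = train_linear_critic(expert, policy)
print(w)                                      # -1.0
print(critic_gap(w, expert, policy))          # 3.0
print([critic_reward(w, x) for x in policy])  # [-3.0, -4.0, -5.0]
```

The policy sample closest to the expert data (3.0) receives the highest reward, and for this pure location shift the critic gap equals the true W1 distance of 3. A linear critic recovers W1 exactly only for shifts like this one; general distributions need a richer critic.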
Findings
WDAIL achieves stable learning in complex tasks.
The method outperforms existing GAIL variants in MuJoCo environments.
Reward shape exploration enhances task-specific performance.
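This summary does not list which reward shapes WDAIL explores, so the following is a hypothetical sketch of the general idea only: apply different monotone transforms to a critic score, which keeps the ordering of rewards while changing their scale and curvature. The candidate set (`linear`, `sigmoid`, `softplus`) and all helper names are assumptions for illustration. One known connection: GAIL's common reward -log(1 - D) with D = sigmoid(logit) equals softplus(logit), so the logarithmic form fits in the same family.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x):
    """Stable log(1 + exp(x)); equals -log(1 - sigmoid(x)), GAIL's usual reward."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical candidate reward shapes, each monotone increasing in the score.
REWARD_SHAPES = {
    "linear": lambda s: s,
    "sigmoid": sigmoid,
    "softplus": softplus,
}


def shaped_rewards(scores, shape):
    """Map raw critic scores to rewards using the chosen shape."""
    fn = REWARD_SHAPES[shape]
    return [fn(s) for s in scores]


scores = [-2.0, 0.0, 2.0]
for name in REWARD_SHAPES:
    print(name, [round(r, 3) for r in shaped_rewards(scores, name)])
```

In an exploration of this kind, each shape would presumably be plugged into training and the best-performing one kept per task. All shapes agree on which samples score higher; they differ in how strongly the RL step is pushed toward those samples.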
Abstract
Generative adversarial imitation learning (GAIL) provides an adversarial learning framework for imitating an expert policy from demonstrations in high-dimensional continuous tasks. However, almost all GAIL variants and extensions design only a single logarithmic reward function, within an adversarial training strategy based on the Jensen-Shannon (JS) divergence, and apply it to every complex environment. A fixed logarithmic reward function may struggle to solve all complex tasks, and the vanishing gradient problem caused by the JS divergence harms the adversarial learning process. In this paper, we propose a new algorithm named Wasserstein Distance guided Adversarial Imitation Learning (WDAIL) to improve the performance of imitation learning (IL). Our method makes three improvements: (a) introducing the Wasserstein distance to obtain a more appropriate measure in the…
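The vanishing-gradient argument can be made concrete with the standard WGAN example of two point masses (a textbook illustration, not an experiment from this paper). When expert and policy distributions have disjoint supports, the JS divergence stays at log 2 however close the policy gets, so it carries no information about which direction to move, while the Wasserstein-1 distance shrinks smoothly with the gap. The helper names below are illustrative.

```python
import math


def js_divergence(p, q):
    """JS divergence (natural log) between discrete distributions {value: prob}."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl(a, b):
        # KL(a || b), skipping zero-probability entries of a
        return sum(pa * math.log(pa / b[x]) for x, pa in a.items() if pa > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(xs, ys):
    """W1 between two equal-size empirical samples on the real line."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


expert = {0.0: 1.0}  # point mass at 0
for theta in [2.0, 1.0, 0.5, 0.1]:
    policy = {theta: 1.0}  # point mass at theta
    js = js_divergence(expert, policy)
    w1 = wasserstein_1d([0.0], [theta])
    print(f"theta={theta}: JS={js:.4f}, W1={w1:.4f}")
```

Every line reports JS=0.6931 (log 2), while W1 equals theta. This is the property that lets a Wasserstein critic keep supplying graded feedback in situations where a JS-based discriminator saturates.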
Taxonomy
Methods: Generative Adversarial Imitation Learning
