Generative Adversarial Self-Imitation Learning
Yijie Guo, Junhyuk Oh, Satinder Singh, Honglak Lee

TL;DR
GASIL introduces a regularizer for reinforcement learning that encourages agents to imitate successful past trajectories using a generative adversarial framework, improving performance in environments with sparse and delayed rewards.
Contribution
It proposes GASIL, which combines adversarial imitation learning with policy-gradient reinforcement learning to improve performance in environments with sparse or delayed rewards.
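The adversarial part can be sketched as one discriminator update in the GAIL style that GASIL builds on. This is a minimal sketch under simplifying assumptions: a linear logistic-regression discriminator over fixed-size (state, action) feature vectors stands in for the learned network a real implementation would use. The names `sigmoid` and `discriminator_step`, the learning rate, and the label convention (1 for samples from the current policy, 0 for samples from the buffer of good past trajectories, following GAIL) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def discriminator_step(w, b, agent_sa, buffer_sa, lr=0.1):
    """One gradient step on a GAIL-style discriminator loss (illustrative).

    D(x) = sigmoid(x @ w + b) is read as the probability that the (s, a)
    features x came from the current policy (label 1); pairs drawn from the
    good-trajectory buffer get label 0.
    Loss: -mean(log D(agent)) - mean(log(1 - D(buffer))).
    """
    d_agent = sigmoid(agent_sa @ w + b)
    d_buffer = sigmoid(buffer_sa @ w + b)
    # For binary cross-entropy, d(loss)/d(logit) = D - label, averaged per set.
    g_agent = d_agent - 1.0
    g_buffer = d_buffer - 0.0
    grad_w = agent_sa.T @ g_agent / len(agent_sa) + buffer_sa.T @ g_buffer / len(buffer_sa)
    grad_b = g_agent.mean() + g_buffer.mean()
    # Loss at the current parameters (before this update); eps avoids log(0).
    loss = -np.mean(np.log(d_agent + 1e-8)) - np.mean(np.log(1.0 - d_buffer + 1e-8))
    return w - lr * grad_w, b - lr * grad_b, loss
```

Because the objective is ordinary binary cross-entropy, each update is plain gradient descent. On two well-separated clusters of synthetic features, the discriminator soon assigns higher D to the agent's samples than to the buffer's.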
Findings
GASIL improves PPO performance in 2D Point Mass environments.
GASIL enhances learning in MuJoCo tasks with delayed rewards.
GASIL also improves PPO performance in environments with stochastic dynamics.
Abstract
This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via the generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics.
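The two ingredients named in the abstract, a store of past good trajectories and a learned shaped reward, can be sketched as follows. This is a minimal sketch that assumes the buffer keeps the K highest-return episodes seen so far, and that the discriminator follows GAIL's convention, where D(s, a) is the probability that a pair came from the current policy, so that -log D rewards behaviour resembling the buffer. The class `GoodTrajectoryBuffer`, the function `shaped_reward`, and the weight `alpha` are hypothetical names chosen for illustration.

```python
import heapq
import itertools

import numpy as np


class GoodTrajectoryBuffer:
    """Keeps the K highest-return episodes seen so far (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap of (return, tiebreak, trajectory)
        self._counter = itertools.count()  # tiebreak so trajectories are never compared

    def add(self, trajectory, episode_return):
        item = (episode_return, next(self._counter), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif episode_return > self._heap[0][0]:
            # Evict the lowest-return episode and insert the new one.
            heapq.heapreplace(self._heap, item)

    def returns(self):
        """Returns of the stored episodes, highest first."""
        return sorted((r for r, _, _ in self._heap), reverse=True)


def shaped_reward(env_reward, d_prob, alpha=0.1, eps=1e-8):
    """Assumed form r(s, a) - alpha * log D(s, a).

    d_prob is the discriminator's probability that (s, a) came from the
    current policy; a lower value (looks like the good buffer) gives a
    larger bonus.
    """
    return np.asarray(env_reward) - alpha * np.log(np.asarray(d_prob) + eps)
```

The shaped reward would then be passed to PPO, or any other policy-gradient learner, in place of the raw environment reward. Using a min-heap keeps each insertion at O(log K) and makes the lowest-return episode the one that gets evicted.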
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
