Self-Imitation Learning
Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee

TL;DR
Self-Imitation Learning (SIL) is an off-policy algorithm that leverages past successful decisions to enhance exploration and improve performance in challenging reinforcement learning environments.
Contribution
The paper introduces SIL, a novel off-policy actor-critic method that exploits past good experiences to drive exploration and improve learning efficiency.
Findings
SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games.
SIL is competitive with state-of-the-art count-based exploration methods.
SIL enhances PPO performance on MuJoCo tasks.
Abstract
This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive with the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.
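The abstract describes "learning to reproduce past good decisions" without stating an objective. As a rough sketch only, one way to read that idea is to imitate a stored action only when the return actually obtained exceeded the current value estimate, so the advantage is clipped at zero. The function names `discounted_returns` and `sil_losses`, the `value_coef` default, and the pure-Python batch format below are illustrative assumptions, not the authors' implementation.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute R_t = sum_k gamma^k * r_{t+k} for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def sil_losses(log_probs, values, returns, value_coef=0.01):
    """Self-imitation losses averaged over a batch of replayed transitions.

    log_probs[i]: log pi(a_i | s_i) under the current policy
    values[i]:    current value estimate V(s_i)
    returns[i]:   discounted return R_i stored with the transition
    value_coef:   weight on the value term (hypothetical default)
    """
    n = len(returns)
    # Clipped advantage: only transitions that did better than expected count.
    clipped = [max(R - v, 0.0) for R, v in zip(returns, values)]
    # Policy term: raise the log-probability of actions with positive advantage.
    policy_loss = sum(-lp * adv for lp, adv in zip(log_probs, clipped)) / n
    # Value term: pull V(s) up toward returns it underestimated.
    value_loss = sum(0.5 * adv ** 2 for adv in clipped) / n
    return policy_loss, value_loss, policy_loss + value_coef * value_loss


if __name__ == "__main__":
    rets = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
    lps = [math.log(0.5)] * 3
    print(sil_losses(lps, [0.0, 1.0, 0.5], rets))
```

In this sketch a transition whose stored return is at or below the current value estimate contributes nothing to either term, which is what restricts imitation to past decisions that turned out better than expected. In a real actor-critic setup the clipped advantage in the policy term would typically be treated as a constant when differentiating.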
Taxonomy
Topics: Innovative Teaching and Learning Methods
