Self-Imitation Learning

Junhyuk Oh; Yijie Guo; Satinder Singh; Honglak Lee

arXiv:1806.05635 · cs.LG · June 15, 2018 · 70 citations

TL;DR

Self-Imitation Learning (SIL) is an off-policy algorithm that leverages past successful decisions to enhance exploration and improve performance in challenging reinforcement learning environments.

Contribution

The paper introduces SIL, a novel off-policy actor-critic method that exploits past good experiences to drive exploration and improve learning efficiency.
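Exploiting past experiences presupposes that each stored transition carries a measure of how good it turned out to be. A minimal sketch of that bookkeeping follows, assuming the measure is the discounted return of the episode from that step onward; the helper names `discounted_returns` and `store_episode`, the list-based buffer, and the default `gamma` are illustrative assumptions and are not taken from this page.

```python
from typing import Any, List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return the discounted return from every step of one finished episode."""
    returns: List[float] = []
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def store_episode(
    buffer: List[Tuple[Any, Any, float]],
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
    gamma: float = 0.99,
) -> None:
    """Append (state, action, return) tuples for one episode to the buffer.

    The buffer is a plain list here (an assumption made for brevity).
    """
    for state, action, ret in zip(states, actions, discounted_returns(rewards, gamma)):
        buffer.append((state, action, ret))
```

Storing the return alongside each state-action pair lets a later update compare it against the current value estimate without replaying the whole episode.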

Findings

01. SIL significantly improves A2C on several hard-exploration Atari games.

02. SIL is competitive with state-of-the-art count-based exploration methods.

03. SIL improves PPO on MuJoCo tasks.

Abstract

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard-exploration Atari games and is competitive with state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.
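The abstract does not spell out the objective, so the following is only a sketch of one way to "reproduce past good decisions". It assumes a decision counts as good when its stored return exceeds the current value estimate, and that both the policy and value terms are weighted by that positive gap. The function name `sil_loss`, its argument layout, and the default `value_coef` are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sil_loss(
    log_probs: Sequence[float],
    values: Sequence[float],
    returns: Sequence[float],
    value_coef: float = 0.01,  # assumed weight on the value term
) -> Tuple[float, float, float]:
    """Compute (total, policy, value) losses over a batch of replayed samples.

    log_probs: log pi(a|s) of the stored action under the current policy
    values:    current value estimates V(s)
    returns:   discounted returns stored with each sample
    """
    n = len(log_probs)
    if n == 0:
        return 0.0, 0.0, 0.0
    policy_loss = 0.0
    value_loss = 0.0
    for log_prob, value, ret in zip(log_probs, values, returns):
        # Only samples whose return beat the value estimate contribute.
        gap = max(ret - value, 0.0)
        # Raise the log-probability of actions that did better than expected.
        # In an autodiff framework the gap would be treated as a constant here.
        policy_loss += -log_prob * gap
        # Pull the value estimate up toward the better-than-expected return.
        value_loss += 0.5 * gap * gap
    policy_loss /= n
    value_loss /= n
    return policy_loss + value_coef * value_loss, policy_loss, value_loss
```

Because the gap is clipped at zero, samples that turned out worse than the current estimate produce no gradient, which is what makes the update imitate only the good part of the agent's own history.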

Taxonomy

Topics: Innovative Teaching and Learning Methods