Episodic Self-Imitation Learning with Hindsight

Tianhong Dai; Hengyan Liu; Anil Anthony Bharath

arXiv:2011.13467 · cs.AI · November 30, 2020 · 1 citation

TL;DR

This paper introduces episodic self-imitation learning, which combines a trajectory selection module with an adaptive loss function to speed up reinforcement learning in continuous control tasks with sparse rewards.

Contribution

It proposes an episodic self-imitation algorithm that leverages entire episodes with hindsight and uses a selection module to filter uninformative samples. In the reported experiments it performs better than baseline on-policy algorithms.

Findings

01. Outperforms baseline on-policy algorithms in experiments.

02. Achieves comparable results to state-of-the-art off-policy algorithms.

03. Effectively handles sparse reward problems in continuous control environments.

Abstract

Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state-action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode of the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, a transitions-based method which performs poorly in handling continuous control environments with sparse rewards. From the experiments, episodic self-imitation learning is shown to perform better than baseline on-policy algorithms, achieving comparable performance to state-of-the-art off-policy algorithms…
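The abstract names three ingredients: whole-episode hindsight, a selection module, and an adaptive loss. The sketch below is only an illustration of how those pieces could fit together, not the authors' implementation. Everything in it is an assumption made for the example: the function names, the -1/0 sparse reward with a distance tolerance, the use of the final achieved goal as the hindsight goal, the rule "keep a step only if its hindsight return beats its original return", and the loss weight defined as the fraction of kept steps.

```python
from typing import List, Sequence, Tuple

Goal = Tuple[float, ...]


def sparse_reward(achieved: Goal, goal: Goal, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0 within `tol` of the goal, otherwise -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted return-to-go for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def hindsight_select(
    achieved_goals: Sequence[Goal], original_goal: Goal, gamma: float = 0.98
) -> Tuple[List[int], float]:
    """Relabel a whole episode with hindsight and filter its steps.

    Hypothetical selection rule (an assumption, not taken from the paper):
    a step is kept when its return under the hindsight goal is strictly
    higher than its return under the original goal.
    """
    if not achieved_goals:
        raise ValueError("episode must contain at least one step")
    # Hindsight goal: pretend the final achieved goal was the target all along.
    hindsight_goal = achieved_goals[-1]
    original = discounted_returns(
        [sparse_reward(a, original_goal) for a in achieved_goals], gamma
    )
    hindsight = discounted_returns(
        [sparse_reward(a, hindsight_goal) for a in achieved_goals], gamma
    )
    keep = [t for t, (h, o) in enumerate(zip(hindsight, original)) if h > o]
    # Assumed adaptive weight: scale the imitation loss by the kept fraction.
    weight = len(keep) / len(achieved_goals)
    return keep, weight
```

Under these assumptions, an episode that already ends at its original goal yields no kept steps and a weight of zero, so the imitation term would contribute nothing for that episode.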

Code & Models

Repositories

TianhongDai/esil-hindsight (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Zebrafish Biomedical Research Applications · Robot Manipulation and Learning

Methods: Adaptive Robust Loss · Experience Replay