Episodic Self-Imitation Learning with Hindsight
Tianhong Dai, Hengyan Liu, Anil Anthony Bharath

TL;DR
This paper introduces episodic self-imitation learning, which combines hindsight over entire episodes with a trajectory selection module and an adaptive loss function to speed up reinforcement learning in continuous control tasks with sparse rewards.
Contribution
The paper proposes a novel episodic self-imitation algorithm that leverages entire episodes with hindsight and includes a selection module to filter uninformative samples from each episode. In the reported experiments it outperforms baseline on-policy algorithms.
Findings
Outperforms baseline on-policy algorithms in experiments.
Achieves comparable results to state-of-the-art off-policy algorithms.
Effectively handles sparse reward problems in continuous control environments.
Abstract
Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state-action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode of the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, a transitions-based method which performs poorly in handling continuous control environments with sparse rewards. From the experiments, episodic self-imitation learning is shown to perform better than baseline on-policy algorithms, achieving comparable performance to state-of-the-art off-policy algorithms…
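The abstract describes two ideas, relabelling whole episodes with hindsight and filtering out uninformative samples, which can be illustrated with a minimal sketch. This is not the authors' implementation. The reward convention (0 within a tolerance of the goal, -1 otherwise), the selection rule (keep steps whose hindsight return exceeds the original return), the weight computed as the fraction of kept samples, and all function names and parameters below are assumptions made for illustration only.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=0.05):
    # Assumed convention: 0 when within tol of the goal, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0

def discounted_returns(rewards, gamma=0.98):
    # Standard backward pass computing the discounted return at every step.
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def hindsight_returns(achieved, goal, gamma=0.98, tol=0.05):
    # achieved: array of shape (T, d), the state reached after each step.
    # Hindsight relabelling (assumed form): pretend the final achieved
    # state was the goal for the whole episode.
    hindsight_goal = achieved[-1]
    r_orig = [sparse_reward(a, goal, tol) for a in achieved]
    r_hind = [sparse_reward(a, hindsight_goal, tol) for a in achieved]
    return discounted_returns(r_orig, gamma), discounted_returns(r_hind, gamma)

def select_samples(returns_orig, returns_hind):
    # Hypothetical selection rule: keep only steps where the relabelled
    # episode looks strictly better than what the agent actually earned.
    return returns_hind > returns_orig

# A failed 3-step episode in a 1-D task: the agent never reaches goal = 2.0.
achieved = np.array([[0.0], [0.5], [1.0]])
goal = np.array([2.0])
ret_orig, ret_hind = hindsight_returns(achieved, goal)
mask = select_samples(ret_orig, ret_hind)
# Hypothetical adaptive weight: fraction of the episode that was kept.
imitation_weight = mask.mean()
```

In this sketch, an episode that already reached its goal produces identical original and hindsight returns, so no samples are kept and the imitation term would contribute nothing for that episode.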
Taxonomy
Topics: Reinforcement Learning in Robotics · Zebrafish Biomedical Research Applications · Robot Manipulation and Learning
Methods: Adaptive Robust Loss · Experience Replay
