Learning Self-Imitating Diverse Policies
Tanmay Gangwani, Qiang Liu, Jian Peng

TL;DR
This paper introduces a self-imitation learning algorithm that improves reinforcement learning in sparse and episodic reward environments by leveraging divergence minimization and diverse policy learning, leading to better sample efficiency and performance.
Contribution
The paper proposes a novel self-imitation learning method using Jensen-Shannon divergence and Stein variational policy gradients to enhance exploration and diversity in sparse reward settings.
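As a rough illustration of the divergence named above, the sketch below computes the Jensen-Shannon divergence between two small discrete distributions. The helper names and the toy inputs are hypothetical; they stand in for the visitation distributions a self-imitation method would compare, and this is not the paper's estimator.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing to the sum, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: the mean KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical toy distributions over three outcomes.
current_policy = [0.7, 0.2, 0.1]
replayed_best = [0.1, 0.3, 0.6]
print(js_divergence(current_policy, replayed_best))
```

Unlike the KL divergence, this quantity is symmetric in its arguments and bounded above by ln 2, so it stays finite even when the two distributions have disjoint support.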
Findings
Performs comparably to existing algorithms in environments with dense rewards.
Significantly outperforms existing algorithms in environments with sparse and episodic rewards.
Learns diverse policies effectively on continuous-control MuJoCo tasks.
Abstract
The success of popular algorithms for deep reinforcement learning, such as policy-gradients and Q-learning, relies heavily on the availability of an informative reward signal at each timestep of the sequential decision-making process. When rewards are only sparsely available during an episode, or a rewarding feedback is provided only after episode termination, these algorithms perform sub-optimally due to the difficulty in credit assignment. Alternatively, trajectory-based policy optimization methods, such as the cross-entropy method and evolution strategies, do not require per-timestep rewards, but have been found to suffer from high sample complexity by completely forgoing the temporal nature of the problem. Improving the efficiency of RL algorithms in real-world problems with sparse or episodic rewards is therefore a pressing need. In this work, we introduce a self-imitation learning…
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks
