Match or Replay: Self Imitating Proximal Policy Optimization

Gaurav Chaudhary; Laxmidhar Behera; Washim Uddin Mondal

arXiv:2603.27515 · cs.LG · March 31, 2026

TL;DR

This paper introduces a self-imitating on-policy RL algorithm that improves exploration and sample efficiency by leveraging past successful experiences, demonstrating faster learning and higher success rates in various environments.

Contribution

The paper proposes a self-imitating on-policy RL method that uses an optimal transport distance in dense-reward settings and uniform replay of successful trajectories in sparse-reward settings to improve exploration and sample efficiency.

Findings

01. Significant improvements in learning speed across tested environments.

02. Higher success rates compared to existing self-imitating RL methods.

03. Effective in both dense and sparse reward scenarios.

Abstract

Reinforcement Learning (RL) agents often struggle with inefficient exploration, particularly in environments with sparse rewards. Traditional exploration strategies can lead to slow learning and suboptimal performance because agents fail to systematically build on previously successful experiences, thereby reducing sample efficiency. To tackle this issue, we propose a self-imitating on-policy algorithm that enhances exploration and sample efficiency by leveraging past high-reward state-action pairs to guide policy updates. Our method incorporates self-imitation by using optimal transport distance in dense reward environments to prioritize state visitation distributions that match the most rewarding trajectory. In sparse-reward environments, we uniformly replay successful self-encountered trajectories to facilitate structured exploration. Experimental results across diverse environments…
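
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which optimal transport solver is used or how the distance enters the policy update, so the code below assumes an entropy-regularized (Sinkhorn) approximation with a squared Euclidean cost and uniform weights over visited states. The names `sinkhorn_distance` and `SuccessBuffer`, and the parameters `reg` and `n_iters`, are hypothetical.

```python
import numpy as np


def sinkhorn_distance(x, y, reg=0.1, n_iters=200):
    """Approximate OT cost between two sets of states (assumed uniform weights).

    Assumption: Sinkhorn iterations stand in for whatever OT solver the
    paper actually uses; the abstract does not specify one.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise squared Euclidean cost matrix, shape (len(x), len(y)).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    scale = cost.max()
    if scale == 0.0:
        return 0.0  # every state coincides, so nothing has to be moved
    # Scale the regularizer by the largest cost to avoid numerical underflow.
    kernel = np.exp(-cost / (reg * scale))
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


class SuccessBuffer:
    """Stores successful trajectories and replays them uniformly at random."""

    def __init__(self, seed=0):
        self.trajectories = []
        self.rng = np.random.default_rng(seed)

    def add(self, trajectory, success):
        # Only trajectories flagged as successful are kept.
        if success:
            self.trajectories.append(trajectory)

    def sample(self):
        if not self.trajectories:
            return None
        idx = self.rng.integers(len(self.trajectories))
        return self.trajectories[idx]


# Usage: compare a rollout's visited states with a stored best trajectory.
best_states = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
near_rollout = best_states.copy()
far_rollout = best_states + np.array([10.0, 0.0])
d_near = sinkhorn_distance(near_rollout, best_states)
d_far = sinkhorn_distance(far_rollout, best_states)

buffer = SuccessBuffer()
buffer.add(far_rollout, success=False)   # discarded
buffer.add(near_rollout, success=True)   # kept
replayed = buffer.sample()
```

In this sketch, a rollout whose visited states lie close to the best trajectory receives a smaller distance than one that drifts away, which is the kind of signal a dense-reward variant could use to prioritize matching rollouts. The buffer shows only the uniform sampling step for the sparse-reward case.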
