Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards
Zhixin Chen, Mengxiang Lin

TL;DR
This paper introduces Self-Imitation Learning with Constant Reward (SILCR), a method for robotic control in environments with sparse and delayed rewards. It assigns each timestep a constant immediate reward determined by the episode's final reward, which improves performance and stability.
Contribution
The paper presents a novel self-imitation learning approach that does not require environment-provided immediate rewards, enabling effective learning in sparse and delayed reward settings.
Findings
Significantly outperforms alternative methods on MuJoCo tasks with sparse and delayed rewards.
Remains competitive even against alternatives that have access to dense rewards.
Demonstrates stability and reproducibility through ablation experiments.
Abstract
The application of reinforcement learning (RL) to robotic control remains limited in environments with sparse and delayed rewards. In this paper, we propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR). Instead of requiring hand-defined immediate rewards from the environment, our method assigns each timestep a constant immediate reward determined by the final episodic reward of its trajectory. In this way, even if dense rewards from the environment are unavailable, every action taken by the agent is properly guided. We demonstrate the effectiveness of our method on several challenging continuous robotic control tasks in MuJoCo simulation, and the results show that it significantly outperforms alternative methods on tasks with sparse and delayed rewards. Even compared with alternatives with dense rewards…
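The reward assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `relabel_with_constant_reward`, the threshold rule for judging an episode, the values +1.0 and -1.0, and the `(state, action, reward)` tuple layout are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

# Hypothetical transition layout: (state, action, environment_reward).
Transition = Tuple[Sequence[float], Sequence[float], float]


def relabel_with_constant_reward(
    episode: List[Transition],
    episodic_reward: float,
    threshold: float,
    good_reward: float = 1.0,   # assumed value, not taken from the paper
    bad_reward: float = -1.0,   # assumed value, not taken from the paper
) -> List[Transition]:
    """Replace every per-step reward in an episode with one constant.

    The constant is chosen from the final episodic reward only, so no
    environment-provided immediate reward is needed. Comparing against a
    fixed threshold is an assumed decision rule for this sketch.
    """
    constant = good_reward if episodic_reward >= threshold else bad_reward
    # Keep states and actions; discard the original per-step rewards.
    return [(state, action, constant) for state, action, _ in episode]


if __name__ == "__main__":
    # A sparse, delayed episode: the only non-zero reward arrives at the end.
    episode = [
        ((0.0,), (0.1,), 0.0),
        ((1.0,), (0.2,), 0.0),
        ((2.0,), (0.3,), 5.0),
    ]
    relabeled = relabel_with_constant_reward(episode, episodic_reward=5.0, threshold=1.0)
    print([reward for _, _, reward in relabeled])
```

With this relabeling, every transition of an episode carries the same learning signal, so an off-policy learner sampling from a replay buffer would see a non-zero reward at each step even when the environment itself reports a reward only at the end.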
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Robotic Locomotion and Control
