Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble
Chao Li

TL;DR
This paper introduces A-SILfD, a reinforcement learning method that leverages expert demonstrations and ensemble Q-functions to improve sample efficiency and robustness in continuous control tasks, even with imperfect data.
Contribution
A-SILfD is a novel approach that treats demonstrations as successful experiences and uses Q-ensemble techniques to prevent performance degradation.
Findings
Significantly improves sample efficiency with few demonstrations.
Outperforms baseline methods after 150,000 training steps.
Remains robust against imperfect expert demonstrations.
Abstract
Deep reinforcement learning (DRL) offers a new way to generate robot control policies. However, training a control policy requires lengthy exploration, which makes reinforcement learning (RL) sample-inefficient in real-world tasks. Both imitation learning (IL) and learning from demonstrations (LfD) speed up training by using expert demonstrations, but imperfect expert demonstrations can mislead policy improvement. Offline-to-online reinforcement learning requires a large amount of offline data to initialize the policy, and distribution shift can easily degrade performance during online fine-tuning. To address these problems, we propose a learning-from-demonstrations method named A-SILfD, which treats expert demonstrations as the agent's successful experiences and uses these experiences to constrain policy improvement. Furthermore, we prevent performance…
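The abstract names two ingredients: past successful experiences constrain policy improvement, and a Q-ensemble guards against performance degradation. The sketch below shows one common way such pieces fit together. It is an illustration under stated assumptions, not the paper's actual formulation. The pessimistic minimum over ensemble members, the clipped weight max(R - Q, 0) borrowed from self-imitation learning, the squared-error action penalty, and every function and parameter name here (`ensemble_min_q`, `self_imitation_weight`, `constrained_policy_loss`, `bc_coef`) are hypothetical choices.

```python
import numpy as np


def ensemble_min_q(q_values: np.ndarray) -> np.ndarray:
    """Pessimistic value estimate: minimum over ensemble members.

    q_values has shape (n_members, batch). Taking the minimum is an
    assumed aggregation rule, not one confirmed by the abstract.
    """
    return q_values.min(axis=0)


def self_imitation_weight(returns: np.ndarray, q_estimate: np.ndarray) -> np.ndarray:
    """Weight a stored transition only when its observed return beats the
    current value estimate, i.e. max(R - Q, 0), as in self-imitation learning."""
    return np.maximum(returns - q_estimate, 0.0)


def constrained_policy_loss(
    policy_actions: np.ndarray,
    stored_actions: np.ndarray,
    q_pi: np.ndarray,
    weights: np.ndarray,
    bc_coef: float = 1.0,
) -> float:
    """Maximize Q while staying close to actions from successful experiences.

    The squared-error penalty and bc_coef are hypothetical; the abstract only
    says that experiences constrain policy improvement.
    """
    sq_err = np.sum((policy_actions - stored_actions) ** 2, axis=-1)
    return float(np.mean(-q_pi + bc_coef * weights * sq_err))


# Toy batch of two transitions scored by a two-member ensemble.
q_values = np.array([[1.0, 2.0], [0.5, 3.0]])
returns = np.array([2.0, 1.0])
q_min = ensemble_min_q(q_values)
weights = self_imitation_weight(returns, q_min)
loss = constrained_policy_loss(
    policy_actions=np.array([[0.0], [1.0]]),
    stored_actions=np.array([[1.0], [1.0]]),
    q_pi=q_min,
    weights=weights,
)
```

In this toy batch only the first transition, whose return exceeds the pessimistic estimate, pulls the policy toward its stored action. Under these assumptions, that is how a lower-quality demonstration would stop constraining the policy once the agent's own value estimate surpasses it.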
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning
