Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control
Ke Lin, Liang Gong, Xudong Li, Te Sun, Binhao Chen, Chengliang Liu, Zhengfeng Zhang, Jian Pu, Junping Zhang

TL;DR
This paper introduces a sample-efficient deep reinforcement learning method that leverages demonstration guidance to improve training stability and exploration in continuous control tasks, reducing the need for extensive demonstrations.
Contribution
The paper proposes the DRL-EG algorithm, which combines a discriminator and a guider, both modeled from a small number of demonstrations, to improve exploration and sample efficiency in DRL.
Findings
Outperforms other RL and RLfD methods in continuous control tasks.
Helps agents escape local optima during training.
Requires fewer demonstrations than existing RLfD approaches.
Abstract
Although deep reinforcement learning (DRL) algorithms have achieved important results in many control tasks, they still suffer from sample inefficiency and unstable training, problems that are usually caused by sparse rewards. Recently, some reinforcement learning from demonstration (RLfD) methods have been shown to be promising in overcoming these problems. However, they usually require a considerable number of demonstrations. To tackle these challenges, we propose a sample-efficient DRL-EG (DRL with efficient guidance) algorithm built on the SAC algorithm, in which a discriminator D(s) and a guider G(s) are modeled from a small number of expert demonstrations. The discriminator determines the appropriate guidance states, and the guider guides agents toward better exploration in the training phase. Empirical evaluation results from several continuous control tasks…
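The division of labor between D(s) and G(s) described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how either model is built or how their outputs are combined, so everything below is an assumption made for illustration. In particular, the nearest-neighbor discriminator, the nearest-neighbor guider, the `radius` and `threshold` parameters, and the `select_action` helper are all hypothetical stand-ins.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]


def make_discriminator(demo_states: List[State], radius: float) -> Callable[[State], float]:
    """Hypothetical D(s): 1.0 if s lies within `radius` of any demonstration state.

    ASSUMPTION: a distance test stands in for whatever model the paper uses.
    """
    def discriminator(state: State) -> float:
        nearest = min(math.dist(state, s) for s in demo_states)
        return 1.0 if nearest <= radius else 0.0
    return discriminator


def make_guider(demo_states: List[State], demo_actions: List[Action]) -> Callable[[State], Action]:
    """Hypothetical G(s): return the expert action recorded at the nearest demo state.

    ASSUMPTION: nearest-neighbor lookup stands in for a learned guider.
    """
    def guider(state: State) -> Action:
        idx = min(range(len(demo_states)), key=lambda i: math.dist(state, demo_states[i]))
        return demo_actions[idx]
    return guider


def select_action(
    state: State,
    policy: Callable[[State], Action],
    discriminator: Callable[[State], float],
    guider: Callable[[State], Action],
    threshold: float = 0.5,
) -> Tuple[Action, str]:
    """Use the guider's action in states the discriminator marks as guidance states.

    ASSUMPTION: a hard switch between guider and policy; the paper may combine
    them differently (for example, through the training objective).
    """
    if discriminator(state) >= threshold:
        return guider(state), "guider"
    return policy(state), "policy"


# Toy usage with two demonstration pairs and a placeholder policy.
demo_states = [(0.0, 0.0), (1.0, 1.0)]
demo_actions = [(0.1,), (0.9,)]
D = make_discriminator(demo_states, radius=0.5)
G = make_guider(demo_states, demo_actions)
placeholder_policy = lambda s: (0.0,)  # stands in for the SAC actor

near_demo = select_action((0.1, 0.0), placeholder_policy, D, G)
far_from_demo = select_action((5.0, 5.0), placeholder_policy, D, G)
```

In this toy setup the agent follows the guider only near demonstrated states and falls back to its own policy elsewhere, which is one plausible reading of "the discriminator will determine the appropriate guidance states".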
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adaptive Dynamic Programming Control
