Mixing Human Demonstrations with Self-Exploration in Experience Replay for Deep Reinforcement Learning
Dylan Klein, Akansel Cosgun

TL;DR
This paper explores integrating human demonstration data with self-exploration in experience replay buffers for deep reinforcement learning, showing that combining both can improve convergence speed and efficiency.
Contribution
It introduces a method to incorporate human demonstrations into experience replay and analyzes the impact of different demonstration ratios on learning efficiency.
Findings
Training on pure demonstration data converged faster than pure self-exploration, and to solutions that need fewer steps.
Combining demonstrations with self-exploration improves learning efficiency.
Agents trained on mixed data achieve success rates similar to those of the pure methods.
Abstract
We investigate the effect of using human demonstration data in the replay buffer for Deep Reinforcement Learning. We use a policy gradient method with a modified experience replay buffer, in which a human demonstration experience is sampled with a given probability. We analyze different ratios of demonstration data in a task where an agent attempts to reach a goal while avoiding obstacles. Our results suggest that, while the agents trained by pure self-exploration and pure demonstration had similar success rates, the pure demonstration model converged faster to solutions with fewer steps.
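
The modified replay buffer described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `MixedReplayBuffer`, the parameters `demo_prob` and `capacity`, the per-sample coin flip, and the fallback when one source is empty are all assumptions made for this example. Each sampled experience is drawn from the human demonstration set with probability `demo_prob`, and from the agent's own self-exploration data otherwise.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical replay buffer mixing demonstrations with self-exploration.

    demo_prob = 1.0 corresponds to pure demonstration,
    demo_prob = 0.0 to pure self-exploration.
    """

    def __init__(self, demo_transitions, demo_prob, capacity=10000, seed=None):
        if not 0.0 <= demo_prob <= 1.0:
            raise ValueError("demo_prob must be in [0, 1]")
        self.demo = list(demo_transitions)    # fixed set of human demonstrations
        self.agent = deque(maxlen=capacity)   # agent's own experience, oldest dropped first
        self.demo_prob = demo_prob
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one transition collected by the agent itself."""
        self.agent.append(transition)

    def sample(self, batch_size):
        """Draw a batch, choosing the source independently for each element."""
        batch = []
        for _ in range(batch_size):
            use_demo = self.rng.random() < self.demo_prob
            # Assumption: if the chosen source is empty, fall back to the other one.
            if use_demo and self.demo:
                source = self.demo
            elif self.agent:
                source = self.agent
            elif self.demo:
                source = self.demo
            else:
                raise ValueError("both buffers are empty")
            batch.append(self.rng.choice(source))
        return batch


if __name__ == "__main__":
    # Transitions here are placeholder tuples; a real agent would store
    # (state, action, reward, next_state, done).
    demos = [("demo", i) for i in range(5)]
    buffer = MixedReplayBuffer(demos, demo_prob=0.5, seed=0)
    for i in range(5):
        buffer.add(("agent", i))
    print(buffer.sample(8))
```

Sweeping `demo_prob` over several values between 0 and 1 would correspond to the different demonstration ratios the paper analyzes. The exact ratios and the sampling granularity (per experience or per batch) are not stated in this summary.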
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications
Methods: Experience Replay
