Mixing Human Demonstrations with Self-Exploration in Experience Replay for Deep Reinforcement Learning

Dylan Klein; Akansel Cosgun

arXiv:2107.06840·cs.AI·July 15, 2021

Open Access

TL;DR

This paper explores integrating human demonstration data with self-exploration in experience replay buffers for deep reinforcement learning, showing that combining both can improve convergence speed and efficiency.

Contribution

It introduces a method to incorporate human demonstrations into experience replay and analyzes the impact of different demonstration ratios on learning efficiency.

Findings

1. The agent trained purely on demonstrations converged faster, reaching solutions that take fewer steps.

2. Combining demonstrations with self-exploration improves learning efficiency.

3. Agents trained on mixed data achieve success rates similar to those of the pure methods.

Abstract

We investigate the effect of using human demonstration data in the replay buffer for Deep Reinforcement Learning. We use a policy gradient method with a modified experience replay buffer where a human demonstration experience is sampled with a given probability. We analyze different ratios of using demonstration data in a task where an agent attempts to reach a goal while avoiding obstacles. Our results suggest that while the agents trained by pure self-exploration and pure demonstration had similar success rates, the pure demonstration model converged faster to solutions with fewer steps.
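
The modified replay buffer described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name MixedReplayBuffer, the parameter demo_prob, the transition layout, and the fallback behavior when one source is empty are all assumptions made for this example. Each sampled experience comes from the human demonstration set with probability demo_prob and from the agent's own exploration otherwise.

```python
import random
from collections import deque

# A transition is assumed to be (state, action, reward, next_state, done);
# this layout is hypothetical and not taken from the paper.


class MixedReplayBuffer:
    """Replay buffer that draws each sample from human demonstrations
    with probability demo_prob, otherwise from self-exploration data."""

    def __init__(self, demos, demo_prob, capacity=10_000, seed=None):
        if not 0.0 <= demo_prob <= 1.0:
            raise ValueError("demo_prob must be in [0, 1]")
        self.demos = list(demos)             # fixed set of human transitions
        self.agent = deque(maxlen=capacity)  # rolling self-exploration data
        self.demo_prob = demo_prob
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a transition collected by the agent itself."""
        self.agent.append(transition)

    def sample(self, batch_size):
        """Return batch_size transitions, mixing the two sources."""
        if not self.demos and not self.agent:
            raise ValueError("cannot sample from an empty buffer")
        batch = []
        for _ in range(batch_size):
            use_demo = self.rng.random() < self.demo_prob
            # Assumed fallback: if the chosen source is empty, use the other.
            if use_demo and not self.demos:
                use_demo = False
            elif not use_demo and not self.agent:
                use_demo = True
            source = self.demos if use_demo else self.agent
            batch.append(self.rng.choice(source))
        return batch
```

With demo_prob set to 1.0 this reduces to the pure-demonstration setting, and with 0.0 to pure self-exploration; intermediate values correspond to the mixed ratios the paper compares.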

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications

Methods: Experience Replay