Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning
Abraham George, Alison Bartsch, and Amir Barati Farimani

TL;DR
This paper introduces a method that reduces human involvement in reinforcement learning by augmenting a single human demonstration, which improves training efficiency and lets the agent solve complex tasks.
Contribution
The authors propose a novel demonstration augmentation technique that enhances RL training with only one human example, significantly reducing human effort while maintaining performance benefits.
Findings
Augmenting a single demonstration improves training time on simple tasks.
The method lets the agent solve a complex task (block stacking) that DDPG + HER alone cannot solve.
Agent often learns policies different from the human demonstration.
Abstract
The use of human demonstrations in reinforcement learning has proven to significantly improve agent performance. However, any requirement for a human to manually 'teach' the model is somewhat antithetical to the goals of reinforcement learning. This paper attempts to minimize human involvement in the learning process while retaining the performance advantages by using a single human example collected through a simple-to-use virtual reality simulation to assist with RL training. Our method augments a single demonstration to generate numerous human-like demonstrations that, when combined with Deep Deterministic Policy Gradients and Hindsight Experience Replay (DDPG + HER), significantly improve training time on simple tasks and allow the agent to solve a complex task (block stacking) that DDPG + HER alone cannot solve. The model achieves this significant training advantage using a single…
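For readers unfamiliar with the HER half of DDPG + HER, the following is a minimal sketch of hindsight goal relabeling using the common "final" strategy. It is a generic illustration, not the authors' implementation, and it does not show their demonstration augmentation step, which this page does not describe. The names `Transition`, `sparse_reward`, and `relabel_final`, the tolerance value, and the 0 / -1 reward convention are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Goal = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One step of a goal-conditioned episode (hypothetical container)."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    achieved_goal: Goal  # goal-space position actually reached after the action
    desired_goal: Goal   # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Goal, desired: Goal, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0 within `tol` of the goal, otherwise -1."""
    dist = sum((a - d) ** 2 for a, d in zip(achieved, desired)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_final(episode: List[Transition]) -> List[Transition]:
    """HER 'final' strategy: treat the last achieved goal as the desired goal.

    A failed episode becomes a successful example of reaching wherever
    the agent actually ended up, so sparse rewards still carry signal.
    """
    if not episode:
        return []
    new_goal = episode[-1].achieved_goal
    return [
        replace(
            t,
            desired_goal=new_goal,
            reward=sparse_reward(t.achieved_goal, new_goal),
        )
        for t in episode
    ]


# Usage: a 1-D episode that never reaches the original goal at 1.0.
episode = [
    Transition((0.0,), (0.1,), (0.1,), (1.0,), -1.0),
    Transition((0.1,), (0.1,), (0.2,), (1.0,), -1.0),
    Transition((0.2,), (0.1,), (0.3,), (1.0,), -1.0),
]
relabeled = relabel_final(episode)
```

In a typical setup, both the original and the relabeled transitions would be stored in the replay buffer that the off-policy learner (here, DDPG) samples from.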
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Weight Decay · Batch Normalization · Convolution · Adam · Dense Connections · Experience Replay · Deep Deterministic Policy Gradient
