One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

Tom Le Paine; Sergio Gómez Colmenarejo; Ziyu Wang; Scott Reed; Yusuf Aytar; Tobias Pfaff; Matt W. Hoffman; Gabriel Barth-Maron; Serkan Cabi; David Budden; Nando de Freitas

arXiv:1810.05017·cs.LG·October 12, 2018·5 cites

TL;DR

This paper presents MetaMimic, an off-policy reinforcement learning algorithm that enables large neural networks to perform high-fidelity one-shot imitation of diverse skills and improve task efficiency, even from visual inputs and sparse rewards.

Contribution

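Introduces MetaMimic, an off-policy RL algorithm trained with what the authors describe as the largest neural networks used for deep RL to date. It performs one-shot imitation and learns to solve tasks more efficiently than the demonstrators, without access to demonstrator actions.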

Findings

01

Larger networks with normalization are essential for high-fidelity imitation.

02

Policies can be learned from vision despite sparse rewards.

03

MetaMimic outperforms previous methods in imitation accuracy.

Abstract

Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot…
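The abstract's principle of "storing all experiences in a memory and replaying these" can be illustrated with a minimal replay memory. This is a generic sketch, not the authors' implementation: the names `Transition` and `ReplayMemory`, the field layout, and uniform random sampling are all assumptions made for illustration, and the paper's actual storage and sampling scheme may differ.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Optional


class Transition(NamedTuple):
    """One step of experience (hypothetical layout, scalar fields for brevity)."""
    observation: float
    action: float
    reward: float
    next_observation: float


class ReplayMemory:
    """Stores transitions and replays them in random minibatches."""

    def __init__(self, capacity: Optional[int] = None) -> None:
        # capacity=None keeps every transition; an integer evicts the oldest.
        self._storage: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Uniform sampling without replacement (an assumption of this sketch).
        return rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


memory = ReplayMemory()
for step in range(5):
    memory.add(Transition(float(step), 0.0, 0.0, float(step + 1)))

batch = memory.sample(batch_size=3, rng=random.Random(0))
```

Because the learner draws from stored experience rather than only from the current policy's rollouts, each transition can be reused for many updates, which is what makes the training off-policy.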

Taxonomy

Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning