AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos

Laura Smith; Nikita Dhawan; Marvin Zhang; Pieter Abbeel; Sergey Levine

arXiv:1912.04443 · cs.RO · June 23, 2020

TL;DR

This paper introduces AVID, a framework that enables robots to learn multi-stage tasks from human videos by automatically translating them into robot videos, simplifying task specification and reducing human effort in reinforcement learning.

Contribution

AVID uses pixel-level image translation to automatically convert human demonstration videos into videos of a robot performing the task, which the robot then uses to learn multi-stage tasks autonomously.

Findings

1. Learned complex multi-stage tasks, such as operating a coffee machine.
2. Required minimal human input: 20 minutes to provide the demonstrations.
3. Required about 180 minutes of robot interaction in total for training.

Abstract

Robotic reinforcement learning (RL) holds the promise of enabling robots to learn complex behaviors through experience. However, realizing this promise for long-horizon tasks in the real world requires mechanisms to reduce human burden in terms of defining the task and scaffolding the learning process. In this paper, we study how these challenges can be alleviated with an automated robotic learning framework, in which multi-stage tasks are defined simply by providing videos of a human demonstrator and then learned autonomously by the robot from raw image observations. A central challenge in imitating human videos is the difference in appearance between the human and robot, which typically requires manual correspondence. We instead take an automated approach and perform pixel-level image translation via CycleGAN to convert the human demonstration into a video of a robot, which can then…
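
The abstract names CycleGAN as the model that translates human frames into robot frames. As a rough illustration only, the sketch below computes the standard CycleGAN generator objective in NumPy, with a generator G mapping human frames to robot frames and F mapping back, trained on unpaired frames from the two domains. It assumes the least-squares adversarial loss and cycle-consistency weight lam = 10 used in the original CycleGAN work; AVID's actual networks and hyperparameters are not given here, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names: a sketch of the standard CycleGAN objective,
# not AVID's released code.


def lsgan_generator_loss(d_fake):
    """Least-squares adversarial loss: the generator wants D(fake) close to 1."""
    return float(np.mean((d_fake - 1.0) ** 2))


def lsgan_discriminator_loss(d_real, d_fake):
    """The discriminator wants D(real) close to 1 and D(fake) close to 0."""
    return float(0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)))


def cycle_consistency_loss(x, x_rec):
    """Mean L1 distance between images and their round-trip reconstructions."""
    return float(np.mean(np.abs(x - x_rec)))


def cyclegan_generator_objective(human, robot, G, F, D_robot, D_human, lam=10.0):
    """Total generator loss for unpaired human <-> robot frame translation.

    G: human -> robot, F: robot -> human (callables on image batches).
    D_robot / D_human score how much a frame looks like a real robot / human frame.
    lam weights cycle consistency (assumed 10.0, the CycleGAN default).
    """
    fake_robot = G(human)  # human demo frames rendered as robot frames
    fake_human = F(robot)
    adversarial = (lsgan_generator_loss(D_robot(fake_robot))
                   + lsgan_generator_loss(D_human(fake_human)))
    # Round trips should recover the input: F(G(human)) ~ human, G(F(robot)) ~ robot.
    cycle = (cycle_consistency_loss(human, F(fake_robot))
             + cycle_consistency_loss(robot, G(fake_human)))
    return adversarial + lam * cycle


if __name__ == "__main__":
    # Toy check: identity translators and discriminators that always output 0.5.
    human = np.zeros((2, 8, 8, 3))
    robot = np.ones((2, 8, 8, 3))
    identity = lambda x: x
    unsure = lambda x: np.full(x.shape[0], 0.5)
    print(cyclegan_generator_objective(human, robot, identity, identity, unsure, unsure))
    # prints 0.5: adversarial terms 0.25 + 0.25, cycle loss 0.0
```

The cycle-consistency term is what allows training without paired human/robot frames: no frame-by-frame correspondence is needed, only the constraint that translating to the other domain and back reproduces the input.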

Taxonomy

Methods: Batch Normalization · Residual Connection · PatchGAN · Tanh Activation · Residual Block · Instance Normalization · Convolution · Sigmoid Activation
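
Several of these entries (convolution, instance normalization, residual blocks and residual connections) are building blocks of CycleGAN-style image translators. As an illustration only, the NumPy sketch below shows one residual block in the style of the standard CycleGAN ResNet generator: a 3x3 convolution with reflection padding, instance normalization, ReLU, a second convolution and normalization, and a skip connection. The layer order and the helper names are assumptions taken from that standard design, not details reported for AVID.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each channel of a single image over its spatial positions.

    x has shape (H, W, C). Unlike batch normalization, the statistics come
    from one image only, so the output does not depend on the rest of the batch.
    """
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def conv3x3_reflect(x, w):
    """3x3 convolution (cross-correlation, as in deep learning) with reflection padding.

    x: (H, W, C_in); w: (3, 3, C_in, C_out). The output keeps the spatial size.
    """
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="reflect")
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(3):
        for j in range(3):
            # Shifted window multiplied by this tap's (C_in, C_out) weight slice.
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out


def residual_block(x, w1, w2):
    """conv -> instance norm -> ReLU -> conv -> instance norm, plus skip connection."""
    y = np.maximum(instance_norm(conv3x3_reflect(x, w1)), 0.0)
    y = instance_norm(conv3x3_reflect(y, w2))
    return x + y  # residual connection: the block only learns a correction to x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 8, 4))
    w1 = rng.normal(scale=0.1, size=(3, 3, 4, 4))
    w2 = rng.normal(scale=0.1, size=(3, 3, 4, 4))
    print(residual_block(x, w1, w2).shape)  # (8, 8, 4)
```

Keeping C_in equal to C_out inside the block is what makes the skip connection `x + y` shape-compatible; reflection padding avoids the dark border artifacts that zero padding tends to introduce in generated images.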