AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos
Laura Smith, Nikita Dhawan, Marvin Zhang, Pieter Abbeel, Sergey Levine

TL;DR
This paper introduces AVID, a framework that enables robots to learn multi-stage tasks from human videos by automatically translating them into robot videos, simplifying task specification and reducing human effort in reinforcement learning.
Contribution
AVID is an automated method that uses pixel-level image translation (via CycleGAN) to convert human demonstration videos into videos of a robot, from which the robot then learns multi-stage tasks autonomously.
Findings
The robot successfully learned complex multi-stage tasks such as operating a coffee machine
Minimal human input was needed: about 20 minutes of human demonstrations
Training took about 180 minutes of robot interaction
Abstract
Robotic reinforcement learning (RL) holds the promise of enabling robots to learn complex behaviors through experience. However, realizing this promise for long-horizon tasks in the real world requires mechanisms to reduce human burden in terms of defining the task and scaffolding the learning process. In this paper, we study how these challenges can be alleviated with an automated robotic learning framework, in which multi-stage tasks are defined simply by providing videos of a human demonstrator and then learned autonomously by the robot from raw image observations. A central challenge in imitating human videos is the difference in appearance between the human and robot, which typically requires manual correspondence. We instead take an automated approach and perform pixel-level image translation via CycleGAN to convert the human demonstration into a video of a robot, which can then…
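The abstract names CycleGAN as the translation mechanism. As a rough illustration of the idea behind it, the sketch below computes an L1 cycle-consistency term: translating a human frame to the robot domain and back should reproduce the original frame, and likewise in the other direction. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `cycle_consistency_loss`, the toy generators `toy_g` and `toy_f`, and the random arrays standing in for video frames are all hypothetical; a real system would use learned convolutional generators and add adversarial losses.

```python
import numpy as np


def cycle_consistency_loss(human_frames, robot_frames, g_human_to_robot, f_robot_to_human):
    """L1 cycle-consistency term in the style of CycleGAN (illustrative only)."""
    # human -> robot -> human should reproduce the original human frames
    recon_human = f_robot_to_human(g_human_to_robot(human_frames))
    # robot -> human -> robot should reproduce the original robot frames
    recon_robot = g_human_to_robot(f_robot_to_human(robot_frames))
    human_term = np.mean(np.abs(recon_human - human_frames))
    robot_term = np.mean(np.abs(recon_robot - robot_frames))
    return float(human_term + robot_term)


# Hypothetical stand-ins for learned generators: simple invertible maps.
def toy_g(x):
    return 2.0 * x + 1.0


def toy_f(y):
    return (y - 1.0) / 2.0


# Random arrays standing in for batches of frames (batch, height, width, channels).
rng = np.random.default_rng(0)
human = rng.random((4, 8, 8, 3))
robot = rng.random((4, 8, 8, 3))

# When the two maps are inverses, the cycle loss is (numerically) zero.
loss_inverse = cycle_consistency_loss(human, robot, toy_g, toy_f)

# When the backward map is replaced by the identity, the cycle is broken
# and the loss becomes large.
loss_mismatch = cycle_consistency_loss(human, robot, toy_g, lambda y: y)

print(loss_inverse, loss_mismatch)
```

The toy generators are exact inverses only to make the behavior of the loss easy to see; with learned networks the term is minimized jointly with the adversarial objectives rather than driven to zero by construction.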
Taxonomy
Methods: Batch Normalization · Residual Connection · PatchGAN · Tanh Activation · Residual Block · Instance Normalization · Convolution · Sigmoid Activation
