Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
YuXuan Liu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

TL;DR
This paper introduces a novel imitation-from-observation method that enables robots to learn diverse tasks by observing raw videos of human demonstrations, overcoming viewpoint and environment variations.
Contribution
It combines video prediction with context translation and deep reinforcement learning, so that a policy can be learned from raw demonstration videos rather than from observation-action pairs.
Findings
Effective in real-world robotic tasks like sweeping and ladling
Handles viewpoint and environment changes in demonstrations
Works in both real-world and simulated scenarios
Abstract
Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation learning methods assume that the agent receives examples of observation-action tuples that could be provided, for instance, to a supervised learning algorithm. This stands in contrast to how humans and animals imitate: we observe another person performing some behavior and then figure out which actions will realize that behavior, compensating for changes in viewpoint, surroundings, object positions and types, and other factors. We term this kind of imitation learning "imitation-from-observation," and propose an imitation learning method based on video prediction with context translation and deep reinforcement learning. This lifts…
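The abstract as quoted here does not say how translated demonstrations are turned into a learning signal for the reinforcement learner. One plausible reading, offered only as an illustrative sketch, is a per-step tracking reward: the negative squared distance between features of the learner's own observations and features of the demonstration after it has been translated into the learner's context. The function name `tracking_reward` and the toy feature arrays below are hypothetical, and the encoder and translation model that would produce real features are assumed to exist elsewhere.

```python
import numpy as np


def tracking_reward(obs_features, translated_demo_features):
    """Per-step reward: negative squared Euclidean distance between features.

    Both inputs have shape (T, D): T time steps, D feature dimensions.
    In a real system these features would come from a learned encoder and a
    context-translation model; here they are plain arrays (an assumption).
    """
    obs = np.asarray(obs_features, dtype=float)
    demo = np.asarray(translated_demo_features, dtype=float)
    if obs.shape != demo.shape:
        raise ValueError(f"shape mismatch: {obs.shape} vs {demo.shape}")
    # Sum squared differences over the feature axis, one reward per time step.
    return -np.sum((obs - demo) ** 2, axis=-1)


# Toy example with made-up 2-D features over 3 time steps.
demo = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
on_track = demo.copy()                    # learner matches the demonstration
off_track = demo + np.array([3.0, 4.0])   # learner is offset by (3, 4) each step

rewards_on_track = tracking_reward(on_track, demo)
rewards_off_track = tracking_reward(off_track, demo)
```

Under this assumed reward, a rollout that matches the translated demonstration scores zero at every step, and any deviation is penalized in proportion to its squared feature distance, which a standard reinforcement learning algorithm could then maximize.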
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Human Pose and Action Recognition
