Model-based Behavioral Cloning with Future Image Similarity Learning
Alan Wu, AJ Piergiovanni, Michael S. Ryoo

TL;DR
This paper introduces a visual imitation learning framework that enables robots to learn action policies solely from expert videos by predicting future scenes and matching them to expert images, eliminating the need for robot trials.
Contribution
The paper proposes a novel approach combining future scene prediction with image similarity for robot imitation learning, trained solely on expert trajectories without real-world robot exploration.
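The decision rule described above is to prefer the action whose predicted outcome looks most like what the expert saw next. Below is a minimal sketch in Python under a few assumptions. Images are flattened grayscale frames stored as lists of floats. The candidate actions form a small discrete set. `predict_future`, `image_similarity`, `select_action` and `toy_predictor` are hypothetical names. Negative mean squared error is only a placeholder for the similarity function that the paper learns, as its title indicates.

```python
from typing import Callable, Sequence

Image = list[float]  # toy stand-in: a flattened grayscale image


def image_similarity(a: Image, b: Image) -> float:
    """Placeholder similarity (assumption): negative mean squared error, higher = more similar."""
    return -sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def select_action(
    current: Image,
    expert_next: Image,
    candidate_actions: Sequence[float],
    predict_future: Callable[[Image, float], Image],
) -> float:
    """Return the candidate action whose predicted future frame best matches the expert's next frame."""
    return max(
        candidate_actions,
        key=lambda a: image_similarity(predict_future(current, a), expert_next),
    )


def toy_predictor(img: Image, action: float) -> Image:
    """Hypothetical forward model: shifts the 1-D 'image' right by round(action) pixels (wrapping)."""
    k = round(action) % len(img)
    return img[-k:] + img[:-k] if k else list(img)


current = [0.0, 1.0, 0.0, 0.0, 0.0]
expert_next = [0.0, 0.0, 0.0, 1.0, 0.0]
print(select_action(current, expert_next, [-1, 0, 1, 2], toy_predictor))  # → 2
```

In this toy example the forward model only shifts a one-pixel "robot" to the right. The action that moves it two pixels reproduces the expert frame exactly, so that action is selected. With a learned predictor and a learned similarity, the same argmax structure applies.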
Findings
Effective in simulated environments for robot navigation.
Comparable or superior to baseline methods in real-world tests.
Reduces the need for costly robot trials.
Abstract
We present a visual imitation learning framework that enables learning of robot action policies solely based on expert samples without any robot trials. Robot exploration and on-policy trials in a real-world environment can often be expensive or dangerous. We present a new approach to address this problem by learning a future scene prediction model solely on a collection of expert trajectories consisting of unlabeled example videos and actions, and by enabling generalized action cloning using future image similarity. The robot learns to visually predict the consequences of taking an action, and obtains the policy by evaluating how similar the predicted future image is to an expert image. We develop a stochastic action-conditioned convolutional autoencoder, and present how we take advantage of future images for robot learning. We conduct experiments in simulated and real-life environments…
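The abstract names a stochastic action-conditioned convolutional autoencoder as the future prediction model. The sketch below shows only the data flow of such a model, and it rests on stated assumptions. The class name `ToyActionConditionedAE` is hypothetical. Dense layers with random, untrained weights stand in for the convolutional layers. Stochasticity comes from a latent `z`, drawn from a standard normal and fed to the decoder together with the encoded frame and the action. This is not the authors' architecture or training procedure.

```python
import numpy as np


class ToyActionConditionedAE:
    """Hypothetical sketch: encode frame -> concat [features, action, z] -> decode next frame.
    Dense layers replace the paper's convolutional layers; weights are random, not trained."""

    def __init__(self, img_dim, action_dim, hidden_dim=32, z_dim=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.z_dim = z_dim
        self.W_enc = self.rng.normal(0.0, 0.1, (img_dim, hidden_dim))
        self.W_dec = self.rng.normal(0.0, 0.1, (hidden_dim + action_dim + z_dim, img_dim))

    def predict(self, image, action, z=None):
        h = np.tanh(image.reshape(-1) @ self.W_enc)  # encode the current frame
        if z is None:
            z = self.rng.standard_normal(self.z_dim)  # assumption: z sampled from an N(0, I) prior
        x = np.concatenate([h, np.asarray(action, dtype=float), z])  # action conditioning
        out = 1.0 / (1.0 + np.exp(-(x @ self.W_dec)))  # sigmoid keeps pixels in (0, 1)
        return out.reshape(image.shape)


model = ToyActionConditionedAE(img_dim=8 * 8, action_dim=2)
frame = np.zeros((8, 8))
futures = [model.predict(frame, action=[0.5, -0.1]) for _ in range(3)]  # one sample per draw of z
```

Calling `predict` repeatedly with the same frame and action yields different images, because each call draws a new `z`. Passing a fixed `z` makes the prediction deterministic. That is convenient when several candidate actions should be compared under the same noise sample.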
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Human Pose and Action Recognition
