Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression

Jangwon Lee; Michael S. Ryoo

arXiv:1703.01040 · cs.RO · July 25, 2017 · 5 cites

TL;DR

This paper presents a deep learning approach enabling robots to learn new activities from unlabeled first-person human videos by predicting future scene states and transferring this knowledge for real-time robot execution.

Contribution

It introduces a novel convolutional future regression model that predicts future hand and object locations from first-person videos, facilitating robot activity learning without labeled data.

Findings

01 Robots can learn new activities from unlabeled first-person human videos.

02 The model directly predicts future hand and object locations (e.g., 1-2 seconds ahead).

03 Robots execute the learned activities in real time based on camera input.

Abstract

We design a new approach that allows robot learning of new activities from unlabeled human example videos. Given videos of humans executing the same activity from a human's viewpoint (i.e., first-person videos), our objective is to make the robot learn the temporal structure of the activity as its future regression network, and learn to transfer such a model for its own motor execution. We present a new deep learning model: we extend the state-of-the-art convolutional object detection network for the representation/estimation of human hands in training videos, and newly introduce the concept of using a fully convolutional network to regress (i.e., predict) the intermediate scene representation corresponding to the future frame (e.g., 1-2 seconds later). Combining these allows direct prediction of future locations of human hands and objects, which enables the robot to infer the motor…
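
The future-regression idea in the abstract can be illustrated with a toy sketch. This is not the authors' network: it assumes the intermediate scene representation is a feature map of shape (H, W, C), and it swaps the deep fully convolutional regressor for a single 1x1 convolution (a per-location linear map over channels) fitted by least squares on an L2 objective. The function names `fit_future_regressor` and `predict_future`, the array shapes, and the synthetic data are all hypothetical.

```python
import numpy as np


def _with_bias(feats):
    """Flatten (N, H, W, C) feature maps to rows and append a constant bias column."""
    rows = feats.reshape(-1, feats.shape[-1])
    return np.hstack([rows, np.ones((rows.shape[0], 1))])


def fit_future_regressor(feats_now, feats_future):
    """Fit a 1x1-convolution regressor by least squares (toy stand-in, an assumption).

    feats_now, feats_future: arrays of shape (N, H, W, C) holding feature maps
    at time t and at a later time t + delta for N training pairs.
    Returns weights of shape (C_in + 1, C_out); the last row is the bias.
    """
    targets = feats_future.reshape(-1, feats_future.shape[-1])
    weights, *_ = np.linalg.lstsq(_with_bias(feats_now), targets, rcond=None)
    return weights


def predict_future(feats_now, weights):
    """Apply the fitted 1x1 convolution to predict the future feature map."""
    n, h, w, _ = feats_now.shape
    return (_with_bias(feats_now) @ weights).reshape(n, h, w, -1)


# Synthetic data (hypothetical): future features are a fixed channel mixing
# of the current features plus a constant offset.
rng = np.random.default_rng(0)
true_mix = rng.normal(size=(4, 4))
feats_now = rng.normal(size=(8, 5, 5, 4))
feats_future = feats_now @ true_mix + 0.5

weights = fit_future_regressor(feats_now, feats_future)
pred = predict_future(feats_now, weights)
mse = float(np.mean((pred - feats_future) ** 2))
print("future-regression MSE on the toy data:", mse)
```

In the approach the abstract describes, the predicted representation would then be passed to the detection layers to read out future hand and object locations; that step is omitted here.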

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Multimodal Machine Learning Applications