Translating Videos to Commands for Robotic Manipulation with Deep Recurrent Neural Networks

Anh Nguyen; Dimitrios Kanoulas; Luca Muratore; Darwin G. Caldwell; Nikos G. Tsagarakis

arXiv:1710.00290·cs.RO·October 3, 2017

TL;DR

This paper introduces a deep learning framework that translates videos into robotic manipulation commands, combining CNNs and RNNs to improve accuracy and enable real robot tasks.

Contribution

A novel deep RNN-based method for translating videos into commands, integrating CNN features and an encoder-decoder architecture for robotic manipulation.

Findings

01 Outperforms recent methods on a new challenging dataset

02 Smooth RNN transition improves translation accuracy

03 Successfully applied to a humanoid robot WALK-MAN

Abstract

We present a new method to translate videos to commands for robotic manipulation using deep Recurrent Neural Networks (RNNs). Our framework first extracts deep features from the input video frames with a deep Convolutional Neural Network (CNN). Two RNN layers with an encoder-decoder architecture are then used to encode the visual features and sequentially generate the output words as the command. We demonstrate that the translation accuracy can be improved by allowing a smooth transition between the two RNN layers and by using a state-of-the-art feature extractor. The experimental results on our new challenging dataset show that our approach outperforms recent methods by a fair margin. Furthermore, we combine the proposed translation module with the vision and planning system to let a robot perform various manipulation tasks. Finally, we demonstrate the effectiveness of our framework on a…
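
The pipeline described in the abstract (per-frame features, an encoder RNN, then a decoder RNN that emits words one at a time) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a plain tanh recurrent cell in place of whatever gated cell the paper uses, random untrained weights, random vectors standing in for CNN features, and a hypothetical five-word vocabulary. Handing the encoder's final state to the decoder is one common way to connect the two layers and is an assumption here; the paper's own transition mechanism is not described in this summary.

```python
import math
import random

def make_matrix(rows, cols, rng, scale=0.1):
    # Small random weights; a real system would learn these from data.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_step(w_in, w_h, x, h):
    # Plain tanh recurrent cell (assumption: stands in for a gated cell).
    a = matvec(w_in, x)
    b = matvec(w_h, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]

def translate(frame_features, vocab, hidden=8, max_words=5, seed=0):
    """Encode per-frame features, then greedily decode a word sequence."""
    rng = random.Random(seed)
    feat_dim = len(frame_features[0])
    enc_in = make_matrix(hidden, feat_dim, rng)
    enc_h = make_matrix(hidden, hidden, rng)
    dec_in = make_matrix(hidden, len(vocab), rng)
    dec_h = make_matrix(hidden, hidden, rng)
    out = make_matrix(len(vocab), hidden, rng)

    # Encoder: fold the frame features into one hidden state.
    h = [0.0] * hidden
    for feat in frame_features:
        h = rnn_step(enc_in, enc_h, feat, h)

    # Decoder: start from the encoder state, feed back the previous word.
    words = []
    prev = [0.0] * len(vocab)  # all-zero vector used as the start input
    for _ in range(max_words):
        h = rnn_step(dec_in, dec_h, prev, h)
        scores = matvec(out, h)
        idx = max(range(len(vocab)), key=scores.__getitem__)
        if vocab[idx] == "<eos>":
            break
        words.append(vocab[idx])
        prev = [1.0 if i == idx else 0.0 for i in range(len(vocab))]
    return words

# Hypothetical inputs: 10 frames of 16-dim stand-in features.
feature_rng = random.Random(1)
frames = [[feature_rng.random() for _ in range(16)] for _ in range(10)]
vocab = ["<eos>", "left", "hand", "pick", "cup"]
command = translate(frames, vocab)
```

Because the weights are untrained, the decoded words are arbitrary; the sketch only shows how data flows from frame features to an output word sequence.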
