DMotion: Robotic Visuomotor Control with Unsupervised Forward Model Learned from Videos

Haoqi Yuan; Ruihai Wu; Andrew Zhao; Haipeng Zhang; Zihan Ding; Hao Dong

arXiv:2103.04301·cs.RO·July 27, 2021

TL;DR

DMotion introduces an unsupervised video-based approach for robotic visuomotor control by learning a forward model that disentangles controllable agent motion, enabling effective model predictive control without labeled data.

Contribution

The paper presents DMotion, a method that learns an environment forward model solely from videos. It is trained end-to-end, disentangles the agent's motion from the rest of the scene, and represents that motion with physically interpretable transformations.

Findings

01

Achieves superior forward model accuracy in Grid World and robotic simulation environments.

02

Demonstrates effective robotic manipulation using learned models in model predictive control.

03

Operates without requiring labeled actions or object annotations.
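To make the second finding concrete, here is a minimal sketch of how a forward model can drive model predictive control. It is not the paper's planner: `toy_forward_model`, `ACTIONS`, `plan_first_action`, and `run_mpc` are hypothetical names, the state is just a grid position, and the planner exhaustively enumerates short action sequences, which is only feasible for tiny action sets. A learned model such as DMotion's would take the place of `toy_forward_model`.

```python
import itertools

# Hypothetical discrete action set for a toy grid world: (dx, dy) displacements.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def toy_forward_model(state, action):
    """Stand-in for a learned forward model: predicts the next grid position."""
    return (state[0] + action[0], state[1] + action[1])


def plan_first_action(state, goal, forward_model, horizon=3):
    """Enumerate every action sequence of length `horizon`, roll each one out
    through the forward model, and return the first action of the cheapest one.
    The cost is the Manhattan distance to the goal summed over the rollout."""
    best_cost, best_action = float("inf"), None
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s, cost = state, 0
        for a in seq:
            s = forward_model(s, a)
            cost += abs(s[0] - goal[0]) + abs(s[1] - goal[1])
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action


def run_mpc(start, goal, steps=8):
    """Receding-horizon loop: plan, execute only the first action, replan."""
    state = tuple(start)
    for _ in range(steps):
        action = plan_first_action(state, goal, toy_forward_model)
        state = toy_forward_model(state, action)
    return state


if __name__ == "__main__":
    print(run_mpc((0, 0), (3, -2)))
```

Only the first planned action is executed before replanning, so errors in the model's longer-range predictions are corrected at the next step. This is why forward-model accuracy matters most over short horizons in this style of control.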

Abstract

Learning an accurate model of the environment is essential for model-based control tasks. Existing methods in robotic visuomotor control usually learn from data heavily labelled with actions, object entities or locations, which can be demanding in many cases. To cope with this limitation, we propose a method, dubbed DMotion, that trains a forward model from video data only, by disentangling the motion of the controllable agent to model the transition dynamics. An object extractor and an interaction learner are trained in an end-to-end manner without supervision. The agent's motions are explicitly represented using spatial transformation matrices that carry physical meaning. In the experiments, DMotion achieves superior performance on learning an accurate forward model in a Grid World environment, as well as a more realistic robot control environment in simulation. With the accurate…
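As an illustration of what representing motion with a spatial transformation matrix can look like, the sketch below moves a binary agent mask on a small grid using a 3x3 homogeneous 2D translation. This is an assumption-laden toy, not the paper's architecture: `translation_matrix` and `transform_mask` are hypothetical helpers, and the mask is hand-written here, whereas in the paper it would come from the learned object extractor.

```python
import numpy as np


def translation_matrix(dx, dy):
    """3x3 homogeneous matrix that shifts points by (dx, dy)."""
    return np.array([[1.0, 0.0, dx],
                     [0.0, 1.0, dy],
                     [0.0, 0.0, 1.0]])


def transform_mask(mask, matrix):
    """Move every occupied cell of a binary mask by a homogeneous 2D transform.
    Cells that land outside the grid are dropped."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    # Homogeneous coordinates, one column (x, y, 1) per occupied cell.
    coords = np.stack([xs, ys, np.ones_like(xs)]).astype(float)
    new = matrix @ coords
    new_x = np.rint(new[0]).astype(int)
    new_y = np.rint(new[1]).astype(int)
    keep = (new_x >= 0) & (new_x < w) & (new_y >= 0) & (new_y < h)
    out = np.zeros_like(mask)
    out[new_y[keep], new_x[keep]] = 1
    return out


if __name__ == "__main__":
    mask = np.zeros((4, 4), dtype=int)
    mask[1, 1] = 1  # agent at row 1, column 1
    moved = transform_mask(mask, translation_matrix(dx=2, dy=1))
    print(moved)
```

The entries of the matrix are directly readable as a displacement, which is the sense in which such a representation is physically meaningful: the motion parameters can be inspected, rather than being hidden in an opaque latent vector.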

Code & Models

Repositories

hyperplane-lab/dmotion-code
pytorch

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Multimodal Machine Learning Applications