SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control
Arunkumar Byravan, Felix Leeb, Franziska Meier, Dieter Fox

TL;DR
This paper introduces SE3-Pose-Nets, a structured deep dynamics model for visuomotor control. The model learns to segment a scene and predict the poses of its parts from point cloud data, and it is used for real-time robot control in simulation and in the real world.
Contribution
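The work presents a structured deep dynamics model that explicitly segments a scene into salient parts and predicts a pose for each part, together with how those poses change under applied actions. Control is carried out directly in the learned low-dimensional pose space, and the model outperforms baseline deep networks in scene prediction and control tasks.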
Findings
Achieves real-time control of a Baxter robot from raw depth data
Outperforms baseline deep networks in scene prediction and control tasks
Works in both simulation and real-world settings
Abstract
In this work, we present an approach to deep visuomotor control using structured deep dynamics models. Our deep dynamics model, a variant of SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an encoder-decoder structure. Unlike prior work, our dynamics model is structured: given an input scene, our network explicitly learns to segment salient parts and predict their pose-embedding along with their motion modeled as a change in the pose space due to the applied actions. We train our model using a pair of point clouds separated by an action and show that given supervision only in the form of point-wise data associations between the frames our network is able to learn a meaningful segmentation of the scene along with consistent poses. We further show that our model can be used for closed-loop control directly in the learned low-dimensional pose space, where the…
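The segmentation-plus-motion structure described in the abstract can be illustrated with a small sketch. It assumes the SE3-Nets-style formulation in which each point is moved by a mask-weighted blend of per-part rigid transforms; the function names, the hand-written masks, and the example transforms are hypothetical and chosen only for illustration. In the actual model these quantities would be predicted by the network, so this is not the authors' implementation.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def apply_rigid(rotation: Mat3, translation: Vec3, point: Vec3) -> List[float]:
    """Return R @ p + t for a single 3D point."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def predict_next_cloud(
    points: Sequence[Vec3],
    masks: Sequence[Sequence[float]],
    rotations: Sequence[Mat3],
    translations: Sequence[Vec3],
) -> List[List[float]]:
    """Blend per-part rigid motions using per-point mask weights.

    points:       N points (x, y, z)
    masks:        N rows of K weights; each row is assumed to sum to 1
    rotations:    K rotation matrices (3x3), one per part
    translations: K translation vectors, one per part
    """
    predicted = []
    for point, weights in zip(points, masks):
        blended = [0.0, 0.0, 0.0]
        for weight, rotation, translation in zip(weights, rotations, translations):
            moved = apply_rigid(rotation, translation, point)
            for i in range(3):
                blended[i] += weight * moved[i]
        predicted.append(blended)
    return predicted


# Hypothetical two-part scene: a static background and an "arm" that is
# rotated 90 degrees about the z axis and lifted by 0.5.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

points = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
masks = [[1.0, 0.0], [0.0, 1.0]]  # first point is background, second is arm
next_cloud = predict_next_cloud(
    points, masks, [identity, rot_z_90], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.5]]
)
print(next_cloud)
```

Here the background point stays in place while the arm point moves with its part. With soft masks (weights strictly between 0 and 1), a point near a part boundary would receive a blend of the neighbouring parts' motions.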
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robot Manipulation and Learning
