Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation

Sarah Bechtle; Neha Das; Franziska Meier

arXiv:2011.03882·cs.RO·June 28, 2021

TL;DR

This paper introduces a self-supervised multi-modal approach for robots to learn visual keypoints and extend their kinematic models during object manipulation, enhancing generalization and manipulation capabilities.

Contribution

It presents a novel autoencoder-based multi-modal keypoint detector and a method to extend robot kinematics using visual keypoints, enabling better manipulation in new environments.

Findings

1. The approach accurately predicts visual keypoints on grasped objects.

2. It successfully extends the robot's kinematic chain with minimal visual data.

3. The extended kinematic model improves object placement tasks in simulation and hardware.

Abstract

Humans have impressive generalization capabilities when it comes to manipulating objects and tools in completely novel environments. These capabilities are, at least partially, a result of humans having internal models of their bodies and any grasped object. How to learn such body schemas for robots remains an open problem. In this work, we develop a self-supervised approach that can extend a robot's kinematic model when grasping an object from visual latent representations. Our framework comprises two components: (1) we present a multi-modal keypoint detector: an autoencoder architecture trained by fusing proprioception and vision to predict visual keypoints on an object; (2) we show how we can use our learned keypoint detector to learn an extension of the kinematic chain by regressing virtual joints from the predicted visual keypoints. Our evaluation shows that our approach learns…
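
To make the second component more concrete, the following is a minimal sketch of one way a kinematic-chain extension could be regressed from keypoints. It is not the authors' implementation. It assumes a simplified setting that the abstract does not specify: the grasped object is rigidly attached to the end-effector, each predicted keypoint is already available as a 3D position in the robot base frame, and the "virtual joint" reduces to a fixed translational offset in the end-effector frame. The names `fit_virtual_link`, `rot_z`, and `rot_x`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(theta):
    """Rotation matrix about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def fit_virtual_link(rotations, positions, keypoints):
    """Least-squares fit of a fixed offset t in the end-effector frame.

    Assumed model (illustrative only): keypoint_i = R_i @ t + p_i, where
    (R_i, p_i) is the end-effector pose from forward kinematics.
    """
    # Stack the linear system R_i t = k_i - p_i over all samples.
    A = np.concatenate(rotations, axis=0)                               # (3N, 3)
    b = np.concatenate([k - p for k, p in zip(keypoints, positions)])   # (3N,)
    offset, *_ = np.linalg.lstsq(A, b, rcond=None)
    return offset


# Synthetic data standing in for forward kinematics and detector output.
rng = np.random.default_rng(0)
t_true = np.array([0.0, 0.05, 0.20])  # hypothetical tool-tip offset in metres
rotations = [rot_z(a) @ rot_x(b) for a, b in rng.uniform(-1.0, 1.0, size=(20, 2))]
positions = [rng.uniform(-0.5, 0.5, size=3) for _ in rotations]
keypoints = [
    R @ t_true + p + rng.normal(0.0, 1e-3, size=3)  # small detection noise
    for R, p in zip(rotations, positions)
]

t_est = fit_virtual_link(rotations, positions, keypoints)
print("estimated offset:", t_est)
```

Under these assumptions, the extended forward kinematics for a new end-effector pose `(R, p)` would be `R @ t_est + p`, which is the quantity a placement controller could then drive toward a target. A learned, image-space keypoint detector as described in the abstract would replace the synthetic `keypoints` here.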

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Vision and Imaging · Robotics and Sensor-Based Localization