Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

Michelle A. Lee; Yuke Zhu; Krishnan Srinivasan; Parth Shah; Silvio Savarese; Li Fei-Fei; Animesh Garg; Jeannette Bohg

arXiv:1810.10191·cs.RO·March 11, 2019

TL;DR

This paper introduces a self-supervised learning approach to develop compact multimodal sensory representations that enhance the efficiency and robustness of contact-rich robotic manipulation tasks involving vision and touch.

Contribution

It presents a novel self-supervised method for learning multimodal representations that improve policy learning in contact-rich tasks, bridging the gap between visual and tactile feedback.

Findings

01 Effective in peg insertion tasks with varied geometries and clearances

02 Robust to external perturbations in both simulation and real robots

03 Improves sample efficiency of control policy learning

Abstract

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. We use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. We evaluate our method on a peg insertion task, generalizing over different geometry, configurations, and clearances, while being robust to external perturbations. Results for simulated and real robot experiments are presented.
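The abstract does not say which self-supervised objective or network is used, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes one illustrative objective (deciding whether a vision sample and a haptic sample come from the same time step) and a toy linear encoder that fuses both modalities into one compact vector. The function names `make_alignment_pairs` and `encode`, the feature sizes, and the random weights are all hypothetical and are not taken from the paper.

```python
import numpy as np


def make_alignment_pairs(vision, haptic, rng):
    """Build (vision, haptic, label) pairs without any human labels.

    label 1: both samples come from the same time step.
    label 0: the haptic sample was taken from a different time step.
    Assumption: this alignment task is illustrative, not the paper's stated objective.
    """
    n = len(vision)
    # A nonzero cyclic shift guarantees every mismatched index differs from its original.
    shift = rng.integers(1, n)
    mismatched = (np.arange(n) + shift) % n
    v = np.concatenate([vision, vision])
    h = np.concatenate([haptic, haptic[mismatched]])
    labels = np.concatenate([np.ones(n), np.zeros(n)])
    return v, h, labels


def encode(vision, haptic, w_v, w_h):
    """Project each modality separately, then concatenate into one compact vector."""
    z_v = np.tanh(vision @ w_v)
    z_h = np.tanh(haptic @ w_h)
    return np.concatenate([z_v, z_h], axis=1)


rng = np.random.default_rng(0)
vision = rng.normal(size=(8, 32))  # hypothetical flattened image features
haptic = rng.normal(size=(8, 6))   # hypothetical 6-axis haptic readings
w_v = rng.normal(size=(32, 4))     # untrained weights, for shape illustration only
w_h = rng.normal(size=(6, 4))

v, h, labels = make_alignment_pairs(vision, haptic, rng)
z = encode(v, h, w_v, w_h)         # one 8-dimensional representation per pair
```

In a full pipeline the encoder weights would be trained to predict `labels` from `z`, and a policy would then be learned on the low-dimensional `z` rather than on raw sensor data, which is where the sample-efficiency benefit described in the abstract would come from.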
