# Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

**Authors:** Michelle A. Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

arXiv: 1907.13098 · 2019-07-31

## TL;DR

This paper introduces a self-supervised learning approach to create compact multimodal sensory representations that enhance the efficiency and robustness of contact-rich manipulation policies in unstructured environments, demonstrated on peg insertion tasks.

## Contribution

The paper proposes a self-supervised method for learning a compact multimodal representation of visual and haptic inputs, which improves the sample efficiency and generalization of policies for contact-rich robot manipulation.

## Key findings

- The method generalizes across varying geometries, configurations, and clearances.
- It is robust to external perturbations.
- It is evaluated both in simulation and on a physical robot.

## Abstract

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is non-trivial to manually design a robot controller that combines these modalities which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots due to sample complexity. In this work, we use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised learning objectives and representation learning architectures. Results are presented in simulation and on a physical robot.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.13098/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1907.13098/full.md

## References

72 references — full list in the complete paper: https://tomesphere.com/paper/1907.13098/full.md

---
Source: https://tomesphere.com/paper/1907.13098