Multimodal perception for dexterous manipulation
Guanqun Cao, Shan Luo

TL;DR
This paper introduces a multimodal perception framework combining vision and touch for robotic manipulation, including cross-modal data generation and a spatio-temporal attention model for tactile recognition, enhancing perception and manipulation capabilities.
Contribution
It presents a novel cross-modal translation framework for vision and touch, and a spatio-temporal attention model for tactile texture recognition, advancing multimodal perception in robotics.
Findings
Effective cross-modal data generation for vision and touch
Improved tactile texture recognition using spatio-temporal attention
Potential applications in grasping and multimodal perception
Abstract
Humans usually perceive the world in a multimodal way that vision, touch, sound are utilised to understand surroundings from various dimensions. These senses are combined together to achieve a synergistic effect where the learning is more effectively than using each sense separately. For robotics, vision and touch are two key senses for the dexterous manipulation. Vision usually gives us apparent features like shape, color, and the touch provides local information such as friction, texture, etc. Due to the complementary properties between visual and tactile senses, it is desirable for us to combine vision and touch for a synergistic perception and manipulation. Many researches have been investigated about multimodal perception such as cross-modal learning, 3D reconstruction, multimodal translation with vision and touch. Specifically, we propose a cross-modal sensory data generation…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTactile and Sensory Interactions · Robot Manipulation and Learning · Visual Attention and Saliency Detection
