Efficient Bi-manipulation using RGBD Multi-model Fusion based on Attention Mechanism
Jian Shen, Jiaxin Huang, Zhigong Song

TL;DR
This paper introduces a novel RGB-D multi-modal data fusion framework with attention mechanisms to enhance dual-arm robot manipulation, especially under perception challenges like occlusion and poor lighting, validated through extensive experiments.
Contribution
It proposes a mixed focal attention module and a saliency attention module within a Focal CVAE framework for improved multi-modal data fusion in robotic manipulation.
Findings
Significant performance improvements in four real-world tasks.
Enhanced robustness under perception-deficient scenarios.
Lower computational cost compared to existing methods.
Abstract
Dual-arm robots have great application prospects in intelligent manufacturing due to their human-like structure when deployed with advanced intelligence algorithms. However, previous visuomotor policies suffer from perception deficiencies in environments where image features are impaired by conditions such as abnormal lighting, occlusion, and shadow. The Focal CVAE framework is proposed for RGB-D multi-modal data fusion to address this challenge. In this study, a mixed focal attention module is designed to fuse RGB images, which contain color features, with depth images, which contain 3D shape and structure information. This module highlights the prominent local features and focuses on the relevance of RGB and depth via cross-attention. A saliency attention module is proposed to improve its computational efficiency, which is applied in the encoder and the decoder…
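To make the idea of relating RGB and depth via cross-attention more concrete, here is a minimal NumPy sketch of generic scaled dot-product cross-attention, in which RGB tokens act as queries and depth tokens supply keys and values. It is not the paper's mixed focal attention module, whose details are not given in this summary. The function names, token counts, feature width, random projection matrices, and the residual connection are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: (n_q, d) tokens from one modality (here: RGB).
    context: (n_c, d) tokens from the other modality (here: depth).
    Returns the attended features (n_q, d) and the weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each RGB token's weights over depth tokens
    return weights @ v, weights


# Hypothetical sizes: 6 RGB tokens, 4 depth tokens, 8-dimensional features.
rng = np.random.default_rng(0)
n_rgb, n_depth, d_model = 6, 4, 8
rgb_tokens = rng.normal(size=(n_rgb, d_model))
depth_tokens = rng.normal(size=(n_depth, d_model))

# Random projections stand in for learned weights (an assumption).
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

attended, weights = cross_attention(rgb_tokens, depth_tokens, w_q, w_k, w_v)

# A residual connection keeps the original RGB features alongside the
# depth-informed ones; this is a common design choice, assumed here.
fused = rgb_tokens + attended
```

Each row of `weights` sums to one, so every RGB token receives a convex combination of the projected depth tokens. Swapping the roles of the two inputs would give the depth-queries-RGB direction.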
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Image Processing Techniques and Applications · Advanced Vision and Imaging
