Multi-Modal Attention-based Fusion Model for Semantic Segmentation of RGB-Depth Images
Fahimeh Fooladgar, Shohreh Kasaei

TL;DR
This paper introduces a lightweight attention-based fusion model for RGB-Depth image semantic segmentation, significantly improving accuracy and efficiency over existing methods by explicitly modeling interdependencies between modalities.
Contribution
The paper proposes a novel encoder-decoder model with an attention-based fusion block that effectively combines RGB and depth features for semantic segmentation.
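This summary does not describe the internals of the fusion block. As an illustrative sketch only, the code below assumes a squeeze-and-excitation-style channel gate applied to concatenated RGB and depth feature maps. That is a commonly used attention mechanism swapped in for illustration, not the authors' confirmed design. The function names, the weight shapes, and the final element-wise sum are all hypothetical.

```python
import math


def global_avg_pool(feature_map):
    """Squeeze step: one mean value per channel. feature_map is [C][H][W]."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vec, weights, bias):
    """Fully connected layer; weights is [out][in], bias is [out]."""
    return [
        sum(w * x for w, x in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(rgb_feat, depth_feat, w1, b1, w2, b2):
    """Hypothetical attention-based fusion of two [C][H][W] feature maps.

    Assumed design (not taken from the paper):
      1. concatenate RGB and depth channels,
      2. squeeze each channel to a scalar by global average pooling,
      3. pass the 2C-vector through a small bottleneck MLP (ReLU, then sigmoid)
         so each gate depends on the channels of both modalities,
      4. rescale every channel by its gate and sum the two modalities.
    Returns the fused [C][H][W] map and the 2C gate values.
    """
    num_channels = len(rgb_feat)
    stacked = rgb_feat + depth_feat  # list concatenation: [2C][H][W]

    squeezed = global_avg_pool(stacked)
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]  # ReLU
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]

    reweighted = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, stacked)
    ]

    # Element-wise sum of the gated RGB half and the gated depth half.
    fused = [
        [[r + d for r, d in zip(rgb_row, depth_row)]
         for rgb_row, depth_row in zip(rgb_ch, depth_ch)]
        for rgb_ch, depth_ch in zip(reweighted[:num_channels],
                                    reweighted[num_channels:])
    ]
    return fused, gates
```

Because the gates are computed from the pooled statistics of both modalities together, each channel's weight can depend on what the other modality contains. This is one simple way to model cross-modal interdependencies at low cost: the gate adds only two small dense layers on top of the encoders.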
Findings
Outperforms state-of-the-art models on NYU-V2, SUN RGB-D, and Stanford 2D-3D-Semantic datasets.
Achieves higher accuracy with lower computational cost and smaller model size.
Demonstrates the effectiveness of attention-based fusion in multi-modal semantic segmentation.
Abstract
3D scene understanding is widely considered a crucial requirement in computer vision and robotics applications. One of its high-level tasks is semantic segmentation of RGB-Depth images. With the availability of RGB-D cameras, it is desirable to improve the accuracy of scene understanding by exploiting depth features along with appearance features. Because depth images are independent of illumination, they can improve the quality of semantic labeling when used alongside RGB images. Considering both the common and the modality-specific features of these two modalities improves semantic segmentation performance. One of the main problems in RGB-Depth semantic segmentation is how to fuse these two modalities so as to exploit the advantages of each while remaining computationally efficient. Recently, the methods that encounter deep convolutional…
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
