Multi-Modal Attention-based Fusion Model for Semantic Segmentation of RGB-Depth Images

Fahimeh Fooladgar; Shohreh Kasaei

arXiv:1912.11691·cs.CV·December 30, 2019·6 cites

TL;DR

This paper introduces a lightweight attention-based fusion model for semantic segmentation of RGB-Depth images. By explicitly modeling the interdependencies between the two modalities, it reports higher accuracy than existing methods at lower computational cost.

Contribution

The paper proposes a novel encoder-decoder model with an attention-based fusion block that effectively combines RGB and depth features for semantic segmentation.
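
As a rough illustration of what an attention-based fusion block can look like, the sketch below concatenates RGB and depth feature maps and reweights their channels with a squeeze-and-excitation style gate. This is an assumption made for illustration only, not the authors' exact block: the function name `attention_fuse`, the weight matrices `w1` and `w2`, the reduction ratio `r`, and the random inputs are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_fuse(rgb_feat, depth_feat, w1, w2):
    """Fuse two (C, H, W) feature maps with a channel-attention gate.

    Hypothetical sketch: a squeeze-and-excitation style gate over the
    concatenated channels, standing in for the paper's fusion block.
    """
    # Stack both modalities along the channel axis: (2C, H, W).
    stacked = np.concatenate([rgb_feat, depth_feat], axis=0)
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = stacked.mean(axis=(1, 2))
    # Excite: bottleneck with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(w1 @ squeezed, 0.0)
    weights = sigmoid(w2 @ hidden)
    # Reweight every channel of both modalities by its learned gate.
    return stacked * weights[:, None, None], weights

# Toy inputs (random values, not real features or trained weights).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((2 * C // r, 2 * C)) * 0.1
w2 = rng.standard_normal((2 * C, 2 * C // r)) * 0.1

fused, weights = attention_fuse(rgb, depth, w1, w2)
print(fused.shape, weights.shape)
```

Because the gate is computed from both modalities jointly, each channel's weight depends on the RGB and depth statistics together, which is one simple way to model cross-modal interdependencies. In a trained network, `w1` and `w2` would be learned parameters rather than random matrices.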

Findings

01. Outperforms state-of-the-art models on the NYU-V2, SUN RGB-D, and Stanford 2D-3D-Semantic datasets.

02. Achieves higher accuracy with lower computational cost and smaller model size.

03. Demonstrates the effectiveness of attention-based fusion in multi-modal semantic segmentation.

Abstract

3D scene understanding is a crucial requirement in computer vision and robotics applications. One of its high-level tasks is semantic segmentation of RGB-Depth images. With the availability of RGB-D cameras, it is desirable to improve the accuracy of scene understanding by exploiting depth features along with appearance features. Because depth images are independent of illumination, they can improve the quality of semantic labeling when used alongside RGB images. Considering both the common and the modality-specific features of the two modalities improves semantic segmentation performance. One of the main problems in RGB-Depth semantic segmentation is how to fuse the two modalities so as to gain the advantages of each while remaining computationally efficient. Recently, the methods that encounter deep convolutional…


Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques