Optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention
Shuai Zhang, Minghong Xie

TL;DR
This paper introduces MIPANet, a novel neural network that enhances RGB-D semantic segmentation by effectively fusing multi-modal data and applying pooling attention, leading to improved performance on indoor scene datasets.
Contribution
The paper proposes MIPANet with a Multi-modal Interaction Fusion Module and Pooling Attention Modules, advancing multi-modal fusion and feature enhancement in RGB-D segmentation.
Findings
Outperforms existing methods on the NYUDv2 dataset
Achieves superior results on the SUN-RGBD dataset
Demonstrates significant improvement in segmentation accuracy
Abstract
Semantic segmentation of RGB-D images involves understanding the appearance and spatial relationships of objects within a scene, which requires careful consideration of various factors. However, in indoor environments, the simple input of RGB and depth images often results in a relatively limited acquisition of semantic and spatial information, leading to suboptimal segmentation outcomes. To address this, we propose the Multi-modal Interaction and Pooling Attention Network (MIPANet), a novel approach designed to harness the interactive synergy between RGB and depth modalities, optimizing the utilization of complementary information. Specifically, we incorporate a Multi-modal Interaction Fusion Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Additionally,…
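The abstract names two ideas, pooling-based attention and fusion of RGB and depth features, without giving their exact form in the text available here. The following is a minimal sketch of the general pattern only, not the authors' MIM or pooling attention modules. It assumes feature maps stored as nested lists of shape [channels][height][width], a gate computed from global average pooling with no learned weights, and element-wise addition as the fusion step. All function names are hypothetical.

```python
import math


def global_avg_pool(feature_map):
    """Reduce each channel's H x W grid to a single mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def sigmoid(x):
    """Squash a scalar into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pooling_attention(feature_map):
    """Scale every channel by a gate derived from its pooled mean.

    Assumption: a real module would pass the pooled vector through
    learned layers before the sigmoid; that step is omitted here.
    """
    gates = [sigmoid(mean) for mean in global_avg_pool(feature_map)]
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]


def fuse_rgb_depth(rgb_features, depth_features):
    """Apply the gate to each modality, then add them element-wise.

    Assumption: simple addition stands in for whatever interaction
    the paper's fusion module actually performs.
    """
    rgb_att = pooling_attention(rgb_features)
    depth_att = pooling_attention(depth_features)
    return [
        [[r + d for r, d in zip(r_row, d_row)] for r_row, d_row in zip(r_ch, d_ch)]
        for r_ch, d_ch in zip(rgb_att, depth_att)
    ]


# Toy inputs: 2 channels, 2 x 2 spatial grid per modality.
rgb = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
depth = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
fused = fuse_rgb_depth(rgb, depth)
```

In this toy version, a channel with a larger mean activation receives a gate closer to 1 and so contributes more to the fused map. A trained network would learn which channels to emphasize rather than relying on raw means.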
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods
