Optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Shuai Zhang; Minghong Xie

arXiv:2311.11312 · cs.CV · December 7, 2023 · 2 cites

TL;DR

This paper introduces MIPANet, a novel neural network that enhances RGB-D semantic segmentation by effectively fusing multi-modal data and applying pooling attention, leading to improved performance on indoor scene datasets.

Contribution

The paper proposes MIPANet with a Multi-modal Interaction Fusion Module and Pooling Attention Modules, advancing multi-modal fusion and feature enhancement in RGB-D segmentation.

Findings

01 Outperforms existing methods on the NYUDv2 dataset

02 Achieves superior results on the SUN-RGBD dataset

03 Demonstrates significant improvement in segmentation accuracy

Abstract

Semantic segmentation of RGB-D images involves understanding the appearance and spatial relationships of objects within a scene, which requires careful consideration of various factors. However, in indoor environments, the simple input of RGB and depth images often results in a relatively limited acquisition of semantic and spatial information, leading to suboptimal segmentation outcomes. To address this, we propose the Multi-modal Interaction and Pooling Attention Network (MIPANet), a novel approach designed to harness the interactive synergy between RGB and depth modalities, optimizing the utilization of complementary information. Specifically, we incorporate a Multi-modal Interaction Fusion Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Additionally,…
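To make the two ideas named above more concrete, the following is a minimal, generic sketch of pooling-based channel attention followed by element-wise fusion of RGB and depth feature maps. It is not MIPANet's MIM or Pooling Attention Module: the function names (`global_avg_pool`, `pooling_attention`, `fuse`), the parameter-free sigmoid gate, and the additive fusion are all assumptions chosen for illustration, and a real network would use learned layers and tensor libraries rather than nested lists.

```python
import math

# Feature maps are nested lists shaped [channels][height][width].
# Everything here is a hypothetical illustration, not the paper's modules.

def global_avg_pool(feature):
    """Average each channel's H x W map down to a single number."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature
    ]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pooling_attention(feature):
    """Scale each channel by a gate in (0, 1) derived from its pooled mean.

    Assumption: the gate has no learned weights; practical attention blocks
    usually pass the pooled vector through small learned layers first.
    """
    gates = [sigmoid(mean) for mean in global_avg_pool(feature)]
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature, gates)
    ]

def fuse(rgb, depth):
    """Element-wise sum of two attended feature maps of identical shape."""
    attended_rgb = pooling_attention(rgb)
    attended_depth = pooling_attention(depth)
    return [
        [[x + y for x, y in zip(row_r, row_d)] for row_r, row_d in zip(ch_r, ch_d)]
        for ch_r, ch_d in zip(attended_rgb, attended_depth)
    ]

# One-channel, 2 x 2 toy inputs.
rgb_features = [[[0.0, 0.0], [0.0, 0.0]]]
depth_features = [[[2.0, 2.0], [2.0, 2.0]]]
fused = fuse(rgb_features, depth_features)
```

In this toy setup each modality is gated independently before the sum, so a channel with a low pooled response contributes less to the fused map. How MIPANet lets the two modalities enhance and correct each other is described in the paper itself and is not reproduced here.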

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods