U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

Bingyu Li; Da Zhang; Zhiyuan Zhao; Junyu Gao; Xuelong Li

arXiv:2405.15365 · cs.CV · May 27, 2024 · 3 cites

TL;DR

U3M introduces an unbiased, multiscale fusion approach for multimodal semantic segmentation, enhancing robustness and adaptability across diverse datasets by effectively integrating global and local features.

Contribution

The paper proposes a novel unbiased multiscale fusion model that automatically balances multimodal data integration, improving segmentation performance and versatility.

Findings

01. Achieves superior accuracy on multiple datasets.
02. Effectively balances multimodal contributions.
03. Enhances robustness and adaptability.

Abstract

Multimodal semantic segmentation is a pivotal component of computer vision and typically surpasses unimodal methods by drawing on rich information from multiple sources. Current models frequently adopt modality-specific frameworks that are inherently biased toward certain modalities. Although these biases might be advantageous in specific situations, they generally limit the adaptability of the models across different multimodal contexts, thereby potentially impairing performance. To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. Specifically, this method involves an unbiased integration of multimodal visual data. Additionally, we employ feature fusion at multiple scales to ensure the effective extraction…
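
To make the two ideas in the abstract concrete, here is a minimal pure-Python sketch of what "unbiased" and "multiscale" fusion can mean in the simplest case. It is not the authors' architecture: the function names (`avg_pool`, `upsample`, `fuse_modalities`, `multiscale_fuse`), the use of a plain element-wise mean as the fusion operator, and the scales (1, 2, 4) are all assumptions chosen for illustration. The only property it demonstrates is that a symmetric fusion operator treats every modality identically, so the result does not depend on which modality is listed first.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one single-channel feature map, H x W


def avg_pool(x: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling; H and W must be divisible by k."""
    h, w = len(x), len(x[0])
    if h % k or w % k:
        raise ValueError("feature map size must be divisible by the pooling size")
    return [
        [
            sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def upsample(x: Grid, k: int) -> Grid:
    """Nearest-neighbour upsampling by an integer factor k."""
    return [[v for v in row for _ in range(k)] for row in x for _ in range(k)]


def fuse_modalities(maps: Sequence[Grid]) -> Grid:
    """Element-wise mean over modalities (assumed fusion operator).

    The mean is symmetric in its inputs, so no modality is privileged.
    """
    n = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n for j in range(w)] for i in range(h)]


def multiscale_fuse(maps: Sequence[Grid], scales: Sequence[int] = (1, 2, 4)) -> Grid:
    """Fuse modalities at each scale, upsample back, and average across scales."""
    h, w = len(maps[0]), len(maps[0][0])
    out = [[0.0] * w for _ in range(h)]
    for k in scales:
        pooled = [avg_pool(m, k) for m in maps]      # coarser view of each modality
        fused = upsample(fuse_modalities(pooled), k)  # fuse, then restore resolution
        for i in range(h):
            for j in range(w):
                out[i][j] += fused[i][j] / len(scales)
    return out


if __name__ == "__main__":
    # Two hypothetical 4 x 4 single-channel inputs standing in for RGB and depth features.
    rgb = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
    depth = [[float((i + j) % 3) for j in range(4)] for i in range(4)]
    # Swapping the modality order leaves the fused map unchanged.
    print(multiscale_fuse([rgb, depth]) == multiscale_fuse([depth, rgb]))
```

A learned model would replace the fixed mean and pooling with trainable blocks; the sketch only isolates the symmetry property that distinguishes an unbiased design from one built around a primary modality.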

Code & Models

Repositories

libingyu01/u3m-multimodal-semantic-segmentation
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training