StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation
Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

TL;DR
StitchFusion introduces a flexible multimodal fusion framework that leverages pre-trained models and a novel MultiAdapter module to enhance semantic segmentation across various visual modalities with minimal additional parameters.
Contribution
The paper presents StitchFusion, a simple yet effective framework that enables multi-modal and multi-scale feature fusion during encoding using shared pre-trained models and a new MultiAdapter for cross-modal information transfer.
Findings
Achieves state-of-the-art results on four multi-modal segmentation datasets.
Demonstrates the effectiveness of MultiAdapter in enhancing cross-modal feature exchange.
Shows that combining MultiAdapter with existing Feature Fusion Modules is complementary.
Abstract
Multimodal semantic segmentation shows significant potential for enhancing segmentation accuracy in complex scenes. However, current methods often incorporate specialized feature fusion modules tailored to specific modalities, thereby restricting input flexibility and increasing the number of training parameters. To address these challenges, we propose StitchFusion, a straightforward yet effective modal fusion framework that integrates large-scale pre-trained models directly as encoders and feature fusers. This approach facilitates comprehensive multi-modal and multi-scale feature fusion, accommodating any visual modal inputs. Specifically, our framework achieves modal integration during encoding by sharing multi-modal visual information. To enhance information exchange across modalities, we introduce a multi-directional adapter module (MultiAdapter) to enable cross-modal information…
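The abstract describes adapters that pass information between modality streams at low parameter cost. The following is a minimal NumPy sketch of that general idea, not the authors' MultiAdapter: it assumes a standard bottleneck adapter (down-projection, ReLU, up-projection) applied in every direction between modalities, with the result added residually to the receiving stream. The function names, the modality names (`rgb`, `depth`, `lidar`), and all dimensions are hypothetical placeholders chosen for illustration.

```python
import numpy as np

def make_adapter(rng, dim, bottleneck):
    """Hypothetical bottleneck adapter: dim -> bottleneck -> dim."""
    return {
        "down": rng.standard_normal((dim, bottleneck)) * 0.02,
        # Zero-initialised up-projection: the adapter starts as a no-op,
        # so the pre-trained encoder features are initially unchanged.
        "up": np.zeros((bottleneck, dim)),
    }

def apply_adapter(adapter, x):
    """Project tokens down, apply ReLU, project back up."""
    hidden = np.maximum(x @ adapter["down"], 0.0)
    return hidden @ adapter["up"]

def exchange(features, adapters):
    """Add adapted features from every other modality to each stream.

    features: dict mapping modality name -> (tokens, dim) array
    adapters: dict mapping (source, destination) -> adapter
    """
    fused = {}
    for dst, x in features.items():
        update = sum(
            apply_adapter(adapters[(src, dst)], features[src])
            for src in features
            if src != dst
        )
        fused[dst] = x + update  # residual connection
    return fused

# Assumed toy setup: three modalities, 16 tokens of width 64, bottleneck 8.
rng = np.random.default_rng(0)
modalities = ["rgb", "depth", "lidar"]
tokens, dim, bottleneck = 16, 64, 8

features = {m: rng.standard_normal((tokens, dim)) for m in modalities}
adapters = {
    (src, dst): make_adapter(rng, dim, bottleneck)
    for src in modalities
    for dst in modalities
    if src != dst
}

fused = exchange(features, adapters)

# Extra parameters introduced by the adapters alone (biases omitted).
adapter_params = sum(a["down"].size + a["up"].size for a in adapters.values())
print(adapter_params)
```

In this sketch each directed pair of modalities gets its own adapter, so the added parameter count grows with the number of pairs but stays small as long as the bottleneck is narrow relative to the feature width. Whether StitchFusion shares adapters across directions or layers is not stated in the text above.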
Taxonomy
Topics: Multimodal Machine Learning Applications · Hand Gesture Recognition Systems · Tactile and Sensory Interactions
Methods: Adapter
