EGFormer: Towards Efficient and Generalizable Multimodal Semantic Segmentation

Zelin Zhang; Tao Zhang; Kedi Li; Xu Zheng

arXiv:2505.14014·cs.CV·May 21, 2025

TL;DR

EGFormer is an efficient multimodal semantic segmentation framework that dynamically prioritizes and filters modalities, reducing computational costs significantly while maintaining high performance and demonstrating strong generalization in transfer tasks.

Contribution

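The paper introduces EGFormer, a framework built on two modules: the Any-modal Scoring Module (ASM), which dynamically scores the importance of each modality, and the Modal Dropping Module (MDM), which filters out less informative modalities. Together they enable efficient and generalizable multimodal segmentation.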

Findings

01 Achieves up to 88% reduction in parameters

02 Reduces GFLOPs by 50% while maintaining accuracy

03 Outperforms existing methods in transfer learning tasks

Abstract

Recent efforts have explored multimodal semantic segmentation using various backbone architectures. However, while most methods aim to improve accuracy, their computational efficiency remains underexplored. To address this, we propose EGFormer, an efficient multimodal semantic segmentation framework that flexibly integrates an arbitrary number of modalities while significantly reducing model parameters and inference time without sacrificing performance. Our framework introduces two novel modules. First, the Any-modal Scoring Module (ASM) assigns importance scores to each modality independently, enabling dynamic ranking based on their feature maps. Second, the Modal Dropping Module (MDM) filters out less informative modalities at each stage, selectively preserving and aggregating only the most valuable features. This design allows the model to leverage useful information from all…
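The score-then-drop idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The scoring rule (mean absolute activation), the softmax-weighted fusion, the fixed `keep` count, and the function names `score_modality` and `drop_and_aggregate` are all assumptions made for illustration. In the paper, scoring is done by the ASM and dropping happens at each stage of the network.

```python
import math


def score_modality(feature_map):
    """Toy importance score: mean absolute activation.

    Assumption for illustration only; the paper's ASM is not reproduced here.
    """
    values = [abs(v) for row in feature_map for v in row]
    return sum(values) / len(values)


def drop_and_aggregate(features, keep):
    """Keep the `keep` highest-scoring modalities and fuse them.

    features: dict mapping a modality name to an H x W list of floats
              (all maps must share the same shape).
    Returns (kept_names, fused_map).
    """
    # Score each modality independently, then rank from highest to lowest.
    scores = {name: score_modality(fm) for name, fm in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = ranked[:keep]  # lower-ranked modalities are dropped

    # Softmax over the kept scores gives fusion weights that sum to 1.
    top = max(scores[n] for n in kept)
    exps = {n: math.exp(scores[n] - top) for n in kept}
    total = sum(exps.values())
    weights = {n: exps[n] / total for n in kept}

    # Weighted sum of the surviving feature maps, element by element.
    height = len(features[kept[0]])
    width = len(features[kept[0]][0])
    fused = [
        [sum(weights[n] * features[n][i][j] for n in kept) for j in range(width)]
        for i in range(height)
    ]
    return kept, fused


# Hypothetical 2x2 feature maps for four modalities.
features = {
    "rgb": [[1.0, 2.0], [3.0, 4.0]],
    "depth": [[0.5, 0.5], [0.5, 0.5]],
    "event": [[0.0, 0.1], [0.0, 0.1]],
    "lidar": [[2.0, 2.0], [2.0, 2.0]],
}
kept, fused = drop_and_aggregate(features, keep=2)
print(kept)
```

With these made-up inputs, only the two highest-scoring modalities contribute to the fused map. The cost of fusion then scales with `keep` rather than with the total number of modalities, which is the intuition behind the efficiency claim.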

Taxonomy

Topics: Natural Language Processing Techniques