Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation
Zhengwen Shen, Yulian Li, Han Zhang, Yuchen Weng, Jun Wang

TL;DR
This paper introduces EFNet, a lightweight multimodal fusion network that employs early fusion and feature clustering to improve RGB-T image segmentation efficiency and accuracy in low-light conditions.
Contribution
The paper proposes a novel early fusion strategy with simple feature clustering and a multi-scale decoder, reducing parameters and computation while enhancing segmentation performance.
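To see why fusing the modalities at the input can cut parameters, the following back-of-the-envelope sketch compares a two-branch encoder with a single early-fusion encoder. It is not EFNet's architecture: the layer widths, the 3x3 kernels, the plain convolution stack, and the single-channel thermal input are all assumptions chosen only for illustration, and the helper names `conv_params` and `encoder_params` are hypothetical.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def encoder_params(c_in, widths, k=3):
    """Total parameters of a plain stack of convolutions (illustrative only)."""
    total = 0
    for c_out in widths:
        total += conv_params(c_in, c_out, k)
        c_in = c_out
    return total


# Hypothetical layer widths; the paper's real configuration is not given here.
WIDTHS = [32, 64, 128]

# Two-branch design: one encoder for RGB (3 channels), one for thermal (assumed 1 channel).
two_branch = encoder_params(3, WIDTHS) + encoder_params(1, WIDTHS)

# Early fusion: channels are concatenated first, so a single encoder sees 4 channels.
early_fusion = encoder_params(4, WIDTHS)

print(two_branch, early_fusion)
```

Under these assumptions only the first layer grows when the inputs are concatenated, while every later layer exists once instead of twice, so the early-fusion count comes out at roughly half of the two-branch count.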
Findings
Outperforms state-of-the-art methods on multiple datasets.
Uses fewer parameters and less computation.
Achieves improved segmentation accuracy in low-light conditions.
Abstract
RGB and thermal image fusion has great potential to improve semantic segmentation in low-illumination conditions. Existing methods typically employ a two-branch encoder framework for multimodal feature extraction and design complicated fusion strategies for multimodal semantic segmentation. However, these methods require massive parameter updates and computational effort during feature extraction and fusion. To address this issue, we propose a novel multimodal fusion network (EFNet) based on an early fusion strategy and a simple but effective feature clustering for training efficient RGB-T semantic segmentation. We also propose a lightweight and efficient multi-scale feature aggregation decoder based on Euclidean distance. We validate the effectiveness of our method on different datasets and outperform previous…
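The abstract names two ingredients, early fusion and Euclidean-distance-based grouping of features, without giving their details. A minimal sketch of the general idea, assuming per-pixel channel concatenation for the fusion step and nearest-center assignment for the grouping step, might look like this. The function names, the toy pixel values, and the two cluster centers are hypothetical and are not taken from the paper.

```python
import math


def fuse_early(rgb_pixel, thermal_value):
    """Early fusion at the input: append the thermal value to the RGB channels."""
    return list(rgb_pixel) + [thermal_value]


def nearest_center(feature, centers):
    """Index of the center with the smallest Euclidean distance to the feature."""
    dists = [math.dist(feature, c) for c in centers]
    return dists.index(min(dists))


# Toy data (assumed): two dark-but-warm pixels and one bright-but-cool pixel.
features = [
    fuse_early((0.1, 0.1, 0.1), 0.9),
    fuse_early((0.8, 0.8, 0.7), 0.1),
    fuse_early((0.2, 0.1, 0.1), 0.8),
]

# Hypothetical cluster centers in the fused 4-channel space.
centers = [[0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 0.0]]

assignments = [nearest_center(f, centers) for f in features]
print(assignments)
```

Here the dark pixels are still separated from the bright one because the thermal channel is part of the distance, which is the intuition behind using both modalities in low light. How EFNet actually forms and uses its clusters is not specified in this summary.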
Taxonomy
Topics: Image Retrieval and Classification Techniques
