Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion
Yikai Wang, Fuchun Sun, Ming Lu, Anbang Yao

TL;DR
This paper introduces a multimodal feature fusion framework that shares one network across modalities, keeps batch normalization modality-specific, and fuses features bidirectionally at multiple layers. It reports better results than prior fusion methods on semantic segmentation and image translation.
Contribution
It presents a compact, general multimodal fusion method whose asymmetric, parameter-free operations (channel shuffle and pixel shift) enable progressive feature exploitation within a single network.
Findings
The method outperforms state-of-the-art fusion methods on multiple datasets.
It is effective on both semantic segmentation and image translation tasks.
It uses a shared encoder with modality-specific batch normalization layers.
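As a rough illustration of the last finding, the sketch below shows a single layer whose weights are shared by all modalities while each modality keeps its own batch-normalization scale and shift. It is not the authors' implementation: the class name `SharedLayerWithPrivateBN`, the 1x1-convolution stand-in, the modality names, and the tensor sizes are all assumptions made for this example, and it assumes both inputs have already been brought to the same channel count.

```python
import numpy as np

class SharedLayerWithPrivateBN:
    """Hypothetical layer: shared weights, one batch-norm (gamma, beta) per modality."""

    def __init__(self, in_ch, out_ch, modalities, seed=0):
        rng = np.random.default_rng(seed)
        # Shared across every modality (acts like a 1x1 convolution).
        self.weight = rng.standard_normal((out_ch, in_ch)) * 0.1
        # Private to each modality: batch-norm scale and shift.
        self.gamma = {m: np.ones(out_ch) for m in modalities}
        self.beta = {m: np.zeros(out_ch) for m in modalities}
        self.eps = 1e-5

    def __call__(self, x, modality):
        # x has shape (N, C_in, H, W).
        y = np.einsum("oc,nchw->nohw", self.weight, x)
        # Training-mode batch statistics, computed per output channel.
        mean = y.mean(axis=(0, 2, 3), keepdims=True)
        var = y.var(axis=(0, 2, 3), keepdims=True)
        y_hat = (y - mean) / np.sqrt(var + self.eps)
        g = self.gamma[modality].reshape(1, -1, 1, 1)
        b = self.beta[modality].reshape(1, -1, 1, 1)
        return g * y_hat + b

# Usage: the same layer object processes both modalities.
rng = np.random.default_rng(1)
rgb = rng.standard_normal((2, 3, 4, 4))
depth = rng.standard_normal((2, 3, 4, 4))
layer = SharedLayerWithPrivateBN(3, 8, ["rgb", "depth"])
out_rgb = layer(rgb, "rgb")
out_depth = layer(depth, "depth")
```

In this sketch the only per-modality state is the pair `gamma[m]`, `beta[m]`, so adding a modality adds two small vectors per layer rather than a second encoder.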
Abstract
We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network. The framework consists of two innovative fusion schemes. Firstly, unlike existing multimodal methods that necessitate individual encoders for different modalities, we verify that multimodal features can be learnt within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder, which also enables implicit fusion via joint feature representation learning. Secondly, we propose a bidirectional multi-layer fusion scheme, where multimodal features can be exploited progressively. To take advantage of such a scheme, we introduce two asymmetric fusion operations, channel shuffle and pixel shift, which learn different fused features with respect to different fusion directions. These two operations are parameter-free and strengthen…
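The abstract names channel shuffle and pixel shift but, being truncated here, does not define them. The sketch below is one plausible reading only, not the paper's definition: `channel_shuffle` swaps a fraction of channels between two feature maps, and `pixel_shift` translates a feature map with zero fill. The function names, the `ratio` parameter, and the choice of shift directions are all assumptions made for illustration. Both operations have no learnable weights, which matches the "parameter-free" description.

```python
import numpy as np

def channel_shuffle(x_a, x_b, ratio=0.5):
    """Swap the first `ratio` fraction of channels between two feature maps (assumed form)."""
    k = int(x_a.shape[1] * ratio)
    out_a, out_b = x_a.copy(), x_b.copy()
    out_a[:, :k] = x_b[:, :k]
    out_b[:, :k] = x_a[:, :k]
    return out_a, out_b

def pixel_shift(x, dy, dx):
    """Shift an (N, C, H, W) feature map by (dy, dx) pixels, zero-filling vacated cells."""
    out = np.roll(x, shift=(dy, dx), axis=(2, 3))
    # np.roll wraps around, so clear the rows and columns that wrapped.
    if dy > 0:
        out[:, :, :dy] = 0
    elif dy < 0:
        out[:, :, dy:] = 0
    if dx > 0:
        out[:, :, :, :dx] = 0
    elif dx < 0:
        out[:, :, :, dx:] = 0
    return out

# Usage: exchange half the channels, then shift the exchanged channels in a
# different direction for each fusion direction, so a->b differs from b->a.
a = np.arange(2 * 4 * 3 * 3, dtype=float).reshape(2, 4, 3, 3)
b = -a
k = 2  # number of exchanged channels when ratio=0.5 and C=4
a_mix, b_mix = channel_shuffle(a, b, ratio=0.5)
a_out, b_out = a_mix.copy(), b_mix.copy()
a_out[:, :k] = pixel_shift(a_mix[:, :k], 0, 1)  # shift right
b_out[:, :k] = pixel_shift(b_mix[:, :k], 1, 0)  # shift down
```

Using a different shift per direction is what makes this sketch asymmetric: the features that modality a receives from b are not a mirror image of what b receives from a.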
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Batch Normalization · Channel Shuffle
