Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion

Yikai Wang; Fuchun Sun; Ming Lu; Anbang Yao

arXiv:2108.05009·cs.CV·August 12, 2021

TL;DR

This paper introduces a multimodal feature fusion framework that shares a single network across modalities, keeps only modality-specific batch normalization layers, and fuses features bidirectionally at multiple layers. The authors report results that surpass prior fusion methods on semantic segmentation and image translation.

Contribution

It presents a compact, general multimodal fusion method with asymmetric, parameter-free operations that enable progressive feature exploitation within a single network.

Findings

01 Outperforms state-of-the-art fusion methods on multiple datasets.

02 Effective in both semantic segmentation and image translation tasks.

03 Uses a shared encoder with modality-specific batch normalization.
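
The shared-encoder idea in finding 03 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `SharedLayerWithPrivateBN`, the use of a 1x1 convolution, and the training-mode batch statistics are all assumptions made for illustration. The only point it shows is that the convolution weights are shared while each modality keeps its own normalization parameters.

```python
import numpy as np


class SharedLayerWithPrivateBN:
    """Hypothetical layer: one shared 1x1 convolution, one batch norm per modality."""

    def __init__(self, in_ch, out_ch, num_modalities=2, eps=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        # Shared by every modality.
        self.weight = rng.standard_normal((out_ch, in_ch)) / np.sqrt(in_ch)
        # Private to each modality: batch-norm scale and shift.
        self.gamma = [np.ones(out_ch) for _ in range(num_modalities)]
        self.beta = [np.zeros(out_ch) for _ in range(num_modalities)]
        self.eps = eps

    def forward(self, x, modality):
        # x has shape (N, C_in, H, W); a 1x1 convolution is a product over channels.
        y = np.einsum("oc,nchw->nohw", self.weight, x)
        # Training-mode statistics, computed per channel over the current batch.
        mean = y.mean(axis=(0, 2, 3), keepdims=True)
        var = y.var(axis=(0, 2, 3), keepdims=True)
        y_hat = (y - mean) / np.sqrt(var + self.eps)
        g = self.gamma[modality].reshape(1, -1, 1, 1)
        b = self.beta[modality].reshape(1, -1, 1, 1)
        return g * y_hat + b


# Two inputs with very different statistics pass through the same weights.
layer = SharedLayerWithPrivateBN(in_ch=3, out_ch=8)
rgb = np.random.default_rng(1).standard_normal((4, 3, 16, 16))
depth = np.random.default_rng(2).standard_normal((4, 3, 16, 16)) * 5 + 2
out_rgb = layer.forward(rgb, modality=0)
out_depth = layer.forward(depth, modality=1)
```

In this sketch, changing `gamma[1]` or `beta[1]` affects only the second modality's output, which is the sense in which the normalization is modality-specific.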

Abstract

We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network. The framework consists of two innovative fusion schemes. First, unlike existing multimodal methods that require individual encoders for different modalities, we verify that multimodal features can be learnt within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder, which also enables implicit fusion via joint feature representation learning. Second, we propose a bidirectional multi-layer fusion scheme, where multimodal features can be exploited progressively. To take advantage of such a scheme, we introduce two asymmetric fusion operations, channel shuffle and pixel shift, which learn different fused features with respect to different fusion directions. These two operations are parameter-free and strengthen…
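
The abstract names channel shuffle and pixel shift but does not spell out how they are defined, so the following NumPy sketch shows only one plausible reading. Which channels are swapped, the four shift directions, and the zero filling at the border are assumptions, and the function names are hypothetical. What the sketch does reflect from the abstract is that both operations have no learnable parameters.

```python
import numpy as np


def channel_shuffle(a, b):
    """Swap the odd-indexed channels of two (N, C, H, W) feature maps.

    Assumption: an every-other-channel swap; the actual pattern may differ.
    """
    a_out, b_out = a.copy(), b.copy()
    a_out[:, 1::2] = b[:, 1::2]
    b_out[:, 1::2] = a[:, 1::2]
    return a_out, b_out


def pixel_shift(x):
    """Shift four channel groups by one pixel each, filling vacated pixels with 0.

    Assumption: groups move left, right, up, and down; leftover channels
    (when C is not divisible by 4) are copied unchanged.
    """
    out = np.zeros_like(x)
    g = x.shape[1] // 4
    out[:, 0 * g:1 * g, :, :-1] = x[:, 0 * g:1 * g, :, 1:]   # content moves left
    out[:, 1 * g:2 * g, :, 1:] = x[:, 1 * g:2 * g, :, :-1]   # content moves right
    out[:, 2 * g:3 * g, :-1, :] = x[:, 2 * g:3 * g, 1:, :]   # content moves up
    out[:, 3 * g:4 * g, 1:, :] = x[:, 3 * g:4 * g, :-1, :]   # content moves down
    out[:, 4 * g:] = x[:, 4 * g:]                            # leftover channels
    return out
```

Both functions preserve the input shape, so under these assumptions they could be placed between layers of a shared network without changing any layer dimensions.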

Code & Models

Repositories

yikaiw/AsymFusion (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Batch Normalization · Channel Shuffle