Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion

Sijie Mai; Haifeng Hu; Songlong Xing

arXiv:1911.07848 · cs.CV · December 11, 2020 · 19 cites

TL;DR

This paper introduces a novel adversarial and graph-based framework for learning a modality-invariant joint embedding space to improve multimodal fusion, achieving state-of-the-art results across multiple datasets.

Contribution

It proposes an adversarial encoder-decoder-classifier framework combined with hierarchical graph neural networks for effective multimodal fusion.
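The multi-stage fusion named in the abstract (unimodal, then bimodal, then trimodal interactions) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' network: the modality names, the `hierarchical_fusion` helper, and the use of a plain mean in place of a learned graph aggregation are all hypothetical.

```python
from itertools import combinations


def mean_vec(vectors):
    """Element-wise mean of equal-length vectors (placeholder for a learned aggregation)."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]


def hierarchical_fusion(unimodal):
    """Build bimodal and trimodal vertices from three unimodal embeddings.

    Assumption: each bimodal vertex aggregates one pair of unimodal vertices,
    and the trimodal vertex aggregates all bimodal vertices.
    """
    names = sorted(unimodal)
    bimodal = {
        a + "+" + b: mean_vec([unimodal[a], unimodal[b]])
        for a, b in combinations(names, 2)
    }
    trimodal = {"+".join(names): mean_vec(list(bimodal.values()))}
    return {"unimodal": dict(unimodal), "bimodal": bimodal, "trimodal": trimodal}


# Hypothetical 2-dimensional embeddings, one per modality.
embeddings = {
    "acoustic": [1.0, 0.0],
    "language": [0.0, 1.0],
    "visual": [1.0, 1.0],
}
graph = hierarchical_fusion(embeddings)
for stage, vertices in graph.items():
    print(stage, vertices)
```

With three modalities, the sketch yields three bimodal vertices and one trimodal vertex. A real graph fusion network would replace the mean with learned, weighted message passing between vertices.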

Findings

01 Achieves state-of-the-art performance on multiple datasets.

02 Learned embeddings are highly discriminative.

03 Effectively reduces the modality gap through adversarial training.

Abstract

Learning a joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap that heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of the modalities differ in nature, we reduce the modality gap by translating the distributions of the source modalities into that of the target modality via their respective encoders, using adversarial training. Furthermore, we impose additional constraints on the embedding space by introducing a reconstruction loss and a classification loss. We then fuse the encoded representations using a hierarchical graph neural network that explicitly explores unimodal, bimodal and trimodal interactions in multiple stages. Our method achieves…
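The abstract names three training signals for a source-modality encoder: an adversarial term, a reconstruction loss, and a classification loss. The following is a minimal sketch of how such terms might be combined, assuming a binary cross-entropy adversarial term, a mean-squared-error reconstruction term, and a simple weighted sum. The function names and the weights `w_rec` and `w_cls` are hypothetical, and the paper's exact formulation may differ.

```python
import math

EPS = 1e-12  # guards against log(0)


def bce(prob, label):
    """Binary cross-entropy for one predicted probability."""
    return -(label * math.log(prob + EPS) + (1 - label) * math.log(1 - prob + EPS))


def mse(x, y):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[target_index] + EPS)


def encoder_objective(d_prob_source, recon, original, class_probs, label,
                      w_rec=1.0, w_cls=1.0):
    """Assumed combined loss for a source-modality encoder.

    d_prob_source: discriminator's probability that the source embedding
                   comes from the target modality (the encoder wants 1.0).
    w_rec, w_cls:  hypothetical weights, not taken from the paper.
    """
    adversarial = bce(d_prob_source, 1.0)
    reconstruction = mse(recon, original)
    classification = cross_entropy(class_probs, label)
    return adversarial + w_rec * reconstruction + w_cls * classification


# Hypothetical values: an undecided discriminator, a perfect reconstruction,
# and a classifier that gives the true class probability 0.75.
loss = encoder_objective(0.5, [1.0, 2.0], [1.0, 2.0], [0.25, 0.75], 1)
fooled = encoder_objective(0.9, [1.0, 2.0], [1.0, 2.0], [0.25, 0.75], 1)
print(loss, fooled)
```

In this sketch the loss falls as the discriminator is fooled more often (`fooled` is smaller than `loss`), which is the sense in which adversarial training pushes source embeddings toward the target distribution.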

Code & Models

Repositories

TmacMai/ARGF_multimodal_fusion
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Graph Neural Network