Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion
Sijie Mai, Haifeng Hu, Songlong Xing

TL;DR
This paper introduces a novel adversarial and graph-based framework for learning a modality-invariant joint embedding space to improve multimodal fusion, achieving state-of-the-art results across multiple datasets.
Contribution
It proposes an adversarial encoder-decoder-classifier framework combined with hierarchical graph neural networks for effective multimodal fusion.
Findings
The method achieves state-of-the-art performance on multiple datasets.
The learned embeddings are highly discriminative.
Adversarial training effectively reduces the modality gap.
Abstract
Learning a joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap that heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of the modalities differ in nature, we reduce the modality gap by translating the distributions of the source modalities into that of the target modality via their respective encoders, using adversarial training. Furthermore, we impose additional constraints on the embedding space by introducing a reconstruction loss and a classification loss. We then fuse the encoded representations using a hierarchical graph neural network that explicitly explores unimodal, bimodal and trimodal interactions in multiple stages. Our method achieves…
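The loss terms named in the abstract (adversarial, reconstruction, classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the modality names, the linear encoders and decoders, the single-vector logistic discriminator, the choice of "language" as the target modality, and the decision to apply the classifier to each modality's embedding separately are all assumptions made for illustration. The sketch only evaluates the losses once on random inputs; it does not train anything and does not cover the graph fusion stage.

```python
import math
import random

def linear(W, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def bce(p, label):
    """Binary cross-entropy for one probability p and a 0/1 label."""
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def softmax_ce(logits, target):
    """Cross-entropy of softmax(logits) against an integer class index."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def losses(inputs, target_modality, enc, dec, disc, clf, label):
    # Encode every modality into the shared embedding space.
    z = {m: linear(enc[m], x) for m, x in inputs.items()}
    # Discriminator output: probability that an embedding is from the target modality.
    d = {m: sigmoid(sum(w * v for w, v in zip(disc, z[m]))) for m in z}
    sources = [m for m in z if m != target_modality]
    # Discriminator objective: label target embeddings 1 and source embeddings 0.
    d_loss = bce(d[target_modality], 1) + sum(bce(d[m], 0) for m in sources)
    # Source-encoder objective: make source embeddings pass as target embeddings.
    g_loss = sum(bce(d[m], 1) for m in sources)
    # Reconstruction: each decoder maps the embedding back to its own input.
    rec_loss = sum(mse(linear(dec[m], z[m]), inputs[m]) for m in z)
    # Classification (assumption): one shared classifier on each embedding.
    cls_loss = sum(softmax_ce(linear(clf, z[m]), label) for m in z)
    return {"discriminator": d_loss, "generator": g_loss,
            "reconstruction": rec_loss, "classification": cls_loss}

# Hypothetical toy setup: three modalities with different input sizes.
rng = random.Random(0)
dims = {"language": 4, "acoustic": 3, "visual": 5}
emb = 2
enc = {m: rand_matrix(emb, n, rng) for m, n in dims.items()}
dec = {m: rand_matrix(n, emb, rng) for m, n in dims.items()}
disc = [rng.uniform(-0.5, 0.5) for _ in range(emb)]
clf = rand_matrix(3, emb, rng)
inputs = {m: [rng.uniform(-1, 1) for _ in range(n)] for m, n in dims.items()}

out = losses(inputs, "language", enc, dec, disc, clf, label=1)
for name, value in out.items():
    print(f"{name}: {value:.4f}")
```

In an actual adversarial setup the discriminator and the source encoders would be updated in alternation on their opposing objectives, while the reconstruction and classification terms act as the additional constraints on the embedding space that the abstract mentions. How the terms are weighted is not stated in the text above and is left out here.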
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Graph Neural Network
