Learning Unseen Modality Interaction

Yunhua Zhang; Hazel Doughty; Cees G.M. Snoek

arXiv:2306.12795 · cs.CV · October 26, 2023

TL;DR

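This paper introduces a multimodal learning method that generalizes at inference to modality combinations never seen together during training. It projects each modality's features into a common space, sums them across the available modalities, and uses pseudo-supervision to reduce overfitting to less discriminative modality combinations.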

Contribution

It poses the problem of unseen modality interaction and proposes a first solution, dropping the assumption of existing multimodal methods that every modality combination of interest is available during training.

Findings

01. Effective across diverse tasks and modalities

02. Improves generalization to unseen modality combinations

03. Reduces overfitting through pseudo-supervision

Abstract

Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. In this paper, we challenge this modality-complete assumption for multimodal learning and instead strive for generalization to unseen modality combinations during inference. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved. This allows the information to be accumulated with a simple summation operation across available modalities. To reduce overfitting to less discriminative modality combinations during training, we further improve the model learning with pseudo-supervision indicating the reliability of a modality's prediction. We demonstrate that our approach is effective for…
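
The fusion idea in the abstract (project every modality into one common space, then sum over whichever modalities are present) can be illustrated with a minimal sketch. This is not the authors' implementation: the plain linear projection stands in for the paper's projection module, and the names `project`, `fuse`, and `PROJECTIONS`, along with all weights and feature values, are hypothetical choices made for illustration only.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # shape: (common_dim, modality_dim)


def project(features: Vector, weights: Matrix) -> Vector:
    """Linearly map one modality's feature vector into the common space.

    Assumption: a single linear map is used here as a stand-in for the
    paper's projection module, whose details are not given in this page.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def fuse(inputs: Dict[str, Vector],
         projections: Dict[str, Matrix],
         common_dim: int) -> Vector:
    """Sum the projected features of whichever modalities are present."""
    fused = [0.0] * common_dim
    for name, features in inputs.items():
        projected = project(features, projections[name])
        fused = [a + b for a, b in zip(fused, projected)]
    return fused


# Hypothetical, hand-picked weights; in practice these would be learned.
PROJECTIONS: Dict[str, Matrix] = {
    "video": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # 3-d input -> 2-d common space
    "audio": [[0.5, 0.5], [0.0, 1.0]],            # 2-d input -> 2-d common space
}

# The same function handles one modality or a combination of modalities.
video_only = fuse({"video": [1.0, 2.0, 3.0]}, PROJECTIONS, 2)
both = fuse({"video": [1.0, 2.0, 3.0], "audio": [2.0, 4.0]}, PROJECTIONS, 2)
```

Because the fused vector always has the same size, a downstream classifier does not need to know which modalities contributed, which is what makes a combination never seen during training usable at inference in this sketch.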

Code & Models

Repositories

gerasmark/Reproducing-Unseen-Modality-Interaction

Videos

Learning Unseen Modality Interaction · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning