Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models

Donggeun Kim; Taesup Kim

arXiv:2407.12616·cs.CV·July 18, 2024

TL;DR

This paper introduces a novel framework that enables the prediction of missing modalities in multimodal learning by jointly embedding unimodal models, improving robustness when some data modalities are absent.

Contribution

The paper combines parameter-efficient fine-tuning of unimodal pretrained models with self-supervised joint-embedding learning, so that the embedding of a missing modality can be predicted in the representation space at inference time.

Findings

01. Effective missing modality prediction demonstrated on benchmark datasets

02. Improved robustness in downstream tasks with missing modalities

03. Outperforms existing methods in handling incomplete multimodal data

Abstract

Multimodal learning typically relies on the assumption that all modalities are fully available during both the training and inference phases. However, in real-world scenarios, consistently acquiring complete multimodal data presents significant challenges due to various factors. This often leads to the issue of missing modalities, where data for certain modalities are absent, posing considerable obstacles not only for the availability of multimodal pretrained models but also for their fine-tuning and the preservation of robustness in downstream tasks. To address these challenges, we propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method. This framework enables the model to predict the embedding of a missing modality in the representation space during inference. Our method effectively…
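
The idea of predicting a missing modality's embedding in representation space can be illustrated with a small sketch. Everything here is an assumption for illustration and not the authors' method: the toy embeddings stand in for the outputs of unimodal pretrained encoders, a ridge-regression linear map (`fit_predictor`) stands in for the paper's self-supervised joint-embedding objective, the sketch fits that map on paired toy data purely for simplicity, and `fuse` is a hypothetical helper that substitutes the predicted embedding when the text modality is absent.

```python
import numpy as np

def fit_predictor(src_emb, tgt_emb, ridge=1e-3):
    """Fit a linear map W so that src_emb @ W approximates tgt_emb.

    Ridge regression is an illustrative stand-in, not the paper's objective.
    """
    d = src_emb.shape[1]
    gram = src_emb.T @ src_emb + ridge * np.eye(d)
    return np.linalg.solve(gram, src_emb.T @ tgt_emb)

def fuse(img_emb, txt_emb, w_img2txt):
    """Concatenate both embeddings; predict the text one if it is missing."""
    if txt_emb is None:
        # Missing modality: fill in its embedding from the available one.
        txt_emb = img_emb @ w_img2txt
    return np.concatenate([img_emb, txt_emb], axis=-1)

rng = np.random.default_rng(0)

# Toy embeddings (assumption): both modalities are linear views of a
# shared latent, mimicking the outputs of two unimodal encoders.
latent = rng.normal(size=(256, 8))
img = latent @ rng.normal(size=(8, 16))   # "image" embeddings, dim 16
txt = latent @ rng.normal(size=(8, 12))   # "text" embeddings, dim 12

W = fit_predictor(img, txt)

fused_full = fuse(img[:4], txt[:4], W)     # both modalities present
fused_missing = fuse(img[:4], None, W)     # text missing at inference
print(fused_full.shape, fused_missing.shape)
print(np.abs(fused_full - fused_missing).max())
```

Because the downstream head always receives a fused vector of the same shape, it does not need a separate code path for incomplete inputs; how well this works in practice depends on how predictable one modality's embedding is from the other.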

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems