Calibrated Multimodal Representation Learning with Missing Modalities

Xiaohao Liu; Xiaobo Xia; Jiaheng Wei; Shuo Yang; Xiu Su; See-Kiong Ng; Tat-Seng Chua

arXiv:2511.12034 · cs.CV · May 13, 2026

TL;DR

This paper introduces CalMRL, a multimodal representation learning method that calibrates the incomplete alignments caused by missing modalities, supported by theoretical analysis and experiments.

Contribution

It proposes CalMRL, a calibration approach that models missing modality imputation at the representation level, enabling flexible learning from incomplete multimodal data.

Findings

01. CalMRL mitigates anchor shift caused by missing modalities.

02. The method demonstrates superior performance on multimodal datasets.

03. Theoretical analysis confirms convergence and effectiveness.

Abstract

Multimodal representation learning harmonizes distinct modalities by aligning them into a unified latent space. Recent research generalizes traditional cross-modal alignment to produce enhanced multimodal synergy but requires all modalities to be present for a common instance, making it challenging to utilize prevalent datasets with missing modalities. We provide theoretical insights into this issue from an anchor shift perspective. Observed modalities are aligned with a local anchor that deviates from the optimal one when all modalities are present, resulting in an inevitable shift. To address this, we propose CalMRL to calibrate incomplete alignments caused by missing modalities. CalMRL leverages the priors and the inherent connections among modalities to model the imputation for the missing ones at the representation level. To resolve the optimization dilemma, we employ a bi-step…
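
The anchor shift described in the abstract can be illustrated with a toy sketch. It assumes, purely for illustration, that the anchor is the centroid of an instance's modality representations; the paper's actual anchor definition, imputation model, and optimization procedure are not reproduced here. All function names and numbers below are hypothetical.

```python
import math

def centroid(vectors):
    """Mean of equal-length vectors; a stand-in 'anchor' (an assumption, not the paper's definition)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def anchor_shift(all_modalities, observed_keys):
    """Distance between the anchor over all modalities and the anchor over observed ones only."""
    full = centroid(list(all_modalities.values()))
    local = centroid([all_modalities[k] for k in observed_keys])
    return math.dist(full, local)

def calibrated_shift(all_modalities, observed_keys, imputed):
    """Same distance after filling each missing modality with an imputed representation."""
    filled = [
        all_modalities[k] if k in observed_keys else imputed[k]
        for k in all_modalities
    ]
    return math.dist(centroid(list(all_modalities.values())), centroid(filled))

# Toy 2-D representations for one instance (made-up numbers).
instance = {"image": [1.0, 0.0], "text": [0.0, 1.0], "audio": [-1.0, 0.0]}
observed = ["image", "text"]  # audio is treated as missing

shift = anchor_shift(instance, observed)

# Hypothetical imputation that lands near, but not exactly on, the true audio vector.
shift_after = calibrated_shift(instance, observed, {"audio": [-0.8, 0.1]})

print(f"shift without imputation: {shift:.4f}")
print(f"shift with imputation:    {shift_after:.4f}")
```

In this toy setting, the local anchor computed from image and text alone sits away from the full three-modality anchor, and filling in a reasonable estimate of the missing audio representation pulls it back. How good that estimate is determines how much of the shift is removed.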

Code & Models

Repositories

Xiaohao-Liu/CalMRL (GitHub)

Datasets

xhLiu/MM_eval (dataset, 738 downloads)
