Calibrated Multimodal Representation Learning with Missing Modalities
Xiaohao Liu, Xiaobo Xia, Jiaheng Wei, Shuo Yang, Xiu Su, See-Kiong Ng, Tat-Seng Chua

TL;DR
This paper introduces CalMRL, a multimodal representation learning method that handles missing modalities by calibrating the incomplete alignments they cause, supported by theoretical analysis and extensive experiments.
Contribution
The paper proposes CalMRL, a calibration approach that imputes missing modalities at the representation level, enabling flexible learning from incomplete multimodal data.
Findings
CalMRL mitigates anchor shift caused by missing modalities.
The method demonstrates superior performance on multimodal datasets.
Theoretical analysis confirms convergence and effectiveness.
Abstract
Multimodal representation learning harmonizes distinct modalities by aligning them into a unified latent space. Recent research generalizes traditional cross-modal alignment to produce enhanced multimodal synergy but requires all modalities to be present for a common instance, making it challenging to utilize prevalent datasets with missing modalities. We provide theoretical insights into this issue from an anchor shift perspective. Observed modalities are aligned with a local anchor that deviates from the optimal one when all modalities are present, resulting in an inevitable shift. To address this, we propose CalMRL to calibrate incomplete alignments caused by missing modalities. CalMRL leverages the priors and the inherent connections among modalities to model the imputation for the missing ones at the representation level. To resolve the optimization dilemma, we employ a bi-step…
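The anchor shift described in the abstract can be pictured with a toy sketch. It assumes, purely for illustration, that the anchor is the normalized mean of unit-norm modality embeddings; the paper's actual anchor definition and the CalMRL calibration step are not reproduced here, and the names `anchor` and `anchor_shift` are hypothetical.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def anchor(embeddings):
    """Hypothetical anchor: normalized mean of unit-norm embeddings (an assumption)."""
    return unit(np.mean([unit(e) for e in embeddings], axis=0))

def anchor_shift(all_embeddings, observed_mask):
    """Angle in radians between the anchor built from all modalities
    and the local anchor built from the observed modalities only."""
    full = anchor(all_embeddings)
    observed = [e for e, keep in zip(all_embeddings, observed_mask) if keep]
    local = anchor(observed)
    cos = np.clip(np.dot(full, local), -1.0, 1.0)
    return float(np.arccos(cos))

# Synthetic instance: three modality embeddings scattered around a shared direction.
rng = np.random.default_rng(0)
shared = rng.normal(size=16)
modalities = [shared + 0.5 * rng.normal(size=16) for _ in range(3)]

shift_none = anchor_shift(modalities, [True, True, True])      # nothing missing
shift_missing = anchor_shift(modalities, [True, True, False])  # third modality missing
```

Under this assumption, dropping a modality moves the local anchor away from the full one, which is the deviation a calibration method would aim to reduce.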
