ICYM2I: The illusion of multimodal informativeness under missingness
Young Sang Choi, Vincent Jeanselme, Pierre Elias, Shalmali Joshi

TL;DR
This paper investigates how missing data in multimodal learning can bias information gain estimates and introduces ICYM2I, a correction framework evaluated on synthetic and real datasets.
Contribution
The paper formalizes the problem of missingness in multimodal data and proposes ICYM2I, a novel inverse probability weighting framework to correct bias in information gain estimation.
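The inverse probability weighting idea named above can be illustrated with a minimal sketch. This is not the ICYM2I implementation: the function names (`ipw_mean`, `naive_mean`, `simulate`), the synthetic per-sample "gain" values, and the assumption that the observation probabilities are known exactly are all hypothetical choices made for illustration. In practice those probabilities would typically have to be estimated from the data.

```python
import random


def ipw_mean(values, observed, propensity):
    """Self-normalized IPW mean over rows where the modality was observed.

    Each observed row is weighted by 1 / P(observed), so rows from
    under-observed subgroups count for more.
    """
    num = 0.0
    den = 0.0
    for v, r, p in zip(values, observed, propensity):
        if r:
            w = 1.0 / p
            num += w * v
            den += w
    return num / den


def naive_mean(values, observed):
    """Complete-case mean that ignores why rows are missing."""
    kept = [v for v, r in zip(values, observed) if r]
    return sum(kept) / len(kept)


def simulate(n=20000, seed=0):
    """Synthetic data (assumed setup): an always-observed binary feature x1
    drives both the per-sample gain and the chance the second modality
    is observed."""
    rng = random.Random(seed)
    values, observed, propensity = [], [], []
    for _ in range(n):
        x1 = rng.random() < 0.5
        gain = 1.0 if x1 else 0.2    # hypothetical per-sample gain
        p_obs = 0.9 if x1 else 0.3   # assumed known observation probability
        values.append(gain)
        observed.append(rng.random() < p_obs)
        propensity.append(p_obs)
    return values, observed, propensity


values, observed, propensity = simulate()
naive = naive_mean(values, observed)
corrected = ipw_mean(values, observed, propensity)
print(f"naive complete-case mean: {naive:.3f}")
print(f"IPW-corrected mean:       {corrected:.3f}")
```

By construction the population mean of the gain is 0.6, while the complete-case mean concentrates near 0.8 because rows with high gain are observed more often. Reweighting by the inverse observation probability pulls the estimate back toward 0.6.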
Findings
Estimates of a modality's information gain are biased when missingness in multimodal data is not accounted for.
ICYM2I effectively corrects for missingness bias in synthetic and real datasets.
Proper adjustment significantly improves the estimation of information gain.
Abstract
Multimodal learning is of continued interest in artificial intelligence-based applications, motivated by the potential information gain from combining different data modalities. However, modalities observed in the source environment may differ from the modalities observed in the target environment due to multiple factors, including cost, hardware failure, or the perceived informativeness of a given modality. This change in missingness patterns between the source and target environment has not been carefully studied. Naïve estimation of the information gain associated with including an additional modality without accounting for missingness may result in improper estimates of that modality's value in the target environment. We formalize the problem of missingness, demonstrate its ubiquity, and show that the subsequent distribution shift induces bias when the missingness…
Taxonomy
Topics: Language, Metaphor, and Cognition
