TL;DR
DyMo is an inference-time framework that adaptively selects and fuses reliable modalities in incomplete multimodal data, improving classification performance without discarding or blindly imputing missing information.
Contribution
The paper introduces DyMo, a dynamic modality selection method that maximizes task-relevant information for each test sample at inference time. It pairs a new selection algorithm with a theoretical information-loss connection.
Findings
DyMo outperforms state-of-the-art methods across multiple datasets.
The approach effectively handles various missing-data scenarios.
The method improves classification accuracy in incomplete multimodal settings.
Abstract
Multimodal deep learning (MDL) has achieved remarkable success across various domains, yet its practical deployment is often hindered by incomplete multimodal data. Existing incomplete MDL methods either discard missing modalities, risking the loss of valuable task-relevant information, or recover them, potentially introducing irrelevant noise, leading to the discarding-imputation dilemma. To address this dilemma, in this paper, we propose DyMo, a new inference-time dynamic modality selection framework that adaptively identifies and fuses reliable recovered modalities, fully exploring task-relevant information beyond the conventional discard-or-impute paradigm. Central to DyMo is a novel selection algorithm that maximizes multimodal task-relevant information for each test sample. Since direct estimation of such information at test time is intractable due to the unknown data…
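The per-sample selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract is truncated before it specifies how task-relevant information is estimated, so the `score` callable below is a hypothetical stand-in for that estimate, and the greedy loop, the function name `select_modalities`, and the modality names are assumptions made for illustration only.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical proxy: maps a set of modality names to a scalar, where a
# higher value means "more task-relevant information" for one test sample.
ScoreFn = Callable[[FrozenSet[str]], float]


def select_modalities(
    observed: Iterable[str],
    recovered: Iterable[str],
    score: ScoreFn,
) -> FrozenSet[str]:
    """Greedily keep a recovered modality only if it raises the proxy score.

    Observed modalities are always kept. Each recovered (imputed) modality
    is tried in the given order and fused only when it improves the score,
    so the result sits between "discard all" and "impute all".
    """
    selected = set(observed)
    best = score(frozenset(selected))
    for modality in recovered:
        candidate = frozenset(selected | {modality})
        candidate_score = score(candidate)
        if candidate_score > best:
            selected.add(modality)
            best = candidate_score
    return frozenset(selected)


# Toy usage with made-up per-modality contributions (illustrative numbers,
# not results from the paper). A negative value models a noisy recovery.
toy_contribution = {"image": 0.6, "tabular": 0.3, "text": -0.2}


def toy_score(modalities: FrozenSet[str]) -> float:
    return sum(toy_contribution[m] for m in modalities)


chosen = select_modalities(
    observed=["image"],
    recovered=["tabular", "text"],
    score=toy_score,
)
print(sorted(chosen))
```

In this toy setting the recovered `tabular` modality is kept because it raises the score, while the recovered `text` modality is dropped because it lowers it. A greedy pass depends on the order in which recovered modalities are tried; an exhaustive search over subsets would avoid that at higher cost.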
