Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
Muhammad Irzam Liaqat, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Saad Saeed, Hassan Sajjad, Tom De Schepper, Karthik Nandakumar, Muhammad Haris Khan, Markus Schedl

TL;DR
Chameleon is a textual-visual multimodal learning approach that encodes text into visual representations. Because it does not rely on modality-specific branches, it remains robust when some modalities are missing.
Contribution
It proposes a unified input encoding method that replaces multi-branch architectures, enhancing robustness to missing modalities in multimodal learning.
Findings
Achieves superior performance with all modalities present.
Demonstrates resilience with missing modalities.
Performs well across multiple challenging datasets.
Abstract
Multimodal learning has demonstrated remarkable performance improvements over unimodal architectures. However, multimodal learning methods often exhibit degraded performance if one or more modalities are missing. This may be attributed to the commonly used multi-branch design, whose modality-specific streams make the models reliant on the availability of a complete set of modalities. In this work, we propose a robust textual-visual multimodal learning method, Chameleon, that completely deviates from the conventional multi-branch design. To enable this, we unify the input modalities into one format by encoding the textual modality into visual representations. As a result, our approach does not require modality-specific branches to learn modality-independent multimodal representations, making it robust to missing modalities. Extensive experiments are performed on…
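The abstract does not spell out how text is turned into a visual input, so the following is only a minimal sketch of one way the idea could look, not the authors' implementation. It assumes a pretrained text encoder has already produced a fixed-length embedding, and it uses a hypothetical helper, text_embedding_to_image, to rescale that vector to pixel intensities and lay it out as a square three-channel image that a single image network could consume.

```python
import math

import numpy as np


def text_embedding_to_image(embedding: np.ndarray) -> np.ndarray:
    """Lay out a 1-D text embedding as a square H x W x 3 uint8 image.

    Hypothetical helper for illustration only; the scaling and layout
    choices here are assumptions, not taken from the paper.
    """
    vec = np.asarray(embedding, dtype=np.float64).ravel()
    if vec.size == 0:
        raise ValueError("embedding must contain at least one value")

    # Smallest square side that can hold every embedding value.
    side = math.isqrt(vec.size)
    if side * side < vec.size:
        side += 1

    # Min-max scale to the 0..255 pixel range (constant vectors map to 0).
    lo, hi = vec.min(), vec.max()
    if hi > lo:
        scaled = (vec - lo) / (hi - lo) * 255.0
    else:
        scaled = np.zeros_like(vec)

    # Zero-pad the tail so the vector fills the square exactly.
    padded = np.zeros(side * side, dtype=np.float64)
    padded[: vec.size] = scaled

    gray = np.rint(padded).astype(np.uint8).reshape(side, side)
    # Repeat the grayscale plane so the result matches an RGB image layout.
    return np.stack([gray, gray, gray], axis=-1)


if __name__ == "__main__":
    fake_embedding = np.random.default_rng(0).normal(size=768)
    image = text_embedding_to_image(fake_embedding)
    print(image.shape, image.dtype)
```

With text and images sharing one input format, a single network can be trained on either kind of sample, which is the property the abstract credits for robustness to missing modalities.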
Taxonomy
Methods: Sparse Evolutionary Training
