Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi, Ziquan Liu, Moe Elbadawi, Adam Daneshmend, Mine Orlu, Abdul Basit, Andreas Demosthenous, Miguel Rodrigues

TL;DR
This paper introduces a retrieval-augmented in-context learning approach for multimodal tasks with missing data and limited samples, significantly improving performance and sample efficiency over existing methods.
Contribution
It presents a novel data-dependent framework leveraging in-context learning to handle missing modalities and data scarcity in multimodal learning, outperforming parametric methods.
Findings
Achieves 6.1% average improvement with only 1% training data
Reduces performance gap between full- and missing-modality data
Enhances classification accuracy across various multimodal tasks
Abstract
Multimodal machine learning with missing modalities is an increasingly relevant challenge arising in various applications such as healthcare. This paper extends the current research into missing modalities to the low-data regime, i.e., a downstream task has both missing modalities and limited sample size issues. This problem setting is particularly challenging and also practical as it is often expensive to get full-modality data and sufficient annotated training samples. We propose to use retrieval-augmented in-context learning to address these two crucial issues by unleashing the potential of a transformer's in-context learning ability. Diverging from existing methods, which primarily belong to the parametric paradigm and often require sufficient training samples, our work exploits the value of the available full-modality data, offering a novel perspective on resolving the challenge.…
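The retrieval step behind the approach described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how neighbors are selected or how the context is fed to the transformer, so the cosine-similarity metric, the feature vectors, and the names `retrieve_neighbors` and `build_context` are all assumptions made for illustration. The sketch only shows the general idea of looking up the most similar full-modality samples for a query and pairing them with their labels as in-context examples.

```python
import numpy as np

def retrieve_neighbors(query_feat, pool_feats, k=3):
    """Return indices of the k pool samples closest to the query.

    Assumption: similarity is cosine similarity on precomputed feature
    vectors; the paper may use a different metric or feature space.
    """
    q = query_feat / np.linalg.norm(query_feat)
    p = pool_feats / np.linalg.norm(pool_feats, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity to every pool sample
    return np.argsort(-sims)[:k]      # indices sorted by descending similarity

def build_context(query_feat, pool_feats, pool_labels, k=3):
    """Collect the retrieved neighbors and their labels as in-context examples.

    Hypothetical helper: how the context is actually consumed by the
    transformer is not specified in the abstract.
    """
    idx = retrieve_neighbors(query_feat, pool_feats, k)
    return pool_feats[idx], pool_labels[idx]

# Toy pool of full-modality samples (2-D features, binary labels).
pool_feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
pool_labels = np.array([1, 0, 1, 0])
query = np.array([1.0, 0.0])

neighbor_idx = retrieve_neighbors(query, pool_feats, k=2)
ctx_feats, ctx_labels = build_context(query, pool_feats, pool_labels, k=2)
```

In this toy pool the two retrieved neighbors are the samples pointing in nearly the same direction as the query, and both carry label 1, which is the kind of signal an in-context learner could borrow when the query itself is missing a modality.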
Taxonomy
Topics: Speech and dialogue systems
