TL;DR
LIMSSR is an LLM-driven framework for sequence-to-score reasoning in multimodal learning when the training data itself is incomplete. It does not rely on full-modal observations, which improves data efficiency.
Contribution
It reformulates incomplete multimodal learning as a sequence reasoning task and employs prompt-guided imputation and fusion to handle missing data without hallucinations.
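To make the sequence-reasoning framing concrete, here is a minimal sketch of how a sample with a missing modality might be serialized into prompt segments. It is an illustration under stated assumptions, not the authors' implementation: the modality names, the `build_sequence` helper, the placeholder wording, and the task instruction are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical modality names; the source does not list the modalities used.
MODALITIES = ("text", "audio", "vision")


def build_sequence(
    sample: Dict[str, Optional[str]],
) -> Tuple[List[str], List[bool]]:
    """Turn a possibly incomplete sample into prompt segments plus an observed mask."""
    segments: List[str] = []
    observed: List[bool] = []
    for name in MODALITIES:
        content = sample.get(name)
        if content is None:
            # Missing modality: emit a placeholder slot rather than reconstructed data.
            segments.append(f"<{name}> [MISSING: infer from the other modalities]")
            observed.append(False)
        else:
            segments.append(f"<{name}> {content}")
            observed.append(True)
    # Hypothetical task instruction closing the sequence.
    segments.append("<task> Predict a score for this sample.")
    return segments, observed


# Example: the audio stream is unavailable for this sample.
sample = {"text": "the movie was fine", "audio": None, "vision": "neutral face"}
segments, observed = build_sequence(sample)
print("\n".join(segments))
print(observed)
```

The point of the placeholder slot is that no ground-truth value for the missing modality is needed to build the sequence, which matches the setting where complete data is unavailable even at training time.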
Findings
Outperforms state-of-the-art methods on three datasets.
Effectively handles training-time incomplete observations.
Establishes a new paradigm for data-efficient multimodal learning.
Abstract
Real-world multimodal learning is often hindered by missing modalities. While Incomplete Multimodal Learning (IML) has gained traction, existing methods typically rely on the unrealistic assumption of full-modal availability during training to provide reconstruction supervision or cross-modal priors. This paper tackles the more challenging setting of IML under training-time incomplete observations, which precludes reliance on a "God's eye view" of complete data. We propose LIMSSR (LLM-Driven Incomplete Multimodal Sequence-to-Score Reasoning), a framework that reformulates this challenge as a conditional sequence reasoning task. LIMSSR leverages the semantic reasoning capabilities of Large Language Models via Prompt-Guided Context-Aware Modality Imputation and Multidimensional Representation Fusion to infer latent semantics from available contexts without direct reconstruction. To…
