A Concept-Centric Approach to Multi-Modality Learning
Yuchong Geng, Ao Tang

TL;DR
This paper proposes a concept-centric multi-modality learning framework that uses a shared, modality-agnostic concept space to improve efficiency, adaptability, and interpretability in multi-modal learning tasks.
Contribution
It introduces a novel shared concept space and modality-specific projection models, enabling more efficient, modular, and interpretable multi-modality learning inspired by human cognition.
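To make the division of labor concrete, here is a minimal sketch of a frozen, shared concept space with one small projection model per modality. It is not the authors' implementation. The concept vectors, the `LinearProjection` class, the squared-error update, and the cosine-similarity lookup are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical shared concept space: one fixed vector per concept.
# It is modality-agnostic and is never updated by the projections below.
CONCEPTS = {
    "dog": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "tree": [0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class LinearProjection:
    """Hypothetical modality-specific model: one linear map into the concept space."""

    def __init__(self, in_dim, concept_dim, seed):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(concept_dim)]

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W]

    def fit_step(self, x, target, lr=0.1):
        # One gradient step on squared error against a fixed concept vector.
        pred = self(x)
        for i, row in enumerate(self.W):
            err = pred[i] - target[i]
            for j in range(len(row)):
                row[j] -= lr * err * x[j]


def train(proj, pairs, epochs=200):
    """Align a projection with existing concepts; the concepts stay frozen."""
    for _ in range(epochs):
        for x, name in pairs:
            proj.fit_step(x, CONCEPTS[name])


def nearest_concept(z):
    """Return the concept whose vector is most similar to the projected input."""
    return max(CONCEPTS, key=lambda name: cosine(z, CONCEPTS[name]))


if __name__ == "__main__":
    # Toy "image" modality with 4-dimensional inputs.
    image_proj = LinearProjection(in_dim=4, concept_dim=3, seed=0)
    train(image_proj, [([1.0, 0.0, 0.0, 0.0], "dog"),
                       ([0.0, 1.0, 0.0, 0.0], "car")])

    # A new toy "audio" modality reuses the same concept space unchanged.
    audio_proj = LinearProjection(in_dim=2, concept_dim=3, seed=1)
    train(audio_proj, [([1.0, 0.0], "dog"), ([0.0, 1.0], "tree")])

    print(nearest_concept(image_proj([1.0, 0.0, 0.0, 0.0])))
    print(nearest_concept(audio_proj([1.0, 0.0])))
```

In this sketch, adding the audio modality only requires training a new projection against the existing concept vectors. That mirrors the modularity claim above, though the real framework's concept representation and training objective may differ.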
Findings
Faster convergence compared to baseline models
Supports seamless integration of new modalities
Achieves competitive results with less training and no task-specific fine-tuning
Abstract
Humans possess a remarkable ability to acquire knowledge efficiently and apply it across diverse modalities through a coherent and shared understanding of the world. Inspired by this cognitive capability, we introduce a concept-centric multi-modality learning framework built around a modality-agnostic concept space that captures structured, abstract knowledge, alongside a set of modality-specific projection models that map raw inputs onto this shared space. The concept space is decoupled from any specific modality and serves as a repository of universally applicable knowledge. Once learned, the knowledge embedded in the concept space enables more efficient adaptation to new modalities, as projection models can align with existing conceptual representations rather than learning from scratch. This efficiency is empirically validated in our experiments, where the proposed framework…
Taxonomy
Topics: Educational Technology and Assessment · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
