Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue

TL;DR
This paper introduces MiCo, a scalable pretraining framework for omni-modal intelligence that substantially improves multimodal understanding across diverse tasks and modalities, setting 37 new state-of-the-art records.
Contribution
The paper presents MiCo (Multimodal Context), a scalable pretraining paradigm that learns universal representations while jointly scaling up the number of modalities, the amount of data, and the model parameters, and that yields emergent multimodal capabilities.
Findings
Achieved 37 new state-of-the-art records across multiple benchmarks.
Demonstrated emergent abilities in multimodal perception and understanding.
Showed scalability in handling numerous modalities and large datasets.
Abstract
We propose to build omni-modal intelligence, which is capable of understanding any modality and learning universal representations. Specifically, we propose a scalable pretraining paradigm, named Multimodal Context (MiCo), which can scale up the number of modalities and the amount of data, together with the model parameters, in the pretraining process. With MiCo, the pretrained models show significant emergent abilities in multimodal learning, which are evaluated on the following tasks: i) single-modality perception benchmarks covering 10 different modalities, ii) 25 cross-modality understanding tasks spanning retrieval, question answering, and captioning, and iii) 18 multimodal large language model benchmarks. Our models establish 37 new state-of-the-art records. We hope that our research will contribute to the development of omni-modal intelligence. Code and models are at…
