A Practitioner's Guide to Continual Multimodal Pretraining
Karsten Roth, Vishaal Udandarao, Sebastian Dziadzio, Ameya Prabhu, Mehdi Cherti, Oriol Vinyals, Olivier Hénaff, Samuel Albanie, Matthias Bethge, Zeynep Akata

TL;DR
This paper introduces FoMo-in-Flux, a comprehensive benchmark for continual multimodal pretraining, and offers practical guidance for updating vision-language models in real-world, resource-constrained scenarios.
Contribution
It presents a new benchmark and extensive analysis to guide effective continual pretraining of multimodal models in practical deployment settings.
Findings
Data mixtures and stream orderings significantly impact performance.
Parameter-efficient updates can be effective in continual learning.
Model and compute scaling influence pretraining outcomes.
Abstract
Multimodal foundation models serve numerous applications at the intersection of vision and language. Still, despite being pretrained on extensive data, they become outdated over time. To keep models updated, research into continual pretraining mainly explores scenarios with either (1) infrequent, indiscriminate updates on large-scale new data, or (2) frequent, sample-level updates. However, practical model deployment often operates in the gap between these two limit cases, as real-world applications often demand adaptation to specific subdomains, tasks or concepts -- spread over the entire, varying life cycle of a model. In this work, we complement current perspectives on continual pretraining through a research test bed as well as provide comprehensive guidance for effective continual model updates in such scenarios. We first introduce FoMo-in-Flux, a continual multimodal pretraining…
Taxonomy
Topics: Hearing Impairment and Communication
