Data curation via joint example selection further accelerates multimodal learning
Talfan Evans, Nikhil Parthasarathy, Hamza Merzic, Olivier J. Henaff

TL;DR
This paper introduces JEST, a data curation method that jointly selects data batches for multimodal contrastive learning, significantly accelerating training and reducing computational costs by leveraging data dependencies and pretrained models.
Contribution
The paper proposes a joint data selection algorithm for multimodal contrastive learning that reaches stronger model performance with fewer training iterations and less computation.
Findings
JEST surpasses state-of-the-art models with up to 13× fewer iterations.
JEST surpasses state-of-the-art models with up to 10× less computation.
Joint batch selection accelerates training beyond individual example prioritization.
Abstract
Data curation is an essential component of large-scale pretraining. In this work, we demonstrate that jointly selecting batches of data is more effective for learning than selecting examples independently. Multimodal contrastive objectives expose the dependencies between data and thus naturally yield criteria for measuring the joint learnability of a batch. We derive a simple and tractable algorithm for selecting such batches, which significantly accelerate training beyond individually-prioritized data points. As performance improves by selecting from larger super-batches, we also leverage recent advances in model approximation to reduce the associated computational overhead. As a result, our approach--multimodal contrastive learning with joint example selection (JEST)--surpasses state-of-the-art models with up to 13× fewer iterations and 10× less computation. Essential to…
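The idea of scoring a batch jointly can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions not stated in this summary: a sigmoid contrastive loss that decomposes into per-pair terms, a score defined as learner loss minus the loss of a pretrained reference model, and chunked sampling in which each new chunk is scored conditionally on the examples already chosen. The names `pairwise_sigmoid_loss` and `select_jointly`, and the `temperature` and `bias` defaults, are hypothetical.

```python
import numpy as np

def pairwise_sigmoid_loss(img, txt, temperature=10.0, bias=-10.0):
    """Per-pair sigmoid contrastive loss over a super-batch (assumed loss form).

    img, txt: (n, d) arrays of L2-normalised embeddings; row i of each
    belongs to the same image-text pair. Returns an (n, n) loss matrix.
    """
    logits = temperature * img @ txt.T + bias
    labels = 2.0 * np.eye(len(img)) - 1.0   # +1 for matched pairs, -1 otherwise
    return np.logaddexp(0.0, -labels * logits)  # softplus(-label * logit)

def select_jointly(learner_loss, reference_loss, batch_size, n_chunks, rng):
    """Pick batch_size indices from the super-batch, chunk by chunk.

    Each candidate is scored by its own term plus its interaction terms
    with the examples selected so far, so the batch is chosen jointly
    instead of by independent per-example scores.
    """
    assert batch_size % n_chunks == 0
    scores = learner_loss - reference_loss  # assumed "learnability" score
    n = scores.shape[0]
    chunk = batch_size // n_chunks
    selected = np.array([], dtype=int)
    for _ in range(n_chunks):
        cond = np.diag(scores).copy()
        if selected.size:
            # Interactions with already-selected examples, in both directions.
            cond += scores[:, selected].sum(axis=1)
            cond += scores[selected, :].sum(axis=0)
        cond[selected] = -np.inf  # never pick an example twice
        # Gumbel top-k: samples without replacement from softmax(cond).
        perturbed = cond + rng.gumbel(size=n)
        new = np.argsort(-perturbed)[:chunk]
        selected = np.concatenate([selected, new])
    return selected
```

With `n_chunks=1` the conditional terms vanish and the procedure reduces to independent per-example prioritization; larger values let earlier picks influence later ones, which is the dependency the contrastive objective exposes.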
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Contrastive Learning
