Exploring Curriculum Learning for Vision-Language Tasks: A Study on Small-Scale Multimodal Training
Rohan Saha, Abrar Fahim, Alona Fyshe, Alex Murphy

TL;DR
This study investigates how curriculum learning, pretraining, and model type influence performance on multimodal and text-only tasks in small-scale vision-language training, finding that curriculum learning helps most in multimodal settings with limited data.
Contribution
It provides a comparative analysis of curriculum learning, pretraining, and model size in limited data regimes for vision-language tasks, highlighting curriculum learning's advantages in multimodal performance.
Findings
Curriculum learning improves multimodal task performance.
Pretraining enhances text-only task results.
Smaller models benefit more from curriculum learning.
Abstract
For specialized domains, there is often not a wealth of data with which to train large machine learning models. In such limited data / compute settings, various methods aim to make learning more efficient, such as finetuning from a pretrained model, modulating difficulty levels as data are presented to a model (curriculum learning), and considering the role of model type / size. Approaches to efficient learning also take inspiration from human learning by considering use cases where machine learning systems have access to approximately the same number of words experienced by a 13-year-old child (100M words). We investigate the role of 3 primary variables in a limited data regime as part of the multimodal track of the BabyLM challenge. We contrast: (i) curriculum learning, (ii) pretraining (with text-only data), and (iii) model type. We modulate these…
Taxonomy
Topics: Digital Storytelling and Education · English Language Learning and Teaching
