Exploring Curriculum Learning for Vision-Language Tasks: A Study on Small-Scale Multimodal Training

Rohan Saha; Abrar Fahim; Alona Fyshe; Alex Murphy

arXiv:2410.15509 · cs.LG · October 22, 2024

TL;DR

This study investigates how curriculum learning, pretraining, and model type influence performance on small-scale multimodal and unimodal vision-language tasks, revealing benefits of curriculum learning, especially in multimodal settings with limited data.

Contribution

The paper provides a comparative analysis of curriculum learning, pretraining, and model size in limited-data regimes for vision-language tasks, highlighting the advantages of curriculum learning for multimodal performance.

Findings

01. Curriculum learning improves multimodal task performance.
02. Pretraining enhances text-only task results.
03. Smaller models benefit more from curriculum learning.

Abstract

For specialized domains, there is often not a wealth of data with which to train large machine learning models. In such limited data / compute settings, various methods exist aiming to "do more with less", such as finetuning from a pretrained model, modulating difficulty levels as data are presented to a model (curriculum learning), and considering the role of model type / size. Approaches to efficient machine learning also take inspiration from human learning by considering use cases where machine learning systems have access to approximately the same number of words experienced by a 13-year-old child (100M words). We investigate the role of 3 primary variables in a limited data regime as part of the multimodal track of the BabyLM challenge. We contrast: (i) curriculum learning, (ii) pretraining (with text-only data), and (iii) model type. We modulate these…
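
The abstract describes curriculum learning as modulating difficulty levels as data are presented to a model. As an illustration only, the sketch below sorts examples by a difficulty score and widens the pool that batches are sampled from as training progresses. The difficulty function (caption word count), the linear pacing schedule, and the names `curriculum_batches`, `linear_pacing`, and `start_frac` are hypothetical assumptions, not the authors' method; the curriculum actually used in the paper and its repository may differ.

```python
import random
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def linear_pacing(step: int, total_steps: int, start_frac: float = 0.2) -> float:
    """Hypothetical linear schedule: fraction of the easiest-first data that is
    available at `step`. Starts just above `start_frac`, reaches 1.0 at the last step."""
    if total_steps <= 0:
        return 1.0
    frac = start_frac + (1.0 - start_frac) * (step + 1) / total_steps
    return min(1.0, frac)


def curriculum_batches(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    total_steps: int,
    batch_size: int,
    start_frac: float = 0.2,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield one batch per step, sampled (with replacement) only from the
    easiest fraction of the data allowed by the pacing schedule."""
    rng = random.Random(seed)
    ordered = sorted(examples, key=difficulty)  # easiest examples first
    for step in range(total_steps):
        frac = linear_pacing(step, total_steps, start_frac)
        n_available = max(1, int(len(ordered) * frac))
        pool = ordered[:n_available]
        yield [rng.choice(pool) for _ in range(batch_size)]


# Hypothetical usage: treat shorter captions as "easier".
captions = [
    "a dog",
    "a red ball on grass",
    "two children playing football in a sunny park",
    "cat",
]
batches = list(
    curriculum_batches(captions, difficulty=lambda c: len(c.split()),
                       total_steps=4, batch_size=2)
)
```

In this toy run, the first batch can only draw from the single shortest caption, while the final step samples from the whole set. Linear pacing is just one common choice; step-wise or exponential schedules are equally plausible under the same interface.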

Code & Models

Repositories

simpleparadox/baby_lm_2024 (PyTorch, official)

Taxonomy

Topics: Digital Storytelling and Education · English Language Learning and Teaching