Curriculum Learning with Quality-Driven Data Selection
Biao Wu, Ling Chen

TL;DR
This paper introduces a novel curriculum learning approach for multimodal large language models that uses image-text correlation and model perplexity to select high-quality data, improving model capabilities efficiently.
Contribution
It proposes a new data selection method based on a two-dimensional quality space defined by image-text correlation and model perplexity, giving finer control over data quality and supporting curriculum learning in multimodal model training.
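As a rough illustration of what selecting data from a two-dimensional quality space could look like, the sketch below partitions samples into four regions using a correlation score and a perplexity score, then arranges the regions into training stages. Everything here is an assumption made for illustration: the field names `corr` and `ppl`, the median thresholds, the toy values, and the stage order are hypothetical and are not taken from the paper.

```python
from statistics import median

def split_quality_space(samples, corr_key="corr", ppl_key="ppl"):
    """Partition samples into four regions of a (correlation, perplexity) plane.

    The cut points are the medians of each score, which is an illustrative
    choice and not the thresholding rule used by the authors.
    """
    corr_cut = median(s[corr_key] for s in samples)
    ppl_cut = median(s[ppl_key] for s in samples)
    regions = {
        ("high_corr", "low_ppl"): [],
        ("high_corr", "high_ppl"): [],
        ("low_corr", "low_ppl"): [],
        ("low_corr", "high_ppl"): [],
    }
    for s in samples:
        c = "high_corr" if s[corr_key] >= corr_cut else "low_corr"
        p = "low_ppl" if s[ppl_key] <= ppl_cut else "high_ppl"
        regions[(c, p)].append(s)
    return regions

def curriculum_stages(regions, order):
    """Return one data subset per training stage, following the given region order."""
    return [regions[key] for key in order]

# Toy scores (made up); in practice they would come from an image-text
# similarity model and from the perplexity of the model being trained.
samples = [
    {"id": "a", "corr": 0.9, "ppl": 5.0},
    {"id": "b", "corr": 0.8, "ppl": 20.0},
    {"id": "c", "corr": 0.2, "ppl": 4.0},
    {"id": "d", "corr": 0.1, "ppl": 30.0},
]

regions = split_quality_space(samples)

# Hypothetical stage order; the paper's actual schedule is not stated here.
order = [
    ("high_corr", "low_ppl"),
    ("high_corr", "high_ppl"),
    ("low_corr", "low_ppl"),
    ("low_corr", "high_ppl"),
]
stages = curriculum_stages(regions, order)
stage_ids = [[s["id"] for s in stage] for stage in stages]
```

Keeping the two scores as separate axes, instead of collapsing them into a single number, is what allows different regions to be scheduled at different stages. The sketch makes no claim about which region is the most useful for training.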
Findings
Significant improvements in five key capabilities over baseline datasets.
Effective data quality evaluation using image-text correlation and perplexity.
Enhanced training efficiency through multi-stage data subsets.
Abstract
The impressive multimodal capabilities demonstrated by OpenAI's GPT-4 have generated significant interest in the development of Multimodal Large Language Models (MLLMs). Visual instruction tuning of MLLMs with machine-generated instruction-following data has been shown to enhance zero-shot capabilities across various tasks. However, there has been limited exploration into controlling the quality of the instruction data. Current methodologies for data selection in MLLMs often rely on single, unreliable scores or use downstream tasks for selection, which is time-consuming and can lead to overfitting on the chosen evaluation datasets. To mitigate these limitations, we propose a novel data selection methodology that utilizes image-text correlation and model perplexity to evaluate and select data of varying quality. This approach leverages the distinct distribution of these two…
Taxonomy
Topics: Machine Learning and Algorithms
