CoIDO: Efficient Data Selection for Visual Instruction Tuning via Coupled Importance-Diversity Optimization
Yichen Yan, Ming Zhong, Qi Zhu, Xiaoling Gu, Jinpeng Chen, Huan Li

TL;DR
CoIDO is a lightweight, dual-objective data selection method that jointly optimizes importance and diversity. It cuts the computational cost of selection while reaching 98.2% of full-data performance with only 20% of the data in visual instruction tuning.
Contribution
CoIDO is a scalable framework that trains a lightweight plug-in scorer on a small random sample of the data, then uses that scorer to select important and diverse examples for large-scale multimodal model tuning.
Findings
Achieves 98.2% of full-data performance using only 20% of the data
Reduces computational overhead by training the scorer on a small random sample rather than evaluating the entire dataset
Balances importance and diversity by optimizing them jointly rather than treating them separately
Abstract
Multimodal large language models (MLLMs) rely heavily on instruction tuning to align vision and language capabilities, yet the computational cost of training on large-scale datasets remains a major bottleneck. Existing data selection methods aim to mitigate this by selecting important and diverse subsets, but they often suffer from two critical drawbacks: high computational overhead from processing the entire dataset and suboptimal data selection due to separate treatment of importance and diversity. We introduce CoIDO, a novel dual-objective framework that jointly optimizes data importance and diversity to overcome these challenges. Unlike existing approaches that require costly evaluations across the whole dataset, CoIDO employs a lightweight plug-in scorer. This scorer is trained on just a small random sample of data to learn the distribution of the candidate set, drastically…
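As a rough illustration of what selecting under a data budget with both importance and diversity in mind can look like, here is a minimal sketch. It is not CoIDO's coupled optimization: it is a simple round-robin heuristic swapped in for illustration. The function name `select_subset`, the per-sample `scores` (standing in for a scorer's output), and the `clusters` labels (standing in for some notion of data groups) are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def select_subset(scores, clusters, ratio=0.2):
    """Pick about ratio * N sample indices, spreading picks across clusters.

    scores   -- hypothetical importance score per sample (higher is better)
    clusters -- hypothetical group label per sample (used for diversity)
    ratio    -- fraction of the data to keep (20% mirrors the reported budget)
    """
    if len(scores) != len(clusters):
        raise ValueError("scores and clusters must have the same length")

    budget = max(1, round(len(scores) * ratio))

    # Group sample indices by cluster, best-scored first within each group.
    by_cluster = defaultdict(list)
    for idx, label in enumerate(clusters):
        by_cluster[label].append(idx)
    for members in by_cluster.values():
        members.sort(key=lambda i: scores[i], reverse=True)

    # Round-robin over clusters: each pass takes the next-best sample from
    # every cluster, so no single cluster dominates the selected subset.
    selected = []
    rank = 0
    while len(selected) < budget:
        took_any = False
        for label in sorted(by_cluster):
            members = by_cluster[label]
            if rank < len(members) and len(selected) < budget:
                selected.append(members[rank])
                took_any = True
        if not took_any:  # every cluster is exhausted
            break
        rank += 1
    return selected


# Toy data: ten samples in three groups (values are made up).
example_scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.95, 0.4, 0.5, 0.6]
example_clusters = [0, 0, 0, 1, 1, 1, 0, 2, 2, 2]
picked = select_subset(example_scores, example_clusters, ratio=0.3)
```

Note that this heuristic handles the two objectives in separate steps (group first, then rank), which is the kind of decoupled treatment the abstract says CoIDO is designed to avoid. It is only meant to make the budget and the importance-diversity trade-off concrete.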
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Natural Language Processing Techniques
