ICONS: Influence Consensus for Vision-Language Data Selection

Xindi Wu; Mengzhou Xia; Rulin Shao; Zhiwei Deng; Pang Wei Koh; Olga Russakovsky

arXiv:2501.00654·cs.CV·December 30, 2025

TL;DR

ICONS is a gradient-based data selection method that identifies valuable vision-language training examples across tasks, reducing data size while maintaining high performance and generalization.

Contribution

Introduces ICONS, a novel influence consensus approach leveraging training dynamics and majority voting for robust, scalable, and cross-task data selection in vision-language models.

Findings

01

Models trained on 20% of the data retain over 98% of full-data performance.

02

The selected data generalizes well to unseen tasks and architectures.

03

Compact selected subsets are released for efficient model development.

Abstract

Training vision-language models via instruction tuning relies on large data mixtures spanning diverse tasks and domains, yet these mixtures frequently include redundant information that increases computational costs without proportional gains. Existing methods typically rely on task-agnostic heuristics to estimate data importance, limiting their effectiveness across tasks. We introduce ICONS, a gradient-based Influence CONsensus approach for vision-language data Selection. Our method leverages first-order training dynamics to estimate each example's influence on validation performance, then aggregates these estimates across tasks via majority voting. This cross-task consensus identifies consistently valuable data points while mitigating score calibration and outlier sensitivity, enabling robust and scalable data selection for diverse multitask mixtures. Models trained on our selected…
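The two stages the abstract describes, per-task influence estimation followed by majority voting, can be sketched as below. This is a minimal illustration and not the authors' implementation. The function names, the cosine-similarity influence proxy, the `vote_fraction` threshold, and the tie-breaking rule are all assumptions introduced here, and the gradients are taken as precomputed feature vectors.

```python
import numpy as np

def influence_scores(train_grads, val_grads):
    """Hypothetical first-order influence proxy for ONE task.

    Scores each training example by the cosine similarity between its
    gradient and the mean validation gradient of the task (an assumption;
    the paper's exact estimator is not given in the abstract).
    train_grads: (n_train, d), val_grads: (n_val, d).
    """
    target = val_grads.mean(axis=0)
    train_norm = np.linalg.norm(train_grads, axis=1) + 1e-12
    target_norm = np.linalg.norm(target) + 1e-12
    return (train_grads @ target) / (train_norm * target_norm)

def consensus_select(scores, vote_fraction, budget):
    """Majority-vote aggregation across tasks.

    scores: (n_tasks, n_train) influence scores, one row per task.
    Each task votes for its top `vote_fraction` of training examples;
    the `budget` examples with the most votes are returned.
    """
    n_tasks, n_train = scores.shape
    n_votes = max(1, int(round(vote_fraction * n_train)))
    votes = np.zeros(n_train, dtype=int)
    for task_scores in scores:
        # Only the per-task ranking matters, so score scale is irrelevant.
        top = np.argsort(-task_scores, kind="stable")[:n_votes]
        votes[top] += 1
    # Highest vote count first; the stable sort breaks ties by lower index.
    order = np.argsort(-votes, kind="stable")
    return order[:budget], votes

if __name__ == "__main__":
    # Three tasks, four training examples. Task 1 uses a very different
    # score scale, which would dominate a plain sum of raw scores.
    scores = np.array([
        [0.9, 0.1, 0.8, 0.2],
        [1000.0, 5.0, 3.0, 900.0],
        [0.5, 0.4, 0.6, 0.1],
    ])
    selected, votes = consensus_select(scores, vote_fraction=0.5, budget=2)
    print(selected.tolist(), votes.tolist())
```

Because each task contributes only a binary vote per example, a task whose scores are orders of magnitude larger (the second row above) has no more say than any other task. That is one way to read the abstract's point about score calibration and outlier sensitivity.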

Code & Models

Datasets

xindiw/LLAVA-ICONS-133K
dataset · 6 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Advanced Image and Video Retrieval Techniques