ScalSelect: Scalable Training-Free Multimodal Data Selection for Efficient Visual Instruction Tuning

Changti Wu; Jiahuai Mao; Yuzhuo Miao; Shijie Lian; Bin Yu; Xiaopeng Lin; Cong Huang; Lei Zhang; Kai Chen

arXiv:2602.11636·cs.CV·February 13, 2026

TL;DR

ScalSelect is a scalable, training-free method for multimodal data selection that efficiently identifies the most relevant samples for visual instruction tuning, significantly reducing data requirements while maintaining high performance.

Contribution

ScalSelect introduces a linear-time, training-free data selection approach that captures instruction-relevant information without relying on external models or pairwise comparisons.

Findings

01. Achieves over 97.5% of full-data performance with only 16% of data.

02. Outperforms full-data training in some scenarios.

03. Scalable importance scoring without pairwise comparisons.
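
To make the complexity claim in the findings concrete, here is a minimal Python sketch of budgeted selection in which each sample is scored once against a single dataset-level summary, so no pairwise similarity matrix is ever built. The scoring rule (distance from the mean feature vector), the function name `select_by_score`, and the random features are all assumptions made for illustration. This page does not describe how ScalSelect computes importance, and the sketch is not the paper's method.

```python
import heapq
import math
import random


def select_by_score(features, budget_fraction):
    """Return sorted indices of the highest-scoring samples.

    Placeholder scoring rule (NOT ScalSelect's criterion): distance of each
    sample from the dataset mean. Both passes below touch every sample once,
    so scoring costs O(n * d) rather than the O(n^2 * d) of pairwise similarity.
    """
    n = len(features)
    d = len(features[0])
    # Pass 1: a single dataset-level summary vector.
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    # Pass 2: one score per sample, computed against the summary only.
    scores = [math.dist(row, mean) for row in features]
    # Keep the top-k samples; heapq.nlargest runs in O(n log k).
    k = max(1, round(budget_fraction * n))
    top = heapq.nlargest(k, range(n), key=lambda i: scores[i])
    return sorted(top)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical 8-dimensional features for 100 samples.
    feats = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]
    selected = select_by_score(feats, budget_fraction=0.16)
    print(len(selected), "of", len(feats), "samples kept")
```

At a 16% budget, the fraction quoted in the findings, this keeps 16 of 100 samples. Swapping in a different per-sample score leaves the cost linear as long as the score needs only the sample itself and a fixed-size summary of the dataset.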

Abstract

Large-scale Visual Instruction Tuning (VIT) has become a key paradigm for advancing the performance of vision-language models (VLMs) across various multimodal tasks. However, training on large-scale datasets is computationally expensive and inefficient because of redundancy in the data, which motivates multimodal data selection as a way to improve training efficiency. Existing data selection methods for VIT require either costly training or gradient computation. Training-free alternatives often depend on proxy models or datasets, instruction-agnostic representations, and pairwise similarity with quadratic complexity, which limits scalability and representation fidelity. In this work, we propose ScalSelect, a scalable training-free multimodal data selection method whose time complexity is linear in the number of samples, eliminating the need for external models or auxiliary…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning