Towards Free Data Selection with General-Purpose Models
Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan

TL;DR
This paper introduces FreeSel, a data selection method that uses existing general-purpose models to select informative samples in a single inference pass, bypassing the iterative train-and-select loop of traditional active learning pipelines.
Contribution
The paper proposes a novel single-pass data selection pipeline built on semantic patterns extracted from general-purpose models; because it needs no additional training or supervision, it is far more efficient than existing iterative active learning methods.
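To make the contrast with iterative active learning concrete, below is a minimal Python sketch of a one-pass selection step. It assumes per-sample feature vectors have already been extracted by a frozen general-purpose model. The greedy farthest-point (k-center) rule is a swapped-in, commonly used distance-based heuristic, not necessarily FreeSel's exact sampling rule, and the function name `select_farthest_first` is hypothetical.

```python
import numpy as np


def select_farthest_first(features: np.ndarray, budget: int, seed: int = 0) -> list:
    """Pick `budget` row indices from `features` (N x D) with greedy
    farthest-point (k-center) sampling.

    Assumption: this heuristic stands in for the paper's selection rule.
    No model is trained; only precomputed features are read.
    """
    n = features.shape[0]
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of samples")
    rng = np.random.default_rng(seed)
    first = int(rng.integers(n))  # random starting sample
    selected = [first]
    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.linalg.norm(features - features[first], axis=1)
    min_dist[first] = -np.inf  # never re-pick a selected sample
    for _ in range(budget - 1):
        nxt = int(np.argmax(min_dist))  # sample farthest from the current selection
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
        min_dist[nxt] = -np.inf
    return selected


if __name__ == "__main__":
    feats = np.random.default_rng(0).normal(size=(1000, 64))
    picked = select_farthest_first(feats, budget=50)
    print(len(picked), len(set(picked)))
```

Because this step only reads precomputed features, its cost is one feature-extraction pass plus O(N · D · budget) distance updates, with no retraining between selection rounds.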
Findings
FreeSel is 530x faster than traditional active learning methods.
It effectively selects informative data samples across various vision tasks.
The method achieves competitive performance without additional training or supervision.
Abstract
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. However, current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly. In this paper, we challenge this status quo by designing a distinct data selection pipeline that utilizes existing general-purpose models to select data from various datasets with a single-pass inference without the need for additional training or supervision. A novel free data selection (FreeSel) method is proposed following this new pipeline. Specifically, we define semantic patterns extracted from intermediate features of the general-purpose model to capture subtle local information in each image. We then enable the selection of all data samples in a…
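The idea of semantic patterns drawn from intermediate features can be illustrated with a rough sketch. The following are assumptions, not details taken from the paper: each image is represented by a matrix of local (patch) features from an intermediate layer, its "patterns" are k-means centroids of those features, and two images are compared with a symmetric Chamfer distance between their pattern sets. The names `semantic_patterns` and `pattern_distance` are hypothetical.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means on the rows of x (P x D); returns k x D centroids."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every local feature to its nearest centroid.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centers[j] = members.mean(0)
    return centers


def semantic_patterns(patch_feats: np.ndarray, k: int = 4) -> np.ndarray:
    """Hypothetical: summarize one image's (P x D) local features as k pattern vectors."""
    return kmeans(patch_feats, k)


def pattern_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical image-to-image distance: symmetric Chamfer distance between pattern sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(0.5 * (d.min(1).mean() + d.min(0).mean()))
```

A pairwise matrix of such distances could then feed any distance-based sampler; comparing sets of local patterns, rather than one global vector per image, is what lets the comparison reflect fine-grained content.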
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Advanced Image and Video Retrieval Techniques
