Diverse Subset Selection via Norm-Based Sampling and Orthogonality
Noga Bar, Raja Giryes

TL;DR
This paper introduces a novel subset selection method combining feature norms, randomization, and orthogonality to efficiently select diverse, informative samples for annotation, improving performance across multiple image and text benchmarks.
Contribution
It presents a simple, effective subset selection technique that leverages feature norms and orthogonality, enhancing diversity and informativeness over existing methods.
Findings
Consistently improves subset selection performance on various benchmarks.
Effective both as a standalone method and when combined with other techniques.
Reduces redundancy and encourages coverage of feature space.
Abstract
Large annotated datasets are crucial for the success of deep neural networks, but labeling data can be prohibitively expensive in domains such as medical imaging. This work tackles the subset selection problem: selecting a small set of the most informative examples from a large unlabeled pool for annotation. We propose a simple and effective method that combines feature norms, randomization, and orthogonality (via the Gram-Schmidt process) to select diverse and informative samples. Feature norms serve as a proxy for informativeness, while randomization and orthogonalization reduce redundancy and encourage coverage of the feature space. Extensive experiments on image and text benchmarks, including CIFAR-10/100, Tiny ImageNet, ImageNet, OrganAMNIST, and Yelp, show that our method consistently improves subset selection performance, both as a standalone approach and when integrated with…
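The abstract names three ingredients: feature norms as an informativeness score, randomized sampling, and Gram-Schmidt orthogonalization to reduce redundancy. The sketch below shows one way these could fit together. It is not the authors' published algorithm. Several details are assumptions: features are the rows of a NumPy array (for example, embeddings from a pretrained encoder), each step samples a point with probability proportional to the norm of its residual, and the chosen direction is then projected out of every remaining feature. The function name `norm_orthogonal_select` is hypothetical.

```python
import numpy as np


def norm_orthogonal_select(features, budget, seed=0):
    """Hypothetical sketch: pick `budget` row indices from `features`.

    Assumption (not from the paper): sample proportionally to residual norm,
    then remove the chosen direction from all residuals (one Gram-Schmidt step).
    """
    rng = np.random.default_rng(seed)
    residual = np.asarray(features, dtype=float).copy()
    n = residual.shape[0]
    if budget > n:
        raise ValueError("budget exceeds pool size")

    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(budget):
        # Norm of each residual acts as the informativeness score.
        norms = np.linalg.norm(residual, axis=1)
        norms[~available] = 0.0
        total = norms.sum()
        if total <= 1e-12:
            # Residual space exhausted (budget > feature dimension):
            # fall back to uniform sampling over unselected points.
            probs = available / available.sum()
        else:
            probs = norms / total
        # Randomization: sample instead of taking the arg-max norm.
        idx = int(rng.choice(n, p=probs))
        selected.append(idx)
        available[idx] = False

        # Orthogonalization: project the chosen direction out of all residuals.
        v = residual[idx]
        v_norm = np.linalg.norm(v)
        if v_norm > 1e-12:
            u = v / v_norm
            residual -= np.outer(residual @ u, u)
    return selected
```

With this design, an exact duplicate of an already-selected point ends up with a zero residual, so it cannot be drawn again while other directions remain. For example, given rows `[1, 0]`, `[1, 0]`, `[0, 1]` and a budget of 2, the selection always includes index 2. Sampling rather than taking the largest norm avoids choosing only outliers.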
Taxonomy
Topics: Neural Networks and Applications
Methods: Pruning
