Diverse Subset Selection via Norm-Based Sampling and Orthogonality

Noga Bar; Raja Giryes

arXiv:2406.01086·cs.LG·September 29, 2025

TL;DR

This paper introduces a subset selection method that combines feature norms, randomization, and orthogonalization to select diverse, informative samples for annotation, improving performance across several image and text benchmarks.

Contribution

The paper presents a simple subset selection technique that uses feature norms as a proxy for informativeness and relies on randomization and Gram-Schmidt orthogonalization to keep the selected samples diverse.

Findings

1. Consistently improves subset selection performance on image and text benchmarks, including CIFAR-10/100, Tiny ImageNet, ImageNet, OrganAMNIST, and Yelp.

2. Is effective both as a standalone method and when combined with other techniques.

3. Reduces redundancy among selected samples and encourages coverage of the feature space.

Abstract

Large annotated datasets are crucial for the success of deep neural networks, but labeling data can be prohibitively expensive in domains such as medical imaging. This work tackles the subset selection problem: selecting a small set of the most informative examples from a large unlabeled pool for annotation. We propose a simple and effective method that combines feature norms, randomization, and orthogonality (via the Gram-Schmidt process) to select diverse and informative samples. Feature norms serve as a proxy for informativeness, while randomization and orthogonalization reduce redundancy and encourage coverage of the feature space. Extensive experiments on image and text benchmarks, including CIFAR-10/100, Tiny ImageNet, ImageNet, OrganAMNIST, and Yelp, show that our method consistently improves subset selection performance, both as a standalone approach and when integrated with…
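
As a rough illustration of how the three ingredients named in the abstract can fit together, the following is a minimal sketch and not the authors' algorithm. It assumes one plausible combination: sample an index with probability proportional to its current feature norm, then apply a Gram-Schmidt step that removes the chosen direction from every remaining feature. The function name `select_subset` and the parameters `features`, `k`, and `seed` are hypothetical.

```python
import numpy as np

def select_subset(features, k, seed=0):
    """Hypothetical sketch: norm-proportional sampling with Gram-Schmidt deflation."""
    rng = np.random.default_rng(seed)
    residual = np.array(features, dtype=float)  # working copy, shape (n, d)
    available = np.ones(len(residual), dtype=bool)
    selected = []
    for _ in range(k):
        # Norm of what is left of each feature; already-selected samples get weight 0.
        norms = np.where(available, np.linalg.norm(residual, axis=1), 0.0)
        total = norms.sum()
        if total <= 1e-12:
            break  # the remaining features lie in the span of the selected ones
        # Randomization: larger residual norm means higher selection probability.
        idx = int(rng.choice(len(residual), p=norms / total))
        selected.append(idx)
        available[idx] = False
        direction = residual[idx] / norms[idx]
        # Gram-Schmidt step: project the chosen direction out of every residual.
        residual -= np.outer(residual @ direction, direction)
    return selected

# Usage on synthetic features (assumed shape: 200 samples, 16 dimensions).
example_features = np.random.default_rng(1).normal(size=(200, 16))
chosen = select_subset(example_features, k=10, seed=0)
```

Under these assumptions, a sample that is nearly collinear with an already selected one has a small residual norm and is therefore unlikely to be drawn, which is one way redundancy can be reduced. At most as many samples as the feature dimension can be selected before all residuals vanish.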

Taxonomy

Topics: Neural Networks and Applications

Methods: Pruning