Are All Training Examples Created Equal? An Empirical Study

Kailas Vodrahalli; Ke Li; Jitendra Malik

arXiv:1811.12569 · cs.LG · December 3, 2018 · 45 cites

TL;DR

This paper investigates how much individual training examples matter in large computer vision datasets. It finds that a small, carefully selected subsample is sufficient for training on some datasets, while on others the differences in importance between examples are negligible. The results have implications for active learning on deep networks.

Contribution

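The paper proposes a gradient-based importance measure and uses it to empirically analyze the relative importance of training images in four datasets of varying complexity. The same analysis can serve as a general tool for understanding the diversity of training examples in a dataset.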

Findings

1. Small subsamples can be sufficient for training in some datasets
2. Relative importance of examples varies across datasets
3. The analysis method aids understanding of dataset diversity

Abstract

Modern computer vision algorithms often rely on very large training datasets. However, it is conceivable that a carefully selected subsample of the dataset is sufficient for training. In this paper, we propose a gradient-based importance measure that we use to empirically analyze relative importance of training images in four datasets of varying complexity. We find that in some cases, a small subsample is indeed sufficient for training. For other datasets, however, the relative differences in importance are negligible. These results have important implications for active learning on deep networks. Additionally, our analysis method can be used as a general tool to better understand diversity of training examples in datasets.
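
The abstract does not spell out the form of the gradient-based importance measure, so the following is only an illustrative sketch of the general idea, not the authors' definition. It assumes a softmax classifier and scores each example by the norm of the cross-entropy loss gradient with respect to the logits, then keeps the highest-scoring fraction. The function names `gradient_importance` and `select_top_fraction`, and the toy data, are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def gradient_importance(logits, labels):
    """Per-example score: ||softmax(z) - onehot(y)||_2.

    This is the norm of d(cross-entropy)/d(logits). It is an assumed
    stand-in for the paper's measure, which the abstract does not define.
    """
    probs = softmax(logits)
    onehot = np.eye(logits.shape[1])[labels]
    return np.linalg.norm(probs - onehot, axis=1)

def select_top_fraction(scores, fraction):
    # Indices of the highest-scoring examples (at least one is kept).
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(-scores)[:k]

# Toy data: 8 examples, 3 classes (random, for illustration only).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, size=8)

scores = gradient_importance(logits, labels)
subset = select_top_fraction(scores, 0.25)
```

Under this assumed score, an example the model already classifies confidently and correctly gets a value near 0, while a confidently misclassified one approaches the maximum of sqrt(2). Training on `subset` alone and comparing accuracy against the full set is the kind of experiment the abstract describes.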

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification