Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications

Nicholas Carlini; Úlfar Erlingsson; Nicolas Papernot

arXiv:1910.13427 · cs.LG · October 30, 2019 · 19 cites

TL;DR

This paper introduces techniques to quantify how much of an outlier, or how well-represented, an example is within its dataset. It evaluates five scoring methods on four datasets (MNIST, Fashion-MNIST, CIFAR-10, and ImageNet) and applies them to curriculum learning and adversarial robustness.

Contribution

The paper develops and evaluates five highly correlated metrics that score how well-represented, or how much of an outlier, each example is, with applications to dataset analysis and model training strategies.

Findings

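01. All five methods are highly correlated, despite being independent approaches.

02. The metrics can be combined to identify prototypical examples, memorized training examples, and uncommon submodes of the dataset.

03. The metrics can determine an improved ordering for curriculum learning and can impact adversarial robustness.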

Abstract

We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution. We evaluate five methods to score examples in a dataset by how well-represented the examples are, for different plausible definitions of "well-represented", and apply these to four common datasets: MNIST, Fashion-MNIST, CIFAR-10, and ImageNet. Despite being independent approaches, we find all five are highly correlated, suggesting that the notion of being well-represented can be quantified. Among other uses, we find these methods can be combined to identify (a) prototypical examples (that match human expectations); (b) memorized training examples; and, (c) uncommon submodes of the dataset. Further, we show how we can utilize our metrics to determine an improved ordering for curriculum learning, and impact adversarial robustness. We release all metric…
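
The claim that independent scoring methods are "highly correlated" can be illustrated with a small sketch. The abstract does not say which correlation statistic the authors use, so the choice of Spearman rank correlation here is an assumption, as are the synthetic scores and the helper names `rank` and `spearman_matrix`. Nothing below reproduces the paper's five metrics; it only shows one plausible way to compare several per-example score vectors.

```python
import numpy as np


def rank(values):
    """Return 0-based ranks of `values` (ties broken by position)."""
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman_matrix(scores):
    """Pairwise Spearman rank correlation between score vectors.

    scores: array of shape (n_metrics, n_examples), one row per
    (hypothetical) scoring method.
    """
    ranked = np.vstack([rank(row) for row in scores])
    # Pearson correlation of the ranks is the Spearman correlation.
    return np.corrcoef(ranked)


# Synthetic stand-in data (an assumption, not the paper's metrics):
# five noisy views of one shared latent "well-representedness" score.
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
scores = np.vstack([latent + 0.3 * rng.normal(size=1000) for _ in range(5)])

corr = spearman_matrix(scores)
print(np.round(corr, 2))
```

If the off-diagonal entries of such a matrix are close to 1, the methods largely agree on which examples are well-represented and which are outliers, which is the kind of agreement the abstract reports.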

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
