Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications
Nicholas Carlini, Úlfar Erlingsson, Nicolas Papernot

TL;DR
This paper introduces techniques to quantify outliers and well-represented examples in datasets, evaluates five methods across multiple datasets, and demonstrates their applications in curriculum learning and robustness enhancement.
Contribution
The paper develops and evaluates five correlated metrics for quantifying example representativeness and outlierness, with applications in dataset analysis and model training strategies.
Findings
All five methods are highly correlated.
Metrics can identify prototypical, memorized, and uncommon examples.
Metrics yield an improved ordering for curriculum learning and affect adversarial robustness.
Abstract
We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution. We evaluate five methods to score examples in a dataset by how well-represented the examples are, for different plausible definitions of "well-represented", and apply these to four common datasets: MNIST, Fashion-MNIST, CIFAR-10, and ImageNet. Despite being independent approaches, we find all five are highly correlated, suggesting that the notion of being well-represented can be quantified. Among other uses, we find these methods can be combined to identify (a) prototypical examples (that match human expectations); (b) memorized training examples; and, (c) uncommon submodes of the dataset. Further, we show how we can utilize our metrics to determine an improved ordering for curriculum learning, and impact adversarial robustness. We release all metric…
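The claim that independent scoring methods are highly correlated can be checked by comparing how each method ranks the same examples. The sketch below is illustrative only: it assumes Spearman rank correlation as the agreement measure (the abstract does not say which measure the authors use), and the five score vectors are synthetic placeholders, not outputs of the paper's metrics. The names `rank` and `spearman_matrix` are hypothetical helpers.

```python
import numpy as np


def rank(values):
    """Return 0-based ranks of a 1-D array (ties broken by position)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman_matrix(scores):
    """Pairwise Spearman correlation between rows of `scores`.

    scores: array of shape (n_metrics, n_examples), one row per metric.
    Computed as the Pearson correlation of the per-row ranks.
    """
    ranked = np.vstack([rank(row) for row in scores])
    return np.corrcoef(ranked)


# Synthetic stand-in data (an assumption for illustration): five noisy
# views of one latent "well-representedness" value per example.
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
scores = np.vstack([latent + 0.3 * rng.normal(size=1000) for _ in range(5)])

corr = spearman_matrix(scores)
print(np.round(corr, 2))
```

With real data, each row of `scores` would hold one method's score for every example in the dataset, and large off-diagonal entries in `corr` would indicate that the methods order examples similarly.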
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
Methods: Test
