Typical Yet Unlikely and Normally Abnormal: The Intuition Behind High-Dimensional Statistics
Matthew J. Vowels

TL;DR
This paper explores the peculiar behavior of high-dimensional data, introduces typicality as a better characterization of normality, and demonstrates its use in outlier detection in contrast with traditional methods.
Contribution
It provides intuition about the peculiarities of high-dimensional data and presents typicality, an information-theoretic concept relating to entropy, as an approach to assessing normality and detecting outliers.
Findings
Data exhibit peculiarities even at a relatively low number of dimensions, and these become severe as the number of dimensions increases.
Typicality offers a better characterization of normality than traditional measures.
Typicality can effectively identify outliers, outperforming Mahalanobis distance.
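The findings above can be illustrated with a minimal sketch. It is not the paper's implementation, and it assumes an isotropic standard Gaussian, for which the Mahalanobis distance reduces to the Euclidean norm. The function name `gaussian_scores` and the "typicality gap" (the absolute difference between a point's per-dimension negative log-density and the per-dimension differential entropy) are hypothetical choices made for illustration only.

```python
import numpy as np

def gaussian_scores(x):
    """Return (mahalanobis, typicality_gap) for the rows of x under N(0, I).

    Hypothetical helper for illustration; assumes zero mean and identity covariance.
    """
    d = x.shape[1]
    sq_norm = np.sum(x ** 2, axis=1)
    # With identity covariance, Mahalanobis distance is the Euclidean norm.
    mahalanobis = np.sqrt(sq_norm)
    # Per-dimension negative log-density of N(0, I), in nats.
    nll_per_dim = 0.5 * np.log(2 * np.pi) + sq_norm / (2 * d)
    # Per-dimension differential entropy of a standard normal, in nats.
    entropy_per_dim = 0.5 * np.log(2 * np.pi * np.e)
    # A point is "typical" when its per-dimension NLL is close to the entropy.
    typicality_gap = np.abs(nll_per_dim - entropy_per_dim)
    return mahalanobis, typicality_gap

rng = np.random.default_rng(0)
d = 1000
samples = rng.standard_normal((500, d))  # draws from N(0, I)
mean_point = np.zeros((1, d))            # the distribution's mean (and mode)

maha_s, gap_s = gaussian_scores(samples)
maha_m, gap_m = gaussian_scores(mean_point)

print("average sample norm:", maha_s.mean(), "vs sqrt(d):", np.sqrt(d))
print("Mahalanobis distance of the mean:", maha_m[0])
print("largest typicality gap among samples:", gap_s.max())
print("typicality gap of the mean:", gap_m[0])
```

Under these assumptions, the samples concentrate in a thin shell at radius close to sqrt(d), so the mean has the smallest possible Mahalanobis distance yet a larger typicality gap than any of the sampled points. A distance-from-the-mean criterion would treat it as the most normal point, while a typicality criterion would flag it as unlike anything actually drawn from the distribution.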
Abstract
Normality, in the colloquial sense, has historically been considered an aspirational trait, synonymous with ideality. The arithmetic average and, by extension, statistics including linear regression coefficients, have often been used to characterize normality, and are often used as a way to summarize samples and identify outliers. We provide intuition behind the behavior of such statistics in high dimensions, and demonstrate that even for datasets with a relatively low number of dimensions, data start to exhibit a number of peculiarities which become severe as the number of dimensions increases. Whilst our main goal is to familiarize researchers with these peculiarities, we also show that normality can be better characterized with 'typicality', an information-theoretic concept relating to entropy. An application of typicality to both synthetic and real-world data concerning political…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Advanced Statistical Methods and Models
