Approximation of projections of random vectors
Elizabeth Meckes

TL;DR
This paper studies when low-dimensional projections of a high-dimensional random vector are approximately Gaussian. It bounds the bounded-Lipschitz distance from a random projection to a Gaussian distribution and shows that most projections are close to Gaussian when the ambient dimension is large.
Contribution
It gives an explicit bound on the expected bounded-Lipschitz distance between a random projection and a Gaussian distribution, shows that this distance is concentrated at its expectation, and applies the results to projection pursuit.
Findings
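Most low-dimensional projections of high-dimensional data are close to Gaussian.
The explicit bound depends on the ambient dimension d, the projection dimension k, and the distribution of the random vector.
The projection dimension may grow with d: projections of dimension k = c sqrt(log d), for a small constant c, are still approximately Gaussian.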
Abstract
Let X be a d-dimensional random vector and X_θ its projection onto the span of a set of orthonormal vectors {θ_1, ..., θ_k}. Conditions on the distribution of X are given such that if θ is chosen according to Haar measure on the Stiefel manifold, the bounded-Lipschitz distance from X_θ to a Gaussian distribution is concentrated at its expectation; furthermore, an explicit bound is given for the expected distance, in terms of d, k, and the distribution of X, allowing consideration not just of fixed k but of k growing with d. The results are applied in the setting of projection pursuit, showing that most k-dimensional projections of n data points in R^d are close to Gaussian, when n and d are large and k = c sqrt(log d) for a small constant c.
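The setting in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it draws a Haar-distributed orthonormal frame by QR-factorizing a Gaussian matrix (a standard construction), projects synthetic non-Gaussian data onto it, and uses excess kurtosis as a crude stand-in for closeness to Gaussian. It does not compute the bounded-Lipschitz distance. The function names `haar_stiefel` and `excess_kurtosis`, the uniform data, and the sizes n, d, k are all assumptions chosen for illustration.

```python
import numpy as np


def haar_stiefel(d, k, rng):
    """Draw k orthonormal vectors in R^d, Haar-distributed on the Stiefel manifold."""
    g = rng.standard_normal((d, k))
    q, r = np.linalg.qr(g)  # reduced QR: q is d x k with orthonormal columns
    # Flip column signs so that diag(r) > 0; this makes the law of q exactly Haar.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs


def excess_kurtosis(x):
    """Column-wise excess kurtosis (0 for a Gaussian); a crude non-Gaussianity proxy."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0


rng = np.random.default_rng(0)
n, d, k = 2000, 500, 2  # illustrative sizes, not values from the paper

# Hypothetical data: i.i.d. coordinates, uniform on [-sqrt(3), sqrt(3)]
# (mean 0, variance 1, clearly non-Gaussian).
data = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, d))

theta = haar_stiefel(d, k, rng)  # d x k frame
proj = data @ theta              # n x k coordinates of the projected points

print("excess kurtosis, first raw coordinate:", excess_kurtosis(data[:, :1]))
print("excess kurtosis, projected coordinates:", excess_kurtosis(proj))
```

Under these assumptions the raw coordinates should show clearly negative excess kurtosis, while the projected coordinates should sit near zero, up to sampling noise from the finite n.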
Taxonomy
Topics: Point processes and geometric inequalities · Markov Chains and Monte Carlo Methods · Bayesian Methods and Mixture Models
