On the explanatory power of principal components

Daniel A. Diaz-Pachon; J. Sunil Rao; Jean-Eudes Dazard

arXiv:1404.4917 · math.PR · April 22, 2014 · 1 citation

TL;DR

This paper studies how likely a vector is to lie closer to the vectors of an orthogonal basis than to other vectors in a p-dimensional space, and discusses what this implies for Principal Components Analysis in regression and other learning settings.

Contribution

It gives a probabilistic analysis of how close a vector lies to the vectors of an orthogonal basis, compared with other vectors, as the dimension grows, and explores what this implies for the explanatory power of PCA.

Findings

01

The probability that a vector is closer to all the basis vectors than to p other vectors is at least 1/2.

02

As the dimension p increases, this probability converges to Φ(1) - Φ(-1) ≈ 0.6827, the probability that a standard normal variable falls in [-1, 1].

03

Taking the orthogonal basis to be the principal-component directions, the results bear on the explanatory power of PCA in regression and other learning settings.
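
The limiting value stated in the abstract, Φ(1) - Φ(-1), is the mass a standard normal distribution places on [-1, 1]. Since Φ(x) = (1 + erf(x/√2))/2, it equals erf(1/√2). A minimal check using only the Python standard library (the helper name `std_normal_cdf` is illustrative, not from the paper):

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF written via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Probability that a standard normal variable lies in [-1, 1]
limit = std_normal_cdf(1.0) - std_normal_cdf(-1.0)
print(f"Phi(1) - Phi(-1) = {limit:.6f}")  # prints "Phi(1) - Phi(-1) = 0.682689"
```

The value, about 0.682689, rounds to 0.6827 and sits above the lower bound of 1/2.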

Abstract

We show that if we have an orthogonal basis $(u_{1}, \dots, u_{p})$ of a $p$-dimensional vector space and randomly select $p + 1$ vectors $v_{1}, \dots, v_{p}$ and $w$, all passing through the origin, then the probability that $w$ is closer to all the vectors in the basis than to $v_{1}, \dots, v_{p}$ is at least 1/2 and, as $p \to \infty$, converges to the probability that a standard normal random variable falls in the interval $[-1, 1]$, i.e., $\Phi(1) - \Phi(-1) \approx 0.6827$. This result has relevant consequences for Principal Components Analysis in the context of regression and other learning settings if we take the orthogonal basis to be the directions of the principal components.
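
Applying the result to PCA relies on the principal-component directions forming an orthonormal basis $u_{1}, \dots, u_{p}$. The following is a minimal NumPy sketch, assuming synthetic Gaussian data that is not from the paper, of how those directions are obtained as the right singular vectors of the centered data matrix. It only illustrates the basis itself; it does not reproduce the paper's probability computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n observations of p correlated predictors (illustrative only)
n, p = 200, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

# Center each column; the PCA directions are the right singular vectors of the centered data
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
U = Vt.T  # columns u_1, ..., u_p, ordered by decreasing variance

# The principal-component directions form an orthonormal basis of R^p
print(np.allclose(U.T @ U, np.eye(p)))  # True

# Share of total variance captured by each component
explained = s**2 / np.sum(s**2)
print(np.round(explained, 3))
```

The SVD route avoids forming the covariance matrix explicitly; the singular values come back sorted in decreasing order, so the columns of `U` are already in principal-component order.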

Code & Models

Repositories

jedazard/PRIMsrc

Taxonomy

Topics: Advanced Statistical Methods and Models · Control Systems and Identification · Statistical Methods and Inference