Multiscale principal component analysis

A. A. Akinduko; A. N. Gorban

arXiv:1307.8339·stat.ME·June 16, 2015

TL;DR

This paper introduces Multiscale PCA, an extension of PCA that restricts the pairwise-distance objective to pairs of data points whose distance lies in a chosen interval [l,u]. Varying the interval exposes structure at different scales and reduces the influence of outliers.

Contribution

The paper adds scale parameters (l,u) to PCA and applies cluster analysis to the resulting projectors to reveal data structure at multiple scales.

Findings

1. Effectively reveals multiscale data structures.
2. Reduces influence of outliers in PCA.
3. Validated on artificial and real datasets.

Abstract

Principal component analysis (PCA) is an important tool in exploring data. The conventional approach to PCA leads to a solution which favours the structures with large variances. This is sensitive to outliers and could obfuscate interesting underlying structures. One of the equivalent definitions of PCA is that it seeks the subspaces that maximize the sum of squared pairwise distances between data projections. This definition opens up more flexibility in the analysis of principal components, which is useful in enhancing PCA. In this paper we introduce scales into PCA by maximizing only the sum of pairwise distances between projections for pairs of datapoints with distances within a chosen interval of values [l,u]. The resulting principal component decompositions in Multiscale PCA depend on the point (l,u) on the plane, and for each point we define projectors onto principal components. Cluster…
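
The restricted objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `multiscale_pca`, its signature, and the toy data are hypothetical, and it assumes the restricted objective uses squared projected distances, as in the pairwise definition of ordinary PCA quoted above. Under that assumption, the sum over selected pairs of the squared projection of x_i - x_j onto a unit vector v equals v^T S v, where S is the scatter matrix of the selected differences, so the leading eigenvectors of S give the directions.

```python
import numpy as np


def multiscale_pca(X, l, u, n_components=1):
    """Hypothetical sketch: principal directions computed only from pairs
    of points whose Euclidean distance lies in the interval [l, u]."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # All pairwise difference vectors x_i - x_j, shape (n, n, d).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Keep each unordered pair once (i < j), then apply the scale filter.
    i_idx, j_idx = np.triu_indices(n, k=1)
    pair_dists = dists[i_idx, j_idx]
    keep = (pair_dists >= l) & (pair_dists <= u)
    selected = diffs[i_idx[keep], j_idx[keep]]
    if selected.shape[0] == 0:
        raise ValueError("no pairs of points have a distance in [l, u]")

    # Scatter matrix of the selected differences (assumed squared objective).
    S = selected.T @ selected

    # eigh returns eigenvalues in ascending order; take the largest ones.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]


# Toy data (an assumption for illustration): ten points on the first axis
# plus one distant outlier along the second axis.
line = np.column_stack([np.arange(10.0), np.zeros(10)])
toy = np.vstack([line, [4.5, 100.0]])

# With [0, inf) every pair is used, which matches ordinary PCA directions.
full_dirs, _ = multiscale_pca(toy, 0.0, np.inf)

# With u = 10 every pair involving the outlier is excluded.
local_dirs, _ = multiscale_pca(toy, 0.0, 10.0)

print("all pairs:   ", full_dirs[:, 0])
print("pairs <= 10: ", local_dirs[:, 0])
```

On this toy data the unrestricted direction is dominated by the outlier along the second axis, while the restricted direction follows the first axis, which is the outlier effect the abstract describes. The sketch builds an n × n × d array, so it is suitable only for small datasets.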
