A random version of principal component analysis in data clustering

Luigi Leonardo Palese

arXiv:1610.08664·q-bio.QM·October 18, 2018

TL;DR

This paper introduces a modified PCA algorithm that works on both well-dimensioned and degenerated high-dimensional datasets, relaxing the constraint that standard PCA places on the minimum number of samples.

Contribution

A novel variation of PCA that extends applicability to degenerated datasets, addressing limitations of standard PCA in high-dimensional data analysis.

Findings

01

Modified PCA works on degenerated datasets

02

Algorithm maintains performance with fewer samples

03

Enhances PCA applicability in high-dimensional analysis

Abstract

Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance or correlation matrix of the analyzed data. However, to work properly with high-dimensional data, PCA imposes severe mathematical constraints on the minimum number of distinct replicates or samples that must be included in the analysis. Here we show that a modified algorithm works not only on well-dimensioned datasets, but also on degenerated ones.
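The sample-count constraint mentioned in the abstract can be illustrated with a short sketch. This is not the paper's modified algorithm, which this page does not describe. It only shows, assuming "degenerated" refers to datasets with fewer samples than variables, why standard covariance-based PCA is limited there: the sample covariance matrix of n centered samples has rank at most n - 1, so it cannot supply a full set of informative components. The function name `covariance_rank` and the dataset sizes are hypothetical choices made for this illustration.

```python
import numpy as np


def covariance_rank(n_samples, n_features, seed=0):
    """Return the rank of the sample covariance matrix of random data.

    Hypothetical helper for illustration only; it is not part of the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    # Center each variable (column), as standard PCA does.
    Xc = X - X.mean(axis=0)
    # Sample covariance matrix, shape (n_features, n_features).
    cov = Xc.T @ Xc / (n_samples - 1)
    return int(np.linalg.matrix_rank(cov))


# Well-dimensioned case: many more samples than variables.
rank_well = covariance_rank(n_samples=200, n_features=10)

# Assumed "degenerated" case: fewer samples than variables.
rank_degenerate = covariance_rank(n_samples=5, n_features=10)

print("well-dimensioned rank:", rank_well)
print("degenerated rank:", rank_degenerate)
```

With 200 samples the covariance matrix of 10 variables is full rank, whereas with 5 samples centering removes one degree of freedom and the rank drops to 4, leaving most eigenvalues numerically zero.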

Taxonomy

Methods: Principal Components Analysis