Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?
Hervé Cardot, David Degras

TL;DR
This paper compares various online PCA algorithms in terms of accuracy, speed, and memory, providing practical guidance for choosing the right method in high-dimensional data scenarios.
Contribution
It offers a comprehensive comparison of online PCA algorithms, including perturbation, incremental, and stochastic methods, with practical recommendations based on empirical evaluation.
Findings
Incremental methods balance accuracy and computational efficiency.
Stochastic optimization is suitable for large-scale data.
Extensions to missing and functional data are discussed.
Abstract
In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to routinely perform tasks like principal component analysis (PCA). Recursive algorithms that update the PCA with each new observation have been studied in various fields of research and found wide applications in industrial monitoring, computer vision, astronomy, and latent semantic indexing, among others. This work provides guidance for selecting an online PCA algorithm in practice. We present the main approaches to online PCA, namely, perturbation techniques, incremental methods, and stochastic optimization, and compare their statistical accuracy, computation time, and memory requirements using artificial and real data. Extensions to missing data and to functional data are discussed. All studied algorithms are available in the R package onlinePCA on CRAN.
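To make the stochastic optimization family concrete, here is a minimal sketch of an Oja-type stochastic gradient update for the first principal direction only. It is an illustration under stated assumptions, not the authors' implementation and not the interface of the onlinePCA package: the function names, the simulated data, and the step size c/n are all hypothetical choices, and the observations are assumed to be already centered.

```python
import math
import random


def oja_update(w, x, step):
    """One Oja-type stochastic gradient update of a unit vector w given observation x."""
    # Score of the new observation on the current direction estimate.
    y = sum(wi * xi for wi, xi in zip(w, x))
    # Gradient step towards the direction of maximal variance.
    w_new = [wi + step * y * xi for wi, xi in zip(w, x)]
    # Renormalise so that the estimate stays on the unit sphere.
    norm = math.sqrt(sum(v * v for v in w_new))
    return [v / norm for v in w_new]


def online_first_component(stream, dim, c=1.0):
    """Estimate the first principal direction from a stream of centered observations."""
    w = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit-norm starting point
    for n, x in enumerate(stream, start=1):
        # Decreasing step size c/n: an assumption made for this sketch.
        w = oja_update(w, x, step=c / n)
    return w


def simulate(n_obs, seed=0):
    """Hypothetical centered data in R^3 with most variance along the first axis."""
    rng = random.Random(seed)
    scales = (3.0, 1.0, 0.5)  # standard deviations along each axis
    for _ in range(n_obs):
        yield [rng.gauss(0.0, s) for s in scales]


w_hat = online_first_component(simulate(5000), dim=3)
# Absolute cosine between the estimate and the true leading direction (1, 0, 0).
alignment = abs(w_hat[0])
```

Each observation is used once and then discarded, so memory use grows with the dimension only, not with the number of observations. The sign of the estimated direction is arbitrary, which is why the sketch compares directions through an absolute cosine.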
Taxonomy
Methods: Principal Components Analysis
