Parallel GPU Implementation of Iterative PCA Algorithms
M. Andrecut

TL;DR
This paper introduces GPU-accelerated parallel implementations of iterative PCA algorithms, including a new Gram-Schmidt orthogonalization-based method (GS-PCA), and reports speedups of up to 12 times over optimized CPU versions.
Contribution
The paper presents a new algorithm, GS-PCA, that avoids the loss of orthogonality which limits NIPALS-PCA to the first few components, and provides parallel GPU implementations of both algorithms for large-scale PCA computations.
Findings
The CUBLAS-based GPU implementations are up to 12 times faster than the CBLAS-based CPU versions.
GS-PCA eliminates the loss of orthogonality that affects NIPALS-PCA.
Both NIPALS-PCA and GS-PCA are implemented as parallel GPU algorithms aimed at large data sets.
Abstract
Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets, the common approach to PCA computation is based on the standard NIPALS-PCA algorithm, which unfortunately suffers from loss of orthogonality, so its applicability is usually limited to the estimation of the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. We also discuss the GPU (Graphics Processing Unit) parallel implementation of both the NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster (up to 12 times) than the CPU optimized versions, based on CBLAS (GNU Scientific Library).
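To make the idea in the abstract concrete, the following is a minimal pure-Python sketch of a NIPALS-style iteration in which each new loading and score vector is re-orthogonalized against the earlier ones by Gram-Schmidt projection. It is an illustration only, not the paper's GS-PCA listing or its CUBLAS implementation: the function names (`pca_reorthogonalized`, `center_columns`, `max_cross_cosine`), the choice of starting vector, the stopping rule, and the random test data are all assumptions made for this example, and it assumes the requested number of components does not exceed the rank of the data.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center_columns(X):
    """Subtract each column's mean so every variable has zero mean."""
    n = len(X)
    means = [sum(row[c] for row in X) / n for c in range(len(X[0]))]
    return [[v - mu for v, mu in zip(row, means)] for row in X]


def pca_reorthogonalized(X, n_components, max_iter=500, tol=1e-12):
    """NIPALS-style PCA with Gram-Schmidt re-orthogonalization (illustrative).

    X is a list of rows (samples x variables) with mean-centered columns.
    Returns (scores, loadings), each a list of n_components vectors.
    """
    n, m = len(X), len(X[0])
    R = [row[:] for row in X]  # residual matrix, deflated per component
    scores, loadings = [], []
    for _ in range(n_components):
        cols = [[row[c] for row in R] for c in range(m)]
        # Assumed start: the residual column with the largest norm.
        t = max(cols, key=lambda col: dot(col, col))
        eig_old = 0.0
        for _ in range(max_iter):
            # Loading p = R^T t, projected away from earlier loadings.
            p = [dot(col, t) for col in cols]
            for q in loadings:
                proj = dot(q, p)  # q has unit length
                p = [pi - proj * qi for pi, qi in zip(p, q)]
            p_norm = math.sqrt(dot(p, p))
            p = [pi / p_norm for pi in p]
            # Score t = R p, projected away from earlier scores.
            t = [dot(row, p) for row in R]
            for s in scores:
                proj = dot(s, t) / dot(s, s)
                t = [ti - proj * si for ti, si in zip(t, s)]
            eig = dot(t, t)
            if abs(eig - eig_old) <= tol * eig:
                break
            eig_old = eig
        scores.append(t)
        loadings.append(p)
        # Deflate: remove the rank-one term t p^T from the residual.
        R = [[R[i][c] - t[i] * p[c] for c in range(m)] for i in range(n)]
    return scores, loadings


def max_cross_cosine(vectors):
    """Largest |cosine| between distinct vectors; 0 means orthogonal."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i):
            a, b = vectors[i], vectors[j]
            c = abs(dot(a, b)) / math.sqrt(dot(a, a) * dot(b, b))
            worst = max(worst, c)
    return worst


random.seed(0)
data = center_columns(
    [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
)
T, P = pca_reorthogonalized(data, n_components=3)
print("max |cos| between scores:  ", max_cross_cosine(T))
print("max |cos| between loadings:", max_cross_cosine(P))
```

Removing the two projection loops gives a plain NIPALS-style iteration, which relies on deflation alone to keep components orthogonal. The explicit projections cost extra dot products per iteration; in a GPU setting these would map naturally onto BLAS matrix-vector products, though how the paper organizes them is not shown here.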
Taxonomy
Topics: Neural Networks and Applications · Blind Source Separation Techniques · Spectroscopy and Chemometric Analyses
