Efficient GPU implementation of randomized SVD and its applications

Łukasz Struski; Paweł Morkisz; Przemysław Spurek; Samuel Rodriguez Bernabeu; Tomasz Trzciński

arXiv:2110.03423·cs.LG·March 13, 2024

Open Access

TL;DR

This paper presents a GPU-optimized implementation of randomized SVD that leverages parallel matrix operations to significantly reduce computation time, enhancing efficiency in machine learning applications.

Contribution

The work reformulates randomized SVD to incorporate fast matrix multiplication on GPUs, enabling fully parallel processing and outperforming existing methods.

Findings

01 GPU implementation outperforms traditional CPU methods

02 Reformulated algorithm exploits BLAS-3 operations for speed

03 Results integrated into official CUDA library

Abstract

Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time. In this work, we leverage efficient processing operations that can be run in parallel on modern Graphics Processing Units (GPUs), the predominant computing architecture used, e.g., in deep learning, to reduce the computational burden of computing matrix decompositions. More specifically, we reformulate the randomized decomposition problem to incorporate fast matrix multiplication operations (BLAS-3) as building blocks. We show that this formulation, combined with fast random number generators, allows us to fully exploit the potential of parallel processing implemented in GPUs. Our extensive evaluation…
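To make the idea of a randomized decomposition built from matrix products concrete, here is a minimal CPU sketch of the generic randomized range-finder scheme for a truncated SVD. It is not the authors' GPU formulation; the function name `randomized_svd` and the parameters `oversample` and `power_iters` are assumptions chosen for illustration, and NumPy stands in for a GPU array library.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, power_iters=2, seed=0):
    """Approximate the top-`rank` SVD of A with a random projection.

    Illustrative sketch only: a generic randomized scheme, not the
    paper's reformulation. All parameter names are hypothetical.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)  # width of the random sketch

    # Random test matrix; sampling it is where a fast RNG matters.
    omega = rng.standard_normal((n, k))

    # Sketch the range of A with one large matrix product.
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations sharpen the captured subspace when the
    # singular values decay slowly; re-orthonormalize for stability.
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Project A onto the small subspace and decompose the k x n result.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Lift the left singular vectors back to the original space.
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
    U, s, Vt = randomized_svd(A, rank=5)
    rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
    print(f"relative reconstruction error: {rel_err:.2e}")
```

In this sketch the work on the full-size matrix is confined to the products `A @ omega`, `A.T @ Q`, `A @ Z`, and `Q.T @ A`, which are matrix-matrix operations of the BLAS-3 kind the abstract mentions. The QR factorizations and the final SVD touch only matrices with `k` columns or rows.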

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Quantum Computing Algorithms and Architecture · Parallel Computing and Optimization Techniques