Distributed Out-of-Memory SVD on CPU/GPU Architectures
Ismael Boureima, Manish Bhattarai, Maksim E. Eren, Nick Solovyev, Hristo Djidjev, Boian S. Alexandrov

TL;DR
This paper introduces a scalable, distributed out-of-memory SVD implementation for heterogeneous CPU/GPU systems, optimizing memory use and communication to handle extremely large matrices efficiently.
Contribution
It presents a novel out-of-memory SVD method based on the power method, optimized for large-scale sparse and dense matrices on HPC systems with CPU and GPU architectures.
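The power method can be illustrated with a minimal NumPy sketch, assuming a small matrix that fits in memory. It finds the top-k singular triplets one at a time by iterating v <- A^T (A v), with deflation against the right singular vectors already found. The function name `power_method_tsvd` and its parameters (`iters`, `tol`, `seed`) are hypothetical, and this is not the authors' distributed implementation.

```python
import numpy as np

def power_method_tsvd(A, k, iters=200, tol=1e-10, seed=0):
    """Estimate the top-k singular triplets (U, S, V) of A by power iteration.

    Sketch only: each right singular vector is found by repeatedly applying
    A^T A to a random start vector, projecting out previously found vectors
    (deflation), and normalizing. A^T A is never formed explicitly.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = np.zeros((m, k))
    S = np.zeros(k)
    V = np.zeros((n, k))
    for j in range(k):
        Vj = V[:, :j]                          # right singular vectors found so far
        v = rng.standard_normal(n)
        v -= Vj @ (Vj.T @ v)                   # deflation: remove earlier directions
        v /= np.linalg.norm(v)
        for _ in range(iters):
            w = A.T @ (A @ v)                  # Gram-matrix product as two matvecs
            w -= Vj @ (Vj.T @ w)               # keep the iterate orthogonal to Vj
            w_norm = np.linalg.norm(w)
            if w_norm == 0:
                break
            w /= w_norm
            converged = np.linalg.norm(w - v) < tol
            v = w
            if converged:
                break
        Av = A @ v
        sigma = np.linalg.norm(Av)             # singular value = ||A v||
        S[j] = sigma
        V[:, j] = v
        U[:, j] = Av / sigma if sigma > 0 else Av
    return U, S, V

# Usage: a 50 x 30 matrix with known singular values 10, 5, 2, 1, 0.5.
rng = np.random.default_rng(1)
Q1, _ = np.linalg.qr(rng.standard_normal((50, 5)))
Q2, _ = np.linalg.qr(rng.standard_normal((30, 5)))
s_true = np.array([10.0, 5.0, 2.0, 1.0, 0.5])
A = Q1 @ np.diag(s_true) @ Q2.T
U, S, V = power_method_tsvd(A, k=3)
```

Computing A^T (A v) as two matrix-vector products, instead of forming the n x n Gram matrix, keeps the extra memory at O(m + n). The j-th vector converges at a rate governed by (sigma_{j+1} / sigma_j)^2, so well-separated singular values need few iterations.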
Findings
Successfully decomposed 1TB dense matrices.
Decomposed 128PB sparse matrices with 1e-6 sparsity.
Achieved scalable performance on heterogeneous HPC systems.
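To give a sense of scale for these findings, the following back-of-envelope Python arithmetic converts the reported sizes into entry and nonzero counts. It is a sketch under assumptions this summary does not state: that "128PB" is the dense-equivalent size of the sparse matrix, that "1e-6 sparsity" means one nonzero per million entries, and that values and indices each take 8 bytes in COO format.

```python
# Back-of-envelope storage arithmetic. Every convention here is an assumption:
# "128PB" is read as the dense-equivalent size, "1e-6 sparsity" as the fraction
# of nonzero entries, values are float64, and indices are int64 (COO format).
TB = 10**12                 # decimal terabyte
PB = 10**15                 # decimal petabyte
BYTES_PER_VALUE = 8         # float64
BYTES_PER_INDEX = 8         # int64

def dense_entries(size_bytes):
    """Number of float64 entries held in `size_bytes` of dense storage."""
    return size_bytes // BYTES_PER_VALUE

def coo_nnz(dense_size_bytes, inverse_density):
    """Nonzeros in a sparse matrix whose dense equivalent is `dense_size_bytes`."""
    return dense_entries(dense_size_bytes) // inverse_density

def coo_bytes(nnz):
    """COO storage: one value plus a row index and a column index per nonzero."""
    return nnz * (BYTES_PER_VALUE + 2 * BYTES_PER_INDEX)

dense_1tb = dense_entries(1 * TB)            # 125_000_000_000 entries
nnz_128pb = coo_nnz(128 * PB, 10**6)         # 16_000_000_000 nonzeros
coo_128pb = coo_bytes(nnz_128pb)             # 384_000_000_000 bytes

print(f"1 TB dense: {dense_1tb:.3e} entries")
print(f"128 PB at density 1e-6: {nnz_128pb:.3e} nonzeros, ~{coo_128pb / TB:.3f} TB in COO")
```

Under these assumptions, even the sparse case needs roughly 0.38 TB just to hold its nonzeros, more than a single GPU typically offers, which suggests why an out-of-memory strategy matters for both experiments.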
Abstract
We propose an efficient, distributed, out-of-memory implementation of the truncated singular value decomposition (t-SVD) for heterogeneous (CPU+GPU) high performance computing (HPC) systems. Various implementations of SVD have been proposed, but most estimate only the singular values, because estimating the singular vectors can significantly increase the time and memory complexity of the algorithm. In this work, we propose an implementation of SVD based on the power method, which estimates both the truncated singular values and the singular vectors. Memory utilization bottlenecks seen in the power method are typically associated with the computation of the Gram matrix $\mathbf{A}^T\mathbf{A}$, which can be significant when $\mathbf{A}$ is large and dense, or when $\mathbf{A}$ is super-large and sparse. The proposed implementation is optimized for out-of-memory problems where the memory…
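One common way to sidestep the Gram-matrix bottleneck, sketched here under the assumption that A is stored row-wise on disk, is to never form A^T A at all and instead accumulate A^T A v block by block, using A^T A v = sum_i A_i^T (A_i v) over row blocks A_i. The helper `gram_matvec_blocked` and its `block_rows` parameter are hypothetical, and NumPy's `np.memmap` stands in for out-of-memory storage. This is a generic illustration, not necessarily how the paper batches data between host and GPU memory.

```python
import os
import tempfile
import numpy as np

def gram_matvec_blocked(A, v, block_rows):
    """Compute (A^T A) v while holding only `block_rows` rows of A in RAM.

    Uses A^T A v = sum_i A_i^T (A_i v) over row blocks A_i, so the n x n
    Gram matrix is never materialized. A may be an np.memmap on disk.
    """
    m, n = A.shape
    out = np.zeros(n, dtype=np.result_type(A.dtype, v.dtype))
    for start in range(0, m, block_rows):
        block = np.array(A[start:start + block_rows])  # copy one row block into RAM
        out += block.T @ (block @ v)                    # add this block's contribution
    # Assumption: in a distributed run, each rank would own a subset of row
    # blocks and the partial `out` vectors would be summed with an allreduce.
    return out

# Usage: a 1000 x 20 matrix stored in a file-backed memmap.
path = os.path.join(tempfile.mkdtemp(), "A.dat")
A_disk = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000, 20))
A_disk[:] = np.random.default_rng(0).standard_normal((1000, 20))
A_disk.flush()
v = np.ones(20)
y = gram_matvec_blocked(A_disk, v, block_rows=128)
```

Peak memory here is one block of `block_rows` x n values plus two length-n vectors, so `block_rows` trades memory for the number of passes over the file.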
Taxonomy
Topics: Matrix Theory and Algorithms · Stochastic Gradient Optimization Techniques · Tensor decomposition and applications
