Distributed Out-of-Memory SVD on CPU/GPU Architectures

Ismael Boureima; Manish Bhattarai; Maksim E. Eren; Nick Solovyev; Hristo Djidjev; Boian S. Alexandrov

arXiv:2208.08410·cs.DC·August 18, 2022

TL;DR

This paper introduces a scalable, distributed out-of-memory SVD implementation for heterogeneous CPU/GPU systems, optimizing memory use and communication to handle extremely large matrices efficiently.

Contribution

The paper presents a novel out-of-memory SVD method based on the power method, optimized for large-scale sparse and dense matrices on HPC systems with CPU and GPU architectures.

Findings

01. Successfully decomposed 1 TB dense matrices.

02. Decomposed 128 PB sparse matrices with 1e-6 sparsity.

03. Achieved scalable performance on heterogeneous HPC systems.

Abstract

We propose an efficient, distributed, out-of-memory implementation of the truncated singular value decomposition (t-SVD) for heterogeneous (CPU+GPU) high-performance computing (HPC) systems. Various implementations of SVD have been proposed, but most estimate only the singular values, because also estimating the singular vectors can significantly increase the time and memory complexity of the algorithm. In this work, we propose an implementation of SVD based on the power method, which estimates a truncated set of singular values and singular vectors. Memory utilization bottlenecks seen in the power method are typically associated with the computation of the Gram matrix $\mathbf{A}^{T} \mathbf{A}$, which can be significant when $\mathbf{A}$ is large and dense, or when $\mathbf{A}$ is super-large and sparse. The proposed implementation is optimized for out-of-memory problems where the memory…
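To make the Gram-matrix bottleneck concrete, here is a minimal single-node NumPy sketch. It is not the authors' distributed CPU/GPU implementation. It assumes the matrix can be read repeatedly as a stream of row blocks (the hypothetical `row_blocks` callable), multiplies the Gram matrix by a vector one block at a time so that the Gram matrix itself is never formed, and uses simple projection-based deflation, chosen here only for illustration, to recover the leading `k` singular values and right singular vectors. All function and parameter names are illustrative.

```python
import numpy as np


def gram_matvec_chunked(row_blocks, v):
    """Return (A^T A) v by streaming row blocks of A.

    `row_blocks` is a hypothetical zero-argument callable that yields
    2-D arrays, each holding a contiguous slice of the rows of A.
    Only one block is resident at a time; A^T A is never materialized.
    """
    out = np.zeros_like(v)
    for block in row_blocks():
        out += block.T @ (block @ v)
    return out


def truncated_svd_power(row_blocks, n_cols, k, n_iter=500, tol=1e-10, seed=0):
    """Estimate the k leading singular values and right singular vectors.

    Power iteration on A^T A, with previously found vectors projected
    out at every step (deflation). Returns (sigma, V), where V has
    shape (n_cols, k).
    """
    rng = np.random.default_rng(seed)
    V = np.zeros((n_cols, k))
    sigma = np.zeros(k)
    for j in range(k):
        v = rng.standard_normal(n_cols)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            w = gram_matvec_chunked(row_blocks, v)
            # Deflation: remove components along vectors already found.
            w -= V[:, :j] @ (V[:, :j].T @ w)
            norm = np.linalg.norm(w)
            if norm == 0.0:
                break  # no remaining direction with nonzero singular value
            w /= norm
            converged = np.linalg.norm(w - v) < tol
            v = w
            if converged:
                break
        V[:, j] = v
        # sigma_j = ||A v||, computed as sqrt(v^T (A^T A) v).
        sigma[j] = np.sqrt(max(v @ gram_matvec_chunked(row_blocks, v), 0.0))
    return sigma, V


def left_singular_vectors(row_blocks, sigma, V):
    """Recover U = A V diag(1/sigma) block by block (assumes sigma > 0)."""
    return np.vstack([(block @ V) / sigma for block in row_blocks()])
```

In this sketch each power iteration makes one full pass over the data, so the peak working set is one row block plus a few vectors of length `n_cols`. In a real out-of-memory setting the generator would read blocks from disk or from other ranks rather than slice an in-memory array.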

Taxonomy

Topics: Matrix Theory and Algorithms · Stochastic Gradient Optimization Techniques · Tensor decomposition and applications