Distributed Out-of-Memory NMF on CPU/GPU Architectures
Ismael Boureima, Manish Bhattarai, Maksim Eren, Erik Skau, Philip Romero, Stephan Eidenbenz, Boian Alexandrov

TL;DR
This paper presents a scalable distributed out-of-memory NMF implementation for CPU/GPU systems, enabling efficient factorization of extremely large matrices by optimizing memory use and communication.
Contribution
It extends NMFk to support dense and sparse matrices on multi-node, multi-GPU systems with out-of-memory capabilities and optimized communication strategies.
Findings
Achieved 32x to 76x speedup over CPU-based NMFk.
Demonstrated good weak scaling on up to 4096 multi-GPU nodes.
Successfully factorized matrices up to 11 Exabytes in size.
Abstract
We propose an efficient distributed out-of-memory implementation of the Non-negative Matrix Factorization (NMF) algorithm for heterogeneous high-performance-computing (HPC) systems. The proposed implementation is based on prior work on NMFk, which can perform automatic model selection and extract latent variables and patterns from data. In this work, we extend NMFk by adding support for dense and sparse matrix operation on multi-node, multi-GPU systems. The resulting algorithm is optimized for out-of-memory (OOM) problems where the memory required to factorize a given matrix is greater than the available GPU memory. Memory complexity is reduced by batching/tiling strategies, and sparse and dense matrix operations are significantly accelerated with GPU cores (or tensor cores when available). Input/Output (I/O) latency associated with batch copies between host and device is hidden using…
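The batching/tiling idea in the abstract can be illustrated with a minimal single-process sketch in NumPy. It assumes the standard Frobenius-norm multiplicative updates, which the text above does not name as the paper's update rule, and the function name `batched_nmf` and the parameter `batch_rows` are hypothetical. Only one row batch of A and the matching rows of W are touched at a time, so the per-batch working set stays small; the GPU offload, multi-node communication, and copy/compute overlap described in the abstract are not modeled here.

```python
import numpy as np


def batched_nmf(A, k, batch_rows, n_iter=200, eps=1e-9, seed=0):
    """Row-batched multiplicative-update NMF: A (m x n) ~= W (m x k) @ H (k x n).

    Illustrative sketch only. The update rule (Frobenius-norm multiplicative
    updates) is an assumption, not taken from the source text.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))

    for _ in range(n_iter):
        HHt = H @ H.T               # small k x k Gram matrix, reused by every batch
        WtA = np.zeros((k, n))      # accumulates W^T A across batches
        WtW = np.zeros((k, k))      # accumulates W^T W across batches

        for start in range(0, m, batch_rows):
            stop = min(start + batch_rows, m)
            A_b = A[start:stop]     # the only slice of A needed for this step
            W_b = W[start:stop]     # view into W, so the update is in place

            # Rows of W depend only on the matching rows of A.
            W_b *= (A_b @ H.T) / (W_b @ HHt + eps)

            # Fold this batch into the terms needed for the H update.
            WtA += W_b.T @ A_b
            WtW += W_b.T @ W_b

        # H is updated once per sweep from the accumulated terms.
        H *= WtA / (WtW @ H + eps)

    return W, H
```

Because the W update is separable by rows and the H update only needs the sums W^T A and W^T W, this batched loop performs the same arithmetic as the unbatched update, up to floating-point summation order. In an out-of-memory setting, `A[start:stop]` would stand in for a tile loaded from host memory or disk rather than a slice of an in-memory array.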
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
