Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures

Andy Nguyen; Ahmed E. Helal; Fabio Checconi; Jan Laukemann; Jesmin Jahan Tithi; Yongseok Soh; Teresa Ranadive; Fabrizio Petrini; Jee W. Choi

arXiv:2201.12523·cs.DC·June 29, 2022

TL;DR

This paper introduces a GPU framework for sparse tensor decomposition, built on the Blocked Linearized Coordinate (BLCO) format, that processes tensors too large to fit in device memory and reports a 2.12-2.6X speedup over state-of-the-art methods on real-world sparse tensors.

Contribution

The paper proposes the Blocked Linearized Coordinate (BLCO) format, adaptive blocking and linearization strategies for out-of-memory tensor operations, and an opportunistic conflict resolution algorithm for parallel updates on GPUs.
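To make the idea of blocking a linearized tensor concrete, here is a minimal Python sketch. It assumes each nonzero is already stored as a single integer key plus a value, and it splits that key into a block id (upper bits) and a short in-block key (lower bits). The names `block_nonzeros`, `restore_key`, and `low_bits` are hypothetical, and the split rule is an illustrative assumption, not the paper's exact scheme.

```python
from collections import defaultdict


def block_nonzeros(nonzeros, low_bits):
    """Group (key, value) pairs by the upper bits of their linearized key.

    Assumption for illustration: the lower `low_bits` bits stay with the
    nonzero, and the remaining upper bits identify the block.
    """
    mask = (1 << low_bits) - 1
    blocks = defaultdict(list)
    for key, val in nonzeros:
        blocks[key >> low_bits].append((key & mask, val))
    return dict(blocks)


def restore_key(block_id, local_key, low_bits):
    """Rebuild the full linearized key from a block id and an in-block key."""
    return (block_id << low_bits) | local_key


nonzeros = [(5, 1.0), (18, 2.0), (7, 3.0)]
blocks = block_nonzeros(nonzeros, low_bits=4)
```

Under this assumption, each block can be moved to the device and processed on its own, since the full coordinate of any nonzero is recoverable from the block id and the in-block key.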

Findings

01. Achieves a 2.12-2.6X speedup over state-of-the-art methods.

02. Supports out-of-memory tensor processing on GPUs.

03. Reduces synchronization costs and improves in-memory performance.
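The synchronization finding can be illustrated with a small sequential sketch in Python. It assumes a simplified model in which a fixed-size group of nonzeros (standing in for a group of GPU threads) first combines updates that target the same output row, then commits one update per distinct row. The function names and the `group_size` parameter are hypothetical, and this is a stand-in for the general idea, not the paper's conflict resolution algorithm.

```python
def merge_group_updates(rows, values):
    """Sum the updates that hit the same output row within one group."""
    merged = {}
    for row, val in zip(rows, values):
        merged[row] = merged.get(row, 0.0) + val
    return list(merged.items())


def apply_updates(out, rows, values, group_size):
    """Apply updates group by group and count the commits to `out`.

    Each commit models one synchronized (for example, atomic) write.
    """
    commits = 0
    for start in range(0, len(rows), group_size):
        group_rows = rows[start:start + group_size]
        group_vals = values[start:start + group_size]
        for row, total in merge_group_updates(group_rows, group_vals):
            out[row] += total
            commits += 1
    return commits


out = [0.0, 0.0, 0.0]
rows = [0, 0, 1, 0, 2, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
commits = apply_updates(out, rows, values, group_size=4)
```

In this toy input, six raw updates collapse into three commits, which is the kind of saving that matters when many nonzeros share an output row.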

Abstract

Tensor decomposition (TD) is an important method for extracting latent information from high-dimensional (multi-modal) sparse data. This study presents a novel framework for accelerating fundamental TD operations on massively parallel GPU architectures. In contrast to prior work, the proposed Blocked Linearized Coordinate (BLCO) format enables efficient out-of-memory computation of tensor algorithms using a unified implementation that works on a single tensor copy. Our adaptive blocking and linearization strategies not only meet the resource constraints of GPU devices, but also accelerate data indexing, eliminate control-flow and memory-access irregularities, and reduce kernel launching overhead. To address the substantial synchronization cost on GPUs, we introduce an opportunistic conflict resolution algorithm, in which threads collaborate instead of contending on memory access to…
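As a rough illustration of linearized indexing and the MTTKRP (matricized tensor times Khatri-Rao product) kernel named in the title, the Python sketch below packs each multi-mode coordinate into one integer by concatenating per-mode bit fields, recovers the indices with shifts and masks, and accumulates the mode-wise product over the nonzeros. The bit layout and all function names are assumptions made for this example; the excerpt does not specify the actual BLCO encoding.

```python
def bits_needed(dim):
    """Number of bits required to store indices 0..dim-1 (at least 1)."""
    return max(1, (dim - 1).bit_length())


def linearize(coord, dims):
    """Pack a coordinate into one integer by concatenating bit fields."""
    key, shift = 0, 0
    for idx, dim in zip(coord, dims):
        key |= idx << shift
        shift += bits_needed(dim)
    return key


def delinearize(key, dims):
    """Recover the per-mode indices with masks and shifts."""
    coord = []
    for dim in dims:
        width = bits_needed(dim)
        coord.append(key & ((1 << width) - 1))
        key >>= width
    return tuple(coord)


def mttkrp(nonzeros, dims, factors, mode):
    """Mode-`mode` MTTKRP over (key, value) nonzeros.

    factors[m] is a list of dims[m] rows, each with `rank` entries.
    """
    rank = len(factors[0][0])
    out = [[0.0] * rank for _ in range(dims[mode])]
    for key, val in nonzeros:
        coord = delinearize(key, dims)
        row = out[coord[mode]]
        for r in range(rank):
            term = val
            for m, idx in enumerate(coord):
                if m != mode:
                    term *= factors[m][idx][r]
            row[r] += term
    return out


dims = (2, 3, 2)
entries = [((0, 1, 1), 2.0), ((1, 2, 0), 3.0), ((0, 1, 0), 1.0)]
nonzeros = [(linearize(c, dims), v) for c, v in entries]
factors = [
    [[1, 1], [1, 1]],          # mode-0 factor (unused when mode=0)
    [[1, 2], [3, 4], [5, 6]],  # mode-1 factor
    [[1, 1], [2, 3]],          # mode-2 factor
]
result = mttkrp(nonzeros, dims, factors, mode=0)
```

The same nonzero list serves every mode here, which mirrors the abstract's point about a unified implementation working on a single tensor copy; only the `mode` argument changes.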

Code & Models

Repositories

jeewhanchoi/blocked-linearized-coordinate (official)
