Fully-Automated Code Generation for Efficient Computation of Sparse Matrix Permanents on GPUs

Deniz Elbek; Kamer Kaya

arXiv:2501.15126·cs.DC·January 28, 2025

Open Access

TL;DR

This paper introduces fully-automated, GPU-optimized code generation techniques for efficiently computing the permanent of sparse matrices, leveraging register storage, matrix partitioning, and Gray code structures.

Contribution

It presents novel methods for storing arrays in registers, exploiting Gray code structures, and optimizing kernel generation for sparse matrix permanent computation on GPUs.
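To make the register and code-generation idea concrete, here is a minimal CPU-side sketch in Python, not the authors' CUDA generator. It assumes a Gray-code-ordered Ryser-style permanent formula and emits source specialized to one matrix: each entry of x becomes a named scalar (`x0`, `x1`, ...), loosely analogous to a register accessed by name, and each column update touches only the rows where that column is nonzero. The names `generate_permanent_source` and `perm` are hypothetical.

```python
def generate_permanent_source(a):
    """Return Python source for a permanent function specialized to matrix `a`.

    Illustrative only: `a` is a non-empty square list of lists. The emitted
    code keeps x as scalars x0..x{n-1} and hard-codes the nonzero pattern.
    """
    n = len(a)
    product = " * ".join(f"x{i}" for i in range(n))
    lines = ["def perm():"]
    for i, row in enumerate(a):
        init = row[n - 1] - sum(row) / 2  # starting value of x[i]
        lines.append(f"    x{i} = {init!r}")
    lines.append(f"    p = {product}")
    lines.append("    gray = 0")
    lines.append(f"    for g in range(1, {1 << (n - 1)}):")
    lines.append("        j = (g & -g).bit_length() - 1  # column that flips")
    lines.append("        gray ^= 1 << j")
    lines.append("        s = 1 if (gray >> j) & 1 else -1")
    for j in range(n - 1):
        keyword = "if" if j == 0 else "elif"
        lines.append(f"        {keyword} j == {j}:")
        # Only rows with a nonzero in column j get an update statement.
        updates = [
            f"            x{i} += s * {a[i][j]!r}"
            for i in range(n)
            if a[i][j] != 0
        ]
        lines.extend(updates or ["            pass"])
    lines.append(f"        p += -({product}) if g & 1 else {product}")
    lines.append(f"    return p * {4 * (n % 2) - 2}")
    return "\n".join(lines) + "\n"


matrix = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
source = generate_permanent_source(matrix)
namespace = {}
exec(source, namespace)      # compile the specialized function
print(namespace["perm"]())   # 39.0
```

Printing `source` shows that the zero entries of the matrix produce no update statements at all, which is the point of tailoring the code to the sparsity pattern. A real GPU generator would emit kernel code and deal with register pressure, which this sketch does not attempt.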

Findings

01

Achieves 31x speedup over CPU implementations on synthetic matrices.

02

Attains 8x speedup over traditional GPU methods on synthetic matrices.

03

On real-world matrices, the speedups are 24.9x over CPU implementations and 4.9x over traditional GPU methods.

Abstract

Registers are the fastest memory components within the GPU's complex memory hierarchy, accessed by names rather than addresses. They are managed entirely by the compiler through a process called register allocation, during which the compiler attempts to cache predictable data from thread-local memory into thread-private registers. Computing the permanent of a sparse matrix poses a challenge for compilers, as optimizing this process is hindered by the unpredictable distribution of nonzero elements, which only become known at runtime. In this work, we employ fully-automated code generation to address this, producing highly optimized kernels tailored to the matrix's sparsity pattern. State-of-the-art permanent computation algorithms require each thread to store a private array, denoted x, of size n. We first propose a technique that fully stores these arrays in registers, with inclusion…
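For readers unfamiliar with the private array x mentioned above, the following Python sketch shows one common formulation of permanent computation: a Ryser-style sum traversed in Gray code order, so that each step adds or subtracts a single column. It is an assumption that this matches the variant used in the paper; the function name `sparse_permanent` and the column-list layout are illustrative choices, not the authors' code.

```python
from math import prod


def sparse_permanent(a):
    """Permanent of a square matrix via a Gray-code-ordered Ryser-style sum."""
    n = len(a)
    if n == 0 or any(len(row) != n for row in a):
        raise ValueError("expected a non-empty square matrix")

    # Per-column lists of (row, value) for the nonzeros only.
    cols = [[(i, a[i][j]) for i in range(n) if a[i][j] != 0] for j in range(n)]

    # The private array x of size n: last column minus half of each row sum.
    x = [row[n - 1] - sum(row) / 2 for row in a]
    p = prod(x)

    gray = 0
    for g in range(1, 1 << (n - 1)):
        j = (g & -g).bit_length() - 1      # the one bit that flips at step g
        gray ^= 1 << j
        s = 1 if (gray >> j) & 1 else -1   # column j enters (+) or leaves (-)
        for i, value in cols[j]:           # touch only the nonzero rows
            x[i] += s * value
        p += -prod(x) if g & 1 else prod(x)

    return p * (4 * (n % 2) - 2)


print(sparse_permanent([[1, 2], [3, 4]]))  # 10.0
```

Which entries of x are updated at each step depends on the nonzero pattern of column j, which is only known once the matrix is given. That data-dependent indexing is what makes it hard for a compiler to keep x in registers without specialized code.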

Taxonomy

Topics: DNA and Biological Computing