Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision

Evelyne Ringoot; Rabab Alomairy; Valentin Churavy; Alan Edelman

arXiv:2508.06339·cs.DC·August 11, 2025

TL;DR

This paper introduces a portable, GPU-accelerated SVD implementation in Julia that supports diverse hardware and data types, including Apple Metal GPUs and half precision, achieving high performance across platforms.

Contribution

It presents the first GPU-accelerated SVD supporting Apple Metal GPUs and half precision, with a unified, hardware-agnostic implementation in Julia.

Findings

01 Outperforms most linear algebra libraries for large matrices

02 Supports diverse GPU architectures and data types

03 Achieves 80%-90% of cuSOLVER performance on large matrices

Abstract

This paper presents a portable, GPU-accelerated implementation of a QR-based singular value computation algorithm in Julia. The singular value decomposition (SVD) is a fundamental numerical tool in scientific computing and machine learning, providing optimal low-rank matrix approximations. Its importance has grown further in large-scale machine learning pipelines, including large language models (LLMs), where it enables low-rank adaptation (LoRA). The implemented algorithm is based on the classic two-stage QR reduction, which successively reduces the matrix to band form and then to bidiagonal form. Our implementation leverages Julia's multiple dispatch and metaprogramming capabilities, integrating with the GPUArrays and KernelAbstractions frameworks to provide a unified type- and hardware-agnostic function. It supports diverse GPU architectures and data types, and is, to our knowledge,…
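The reduction to bidiagonal form mentioned in the abstract can be illustrated on a small dense matrix. The sketch below is not the paper's Julia GPU implementation and does not perform the two-stage (band, then bidiagonal) scheme; it is the simpler one-stage Householder bidiagonalization, written in plain Python for readability. The function name `householder_bidiagonalize` and the list-of-lists matrix layout are assumptions made for this illustration only. It may help show why such reductions preserve singular values: every step is an orthogonal transformation applied from the left or the right.

```python
import math


def householder_bidiagonalize(a):
    """Reduce an m x n matrix (m >= n, list of lists) to upper bidiagonal form.

    Illustrative one-stage variant (hypothetical helper, not the paper's code).
    Householder reflections are applied alternately from the left (zeroing a
    column below the diagonal) and from the right (zeroing a row to the right
    of the superdiagonal). Orthogonal transforms leave singular values intact.
    """
    b = [row[:] for row in a]  # work on a copy of the input
    m, n = len(b), len(b[0])

    def reflector(x):
        # Unit vector v such that (I - 2 v v^T) x is a multiple of e_1.
        norm = math.sqrt(sum(t * t for t in x))
        if norm == 0.0:
            return None  # nothing to eliminate
        v = x[:]
        v[0] += math.copysign(norm, x[0])  # sign choice avoids cancellation
        vnorm = math.sqrt(sum(t * t for t in v))
        return [t / vnorm for t in v]

    for k in range(n):
        # Left reflection: zero the entries b[k+1:, k].
        v = reflector([b[i][k] for i in range(k, m)])
        if v is not None:
            for j in range(k, n):
                dot = sum(v[i - k] * b[i][j] for i in range(k, m))
                for i in range(k, m):
                    b[i][j] -= 2.0 * dot * v[i - k]

        # Right reflection: zero the entries b[k, k+2:].
        if k < n - 2:
            v = reflector([b[k][j] for j in range(k + 1, n)])
            if v is not None:
                for i in range(k, m):
                    dot = sum(b[i][j] * v[j - k - 1] for j in range(k + 1, n))
                    for j in range(k + 1, n):
                        b[i][j] -= 2.0 * dot * v[j - k - 1]
    return b
```

Calling `householder_bidiagonalize` on a tall matrix returns a matrix whose only nonzero entries (up to rounding) lie on the diagonal and first superdiagonal, with the same Frobenius norm as the input. A two-stage scheme like the one the abstract describes would instead first reduce to a band of several superdiagonals using blocked QR factorizations, which is generally better suited to GPU hardware, and only then reduce the band to bidiagonal form.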
