Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper

Junjie Li; Yinzhi Wang; Xiao Liang; Hang Liu

arXiv:2404.13195·cs.DC·April 16, 2025

TL;DR

This paper presents a new tool for automatic BLAS offloading on NVIDIA Grace-Hopper's unified memory architecture, enabling high-performance GPU acceleration without code modifications, demonstrated on quantum chemistry applications.

Contribution

Introduction of a novel tool that leverages Grace-Hopper's unified memory and NVLink C2C for automatic BLAS offloading without code changes.

Findings

01 Significant performance improvements on quantum chemistry codes.

02 Effective GPU offloading enabled by unified memory architecture.

03 No code modifications required for offloading.

Abstract

Porting codes to GPUs often requires major effort. While several tools exist for automatically offloading numerical libraries such as BLAS and LAPACK, they often prove impractical due to the high cost of mandatory data transfer. The new unified memory architecture in NVIDIA Grace-Hopper allows high-bandwidth, cache-coherent access to all memory from both the CPU and the GPU, potentially eliminating the bottleneck faced on conventional architectures. This breakthrough opens up new avenues for application development and porting strategies. In this study, we introduce a new tool for automatic BLAS offload. The tool leverages the high-speed, cache-coherent NVLink C2C interconnect in Grace-Hopper and enables performant GPU offload for BLAS-heavy applications with no code changes or recompilation. The tool was tested on two quantum chemistry or physics codes, and great performance benefits were observed.
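
The abstract's point about data-transfer cost can be made concrete with a back-of-the-envelope model. The sketch below is not taken from the paper or from the tool. It assumes a simple additive cost (compute time plus time to move the matrices across the CPU-GPU link), and the function name `gemm_offload_seconds` and all rate values are hypothetical placeholders chosen only to show the trend.

```python
def gemm_offload_seconds(m, n, k, gpu_flops_per_s, link_bytes_per_s, bytes_per_elem=8):
    """Rough time to compute C = A @ B on the GPU when A, B, and C live in CPU memory.

    Assumption: cost is compute time plus the time to move A and B in and C out.
    """
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # A and B in, C out
    return flops / gpu_flops_per_s + bytes_moved / link_bytes_per_s


# Hypothetical rates for illustration only; they are not measurements.
GPU_FLOPS = 1e12
SLOW_LINK = 1e10  # stand-in for a conventional host-device link
FAST_LINK = 1e11  # stand-in for a faster, coherent interconnect

for size in (256, 1024, 4096):
    slow = gemm_offload_seconds(size, size, size, GPU_FLOPS, SLOW_LINK)
    fast = gemm_offload_seconds(size, size, size, GPU_FLOPS, FAST_LINK)
    print(f"n={size}: slow link {slow:.6f} s, fast link {fast:.6f} s")
```

In this model the compute term grows as n^3 while the transfer term grows as n^2, so the link speed matters most for small and medium matrices. That is one way to read the abstract's claim that mandatory data transfer makes automatic offload impractical on conventional architectures.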

Code & Models

Repositories

nicejunjie/scilib-accel