ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace

Ruimin Shi; Gabin Schieffer; Maya Gokhale; Pei-Hung Lin; Hiren Patel; Ivy Peng

arXiv:2505.09462·cs.DC·May 15, 2025

Open Access

TL;DR

This paper evaluates how mature and effective ARM's Scalable Vector Extension (SVE) is for HPC applications on the Nvidia Grace processor, using hardware performance events, analytical models, and a decision-tree classification of applications.

Contribution

The paper introduces new metrics, an adapted roofline model, and a decision tree to analyze and improve SVE utilization in HPC workloads on the Arm-based Nvidia Grace processor.

Findings

01

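SVE vectorization reduces the number of executed instructions and improves performance.

02

New metrics, derived from hardware performance events and analytical models, quantify how effectively SVE reduces executed instructions and speeds up applications.

03

The proposed decision tree classifies SVE-boosted performance in applications.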

Abstract

Vector architectures are essential for boosting computing throughput. ARM provides SVE as the next-generation length-agnostic vector extension beyond traditional fixed-length SIMD. This work provides a first study of the maturity and readiness of exploiting ARM and SVE in HPC. Using selected performance hardware events on the ARM Grace processor and analytical models, we derive new metrics to quantify the effectiveness of exploiting SVE vectorization to reduce executed instructions and improve performance speedup. We further propose an adapted roofline model that combines vector length and data elements to identify potential performance bottlenecks. Finally, we propose a decision tree for classifying the SVE-boosted performance in applications.
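
For readers unfamiliar with roofline analysis, the following is a minimal sketch of the classic roofline bound that the abstract's adapted model builds on. It is not the paper's adapted model: how that model combines vector length and data elements is not described on this page. The function names and machine numbers are hypothetical and chosen only for illustration.

```python
# Classic roofline sketch (NOT the paper's adapted model).
# All names and numbers here are illustrative assumptions.

def attainable_gflops(arithmetic_intensity, peak_gflops, mem_bandwidth_gbs):
    """Upper bound on performance for a kernel.

    arithmetic_intensity: floating-point operations per byte moved (FLOP/byte)
    peak_gflops:          peak compute throughput of the machine (GFLOP/s)
    mem_bandwidth_gbs:    sustained memory bandwidth (GB/s)
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # Performance is capped by whichever is lower: the compute roof
    # or the memory roof (intensity times bandwidth).
    return min(peak_gflops, arithmetic_intensity * mem_bandwidth_gbs)


def ridge_point(peak_gflops, mem_bandwidth_gbs):
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return peak_gflops / mem_bandwidth_gbs


def bound_kind(arithmetic_intensity, peak_gflops, mem_bandwidth_gbs):
    """Label a kernel as memory-bound or compute-bound under the classic model."""
    if arithmetic_intensity < ridge_point(peak_gflops, mem_bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 50 GB/s bandwidth.
    peak, bandwidth = 100.0, 50.0
    for intensity in (0.5, 2.0, 4.0):
        print(intensity,
              attainable_gflops(intensity, peak, bandwidth),
              bound_kind(intensity, peak, bandwidth))
```

In this classic form, a kernel to the left of the ridge point gains little from wider vectors because memory traffic, not instruction throughput, limits it. That is the kind of bottleneck a roofline-style analysis is meant to expose.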

Taxonomy

Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management