Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa
Jan Laukemann, Georg Hager, Gerhard Wellein

TL;DR
This paper compares the AMD Zen 4, Intel Golden Cove, and Nvidia Neoverse V2 core microarchitectures, builds an in-core performance model for each, and analyzes features such as write-allocate evasion to explain their performance characteristics.
Contribution
It extends the OSACA tool to model these microarchitectures, compares the resulting predictions with LLVM-MCA, and provides a detailed microarchitectural comparison, including an analysis of write-allocate evasion mechanisms.
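To give a feel for what an in-core throughput model computes, the sketch below implements a deliberately simplified port-pressure model: each instruction's occupancy is spread evenly over the execution ports that can handle it, and the predicted cycles per loop iteration is the load on the busiest port. The port names and the instruction table are hypothetical placeholders, not the real Zen 4, Golden Cove, or Neoverse V2 port models used in OSACA, and real tools additionally consider latencies and loop-carried dependencies.

```python
from collections import defaultdict

# HYPOTHETICAL instruction table: name -> (eligible ports, cycles of port occupancy).
# Placeholder values for illustration only; not a real microarchitecture description.
PORT_TABLE = {
    "vload":  (["LD0", "LD1"], 1.0),
    "vfma":   (["FP0", "FP1"], 1.0),
    "vstore": (["ST0"], 1.0),
    "add":    (["ALU0", "ALU1", "ALU2"], 1.0),
    "branch": (["BR0"], 1.0),
}


def port_pressure(kernel):
    """Spread each instruction's occupancy evenly over its eligible ports."""
    pressure = defaultdict(float)
    for instr in kernel:
        ports, cycles = PORT_TABLE[instr]
        share = cycles / len(ports)
        for port in ports:
            pressure[port] += share
    return dict(pressure)


def throughput_prediction(kernel):
    """Predicted cycles per loop iteration: the most heavily loaded port."""
    return max(port_pressure(kernel).values())


# Hypothetical loop body for a[i] = b[i] + s * c[i]:
# two loads, one FMA, one store, plus loop counter increment and branch.
kernel = ["vload", "vload", "vfma", "vstore", "add", "branch"]
print(port_pressure(kernel))
print(throughput_prediction(kernel))
```

In this toy table the load ports, the store port, and the branch port are all saturated at one cycle per iteration, so the prediction is bound by several ports at once; changing the table (for example, adding a second store port) shows how microarchitectural differences shift the bottleneck.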
Findings
The Grace Superchip has a near-optimal write-allocate evasion implementation.
Zen 4 requires explicit non-temporal stores to avoid write allocates.
The extended in-core performance model accurately predicts in-core performance on all three microarchitectures.
Abstract
With Nvidia's release of the Grace Superchip, all three big semiconductor companies in HPC (AMD, Intel, Nvidia) are currently competing in the race for the best CPU. In this work we analyze the performance of these state-of-the-art CPUs and create an accurate in-core performance model for their microarchitectures Zen 4, Golden Cove, and Neoverse V2, extending the Open Source Architecture Code Analyzer (OSACA) tool and comparing it with LLVM-MCA. Starting from the peculiarities and up- and downsides of a single core, we extend our comparison with a variety of microbenchmarks and the capabilities of a full node. The "write-allocate (WA) evasion" feature, which can automatically reduce the memory traffic caused by write misses, receives special attention; we show that the Grace Superchip has a next-to-optimal implementation of WA evasion, and that the only way to avoid write allocates on Zen…
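To make the cost of write allocates concrete: on a cache-based system without WA evasion, a store that misses in cache first reads the target cache line from memory, so every stored line causes one read and one write-back. The sketch below counts main-memory bytes per loop iteration for a few STREAM-style kernels, assuming 8-byte (double-precision) elements, no cache reuse, and either full write-allocate or perfect evasion; the kernel list is illustrative and is not the paper's benchmark set.

```python
ELEM_BYTES = 8  # assumption: double-precision elements

# Illustrative kernels: name -> (read-only streams, written streams)
KERNELS = {
    "store  a[i] = s":           (0, 1),
    "copy   a[i] = b[i]":        (1, 1),
    "triad  a[i] = b[i]+s*c[i]": (2, 1),
}


def bytes_per_iter(reads, writes, write_allocate):
    """Main-memory bytes per iteration.

    Every written element is eventually written back; with write-allocate
    it is additionally read into the cache before being overwritten.
    """
    wa_reads = writes if write_allocate else 0
    return ELEM_BYTES * (reads + wa_reads + writes)


for name, (r, w) in KERNELS.items():
    with_wa = bytes_per_iter(r, w, write_allocate=True)
    without_wa = bytes_per_iter(r, w, write_allocate=False)
    print(f"{name}: {with_wa} B with WA, {without_wa} B without WA, "
          f"ratio {with_wa / without_wa:.2f}")
```

Under these assumptions, avoiding write allocates halves the traffic of a pure store kernel (16 B down to 8 B per iteration), cuts a copy from 24 B to 16 B, and a triad from 32 B to 24 B. For a purely bandwidth-bound loop this bounds the achievable speedup from WA evasion, which is why the quality of the hardware mechanism matters.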
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques
