Grid on QPACE 4
Peter Georg, Nils Meyer, Stefan Solbrig, Tilo Wettig

TL;DR
This paper describes QPACE 4, a system of 64 Fujitsu A64FX processors deployed in 2020, and the port of the Grid LQCD framework to the ARM Scalable Vector Extension (SVE). It evaluates the performance of Grid on this machine and the benefits of an alternative data layout for complex numbers.
Contribution
The paper presents the first port of the Grid LQCD framework to ARM SVE and analyzes its performance on the QPACE 4 supercomputer.
Findings
Successful port of Grid to ARM SVE
Performance of Grid on QPACE 4
Advantages of an alternative complex-number data layout for the Domain Wall operator
Abstract
In 2020 we deployed QPACE 4, which features 64 Fujitsu A64FX model FX700 processors interconnected by InfiniBand EDR. QPACE 4 runs an open-source software stack. For Lattice QCD simulations we ported the Grid LQCD framework to support the ARM Scalable Vector Extension (SVE). In this contribution we discuss our SVE port of Grid, the status of SVE compilers and the performance of Grid. We also present the benefits of an alternative data layout of complex numbers for the Domain Wall operator.
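The abstract does not say what the alternative data layout looks like. The sketch below assumes it is a "split" layout, with all real parts stored in one array and all imaginary parts in another, and contrasts it with the conventional interleaved layout (re0, im0, re1, im1, ...) using a complex a*x + y kernel. It is plain Python for illustration, not Grid's C++ or SVE code, and the function names are hypothetical.

```python
from typing import List, Tuple


def to_split(v: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: convert an interleaved [re0, im0, re1, im1, ...]
    array into separate (real parts, imaginary parts) arrays."""
    return v[0::2], v[1::2]


def axpy_interleaved(a: complex, x: List[float], y: List[float]) -> List[float]:
    """Compute a*x + y with x and y stored interleaved: [re0, im0, re1, im1, ...]."""
    out = [0.0] * len(x)
    for k in range(0, len(x), 2):
        xr, xi = x[k], x[k + 1]
        # Each output component needs both xr and xi, which sit in neighbouring
        # positions; a SIMD version must permute lanes or use dedicated
        # complex-arithmetic instructions (SVE offers FCMLA for this).
        out[k] = a.real * xr - a.imag * xi + y[k]
        out[k + 1] = a.real * xi + a.imag * xr + y[k + 1]
    return out


def axpy_split(a: complex, xr: List[float], xi: List[float],
               yr: List[float], yi: List[float]) -> Tuple[List[float], List[float]]:
    """Compute a*x + y with real and imaginary parts in separate arrays."""
    # Every element of each output array is an independent real multiply-add
    # over aligned positions: no cross-lane shuffles are needed.
    out_r = [a.real * r - a.imag * i + b for r, i, b in zip(xr, xi, yr)]
    out_i = [a.real * i + a.imag * r + b for r, i, b in zip(xr, xi, yi)]
    return out_r, out_i
```

Both functions compute the same values; for example, `axpy_interleaved(2+3j, [1.0, 2.0, -1.0, 0.5], [0.5, -1.0, 2.0, 2.0])` and `axpy_split(2+3j, *to_split([1.0, 2.0, -1.0, 0.5]), *to_split([0.5, -1.0, 2.0, 2.0]))` give the complex results -3.5+6i and -1.5+0i. The trade-off in this sketch is that the split layout turns complex arithmetic into purely lane-wise real operations, at the cost of storing (or converting) the data in that form.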
Taxonomy
Topics: Distributed and Parallel Computing Systems · Particle physics theoretical and experimental studies · Computational Physics and Python Applications
