Object-oriented implementation of algebraic multi-grid solver for lattice QCD on SIMD architectures and GPU clusters
Issaku Kanamori, Ken-Ichi Ishikawa, Hideo Matsufuru

TL;DR
This paper presents a portable, object-oriented algebraic multi-grid solver for lattice QCD that performs efficiently across diverse architectures (Intel Xeon Phi, Fujitsu A64FX, and NVIDIA Tesla V100), showing reasonable scaling behavior and outperforming mixed-precision BiCGStab solvers.
Contribution
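The work introduces a portable implementation of an algebraic multi-grid solver for lattice QCD using object-oriented programming: the abstract solver algorithm is shared by all targets, while data layouts and matrix-vector multiplication kernels are architecture-specific, enabling high performance across multiple HPC architectures.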
Findings
Reasonable scaling behavior observed across architectures
Better performance than mixed-precision BiCGStab solvers
Architecture-specific tuning improves performance
Abstract
A portable implementation of elaborate algorithms is important for using a variety of architectures in HPC applications. In this work we implement and benchmark an algebraic multi-grid solver for lattice QCD on three different architectures, Intel Xeon Phi, Fujitsu A64FX, and NVIDIA Tesla V100, while keeping high performance and portability of the code based on the object-oriented paradigm. Some parts of the code are specific to an architecture, employing an appropriate data layout and tuned matrix-vector multiplication kernels, while the implementation of the abstract solver algorithm is common to all architectures. Although the performance of the solver depends on the tuning of the architecture-dependent part, we observe reasonable scaling behavior and better performance than the mixed-precision BiCGStab solvers.
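The split described in the abstract, a solver written once against an abstract interface with layout-specific fields and kernels underneath, can be illustrated with a minimal C++ sketch. Everything here is an assumption made for illustration and is not taken from the paper's code: the names `LinearLayout`, `LaneLayout`, `Field`, `ShiftedLaplacian`, and `solve_cg` are hypothetical, a periodic 1D shifted Laplacian stands in for the lattice Dirac operator, and a plain conjugate gradient stands in for the multi-grid solver to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two hypothetical storage layouts mapping a site index to a memory slot.
struct LinearLayout {  // plain site-major order
    static constexpr std::size_t width = 1;
    static std::size_t index(std::size_t site, std::size_t) { return site; }
};
struct LaneLayout {    // sites from `width` sub-blocks interleaved, SIMD-style
    static constexpr std::size_t width = 4;
    static std::size_t index(std::size_t site, std::size_t n) {
        const std::size_t nb = n / width;        // sites per sub-block
        return (site % nb) * width + site / nb;  // lane = sub-block number
    }
};

// Field whose memory order is fixed by the layout policy.
template <class Layout>
class Field {
public:
    explicit Field(std::size_t n) : n_(n), data_(n, 0.0) {
        assert(n % Layout::width == 0);
    }
    std::size_t size() const { return n_; }
    double get(std::size_t s) const { return data_[Layout::index(s, n_)]; }
    void set(std::size_t s, double x) { data_[Layout::index(s, n_)] = x; }
    std::vector<double>& raw() { return data_; }
    const std::vector<double>& raw() const { return data_; }
private:
    std::size_t n_;
    std::vector<double> data_;
};

// Element-wise linear algebra does not care about the storage order.
template <class F> double dot(const F& a, const F& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.raw().size(); ++i) s += a.raw()[i] * b.raw()[i];
    return s;
}
template <class F> void axpy(double alpha, const F& x, F& y) {  // y += alpha * x
    for (std::size_t i = 0; i < y.raw().size(); ++i) y.raw()[i] += alpha * x.raw()[i];
}
template <class F> void scale(double alpha, F& x) {             // x *= alpha
    for (double& v : x.raw()) v *= alpha;
}

// Stand-in operator: periodic 1D Laplacian plus a positive mass shift.
// A tuned kernel would be written per layout instead of going through get/set.
template <class Layout>
struct ShiftedLaplacian {
    double mass;
    void apply(Field<Layout>& out, const Field<Layout>& in) const {
        const std::size_t n = in.size();
        for (std::size_t i = 0; i < n; ++i)
            out.set(i, (2.0 + mass) * in.get(i)
                           - in.get((i + 1) % n) - in.get((i + n - 1) % n));
    }
};

// Layout-agnostic conjugate gradient; returns the iteration count or -1.
template <class Op, class F>
int solve_cg(const Op& A, F& x, const F& b, double tol, int max_iter) {
    F r = b, p(b.size()), Ap(b.size());
    A.apply(Ap, x);
    axpy(-1.0, Ap, r);                          // r = b - A x
    p = r;
    double rr = dot(r, r);
    const double stop = tol * tol * dot(b, b);  // relative residual criterion
    for (int k = 0; k < max_iter; ++k) {
        if (rr <= stop) return k;
        A.apply(Ap, p);
        const double alpha = rr / dot(p, Ap);
        axpy(alpha, p, x);
        axpy(-alpha, Ap, r);
        const double rr_new = dot(r, r);
        scale(rr_new / rr, p);                  // p = r + beta * p
        axpy(1.0, r, p);
        rr = rr_new;
    }
    return rr <= stop ? max_iter : -1;
}
```

In this sketch `solve_cg` never touches the memory order directly, so calling it with `Field<LinearLayout>` or `Field<LaneLayout>` exercises the same algorithm on two different layouts; only the field and operator types change between targets.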
