General purpose lattice QCD code set Bridge++ 2.0 for high performance computing
Yutaro Akahoshi, Sinya Aoki, Tatsumi Aoyama, Issaku Kanamori, Kazuyuki Kanaya, Hideo Matsufuru, Yusuke Namekawa, Hidekatsu Nemura, Yusuke Taniguchi

TL;DR
Bridge++ 2.0 is a high-performance, portable lattice QCD simulation code that extends the original version to utilize modern processor architectures like Intel AVX-512, Arm A64FX, NEC SX-Aurora, and NVIDIA V100 GPUs.
Contribution
The paper introduces an extended version of Bridge++ that offers optimized code for various modern architectures, enhancing performance and portability for lattice QCD simulations.
Findings
Achieved high performance on multiple architectures
Demonstrated portability and extensibility of the code
Provided application examples on diverse hardware systems
Abstract
Bridge++ is a general-purpose code set for numerical simulations of lattice QCD that aims at readable, extensible, and portable code while keeping practically high performance. The previous version of Bridge++ is implemented in double precision with a fixed data layout. To exploit the high arithmetic capability of new processor architectures, we extend the Bridge++ code so that optimized code is available as a new branch, i.e., an alternative to the original code. This paper explains our implementation strategy and presents application examples on the following architectures and systems: Intel AVX-512 on Xeon Phi Knights Landing, Arm A64FX-SVE on Fujitsu A64FX (Fugaku), NEC SX-Aurora TSUBASA, and a GPU cluster with NVIDIA V100.
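To make the contrast between "double precision with a fixed data layout" and an alternative optimized branch concrete, the following is a minimal C++ sketch of a field container parameterized by both floating-point type and data layout. It is an illustration under stated assumptions, not the Bridge++ implementation: the names `SiteMajor`, `SimdBlocked`, `Field`, `axpy`, and `norm2` are hypothetical, and the blocked layout assumes the number of sites is a multiple of the vector length.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative names, not Bridge++ classes).
// Each maps (site, internal index) to a position in a flat array.
struct SiteMajor {  // all internal components of one site are contiguous
  static std::size_t index(std::size_t site, std::size_t in, std::size_t nin) {
    return site * nin + in;
  }
};

template <std::size_t VLEN>
struct SimdBlocked {  // VLEN neighbouring sites are contiguous per component
  // Assumes the number of sites is a multiple of VLEN.
  static std::size_t index(std::size_t site, std::size_t in, std::size_t nin) {
    return (site / VLEN) * nin * VLEN + in * VLEN + site % VLEN;
  }
};

// Field with the precision (Real) and the layout chosen at compile time.
template <typename Real, typename Layout>
class Field {
 public:
  Field(std::size_t nin, std::size_t nsite)
      : nin_(nin), nsite_(nsite), data_(nin * nsite, Real(0)) {}

  Real& at(std::size_t site, std::size_t in) {
    return data_[Layout::index(site, in, nin_)];
  }
  const Real& at(std::size_t site, std::size_t in) const {
    return data_[Layout::index(site, in, nin_)];
  }
  std::size_t nin() const { return nin_; }
  std::size_t nsite() const { return nsite_; }

 private:
  std::size_t nin_, nsite_;
  std::vector<Real> data_;
};

// y = a * x + y, written against the accessor so it is layout-agnostic.
template <typename Real, typename Layout>
void axpy(Real a, const Field<Real, Layout>& x, Field<Real, Layout>& y) {
  for (std::size_t s = 0; s < x.nsite(); ++s)
    for (std::size_t i = 0; i < x.nin(); ++i)
      y.at(s, i) += a * x.at(s, i);
}

// Squared norm, accumulated in double regardless of the storage precision.
template <typename Real, typename Layout>
double norm2(const Field<Real, Layout>& x) {
  double sum = 0.0;
  for (std::size_t s = 0; s < x.nsite(); ++s)
    for (std::size_t i = 0; i < x.nin(); ++i)
      sum += static_cast<double>(x.at(s, i)) * static_cast<double>(x.at(s, i));
  return sum;
}
```

In this sketch, `Field<double, SiteMajor>` plays the role of a single fixed double-precision layout, while `Field<float, SimdBlocked<16>>` stands in for a layout chosen to match a 512-bit vector unit in single precision. Production kernels would typically operate on whole vector blocks rather than through a per-element accessor; the accessor is used here only to keep the example short.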
