Mixed Precision Solver Scalable to 16000 MPI Processes for Lattice Quantum Chromodynamics Simulations on the Oakforest-PACS System
Taisuke Boku, Ken-Ichi Ishikawa, Yoshinobu Kuramashi, Lawrence Meadows

TL;DR
This paper presents a highly scalable mixed-precision solver for lattice QCD simulations that runs on 16,000 MPI processes on the Oakforest-PACS supercomputer, reaching 2.6 PFLOPS in its single-precision part.
Contribution
The authors developed and optimized a mixed-precision quark solver for large-scale lattice QCD simulations on the Oakforest-PACS system and scaled it to 16,000 MPI processes on 8,000 nodes.
Findings
Achieved 2.6 PFLOPS in single precision on a 400^3×800 lattice.
Successfully scaled the solver to 16,000 MPI processes across 8,000 nodes.
Implemented optimization techniques including mixed precision, communication-computation overlap with MPI offloading, SIMD vectorization, and thread stealing.
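As a rough sanity check on the scale of the headline run, the per-process and per-node figures follow from simple division of the numbers above. This sketch assumes the lattice sites and the 2.6 PFLOPS are spread evenly over all processes; the actual process grid and any load imbalance are not stated here.

```python
import math

# Figures taken from the Findings; the even split across processes is an assumption.
GLOBAL_DIMS = (400, 400, 400, 800)  # (x, y, z, t) lattice extents
MPI_PROCS = 16_000
NODES = 8_000
SP_FLOPS = 2.6e15                   # reported single-precision rate

sites = math.prod(GLOBAL_DIMS)               # total lattice sites
sites_per_proc = sites // MPI_PROCS          # sites owned by each process
procs_per_node = MPI_PROCS // NODES          # MPI processes per KNL node
gflops_per_proc = SP_FLOPS / MPI_PROCS / 1e9
gflops_per_node = SP_FLOPS / NODES / 1e9

print(f"global sites    : {sites:,}")
print(f"sites / process : {sites_per_proc:,}")
print(f"processes / node: {procs_per_node}")
print(f"GFLOPS / process: {gflops_per_proc:.1f}")
print(f"GFLOPS / node   : {gflops_per_node:.1f}")
```

Under that assumption each process holds 3.2 million sites and sustains about 162.5 GFLOPS, i.e. roughly 325 GFLOPS per node with two processes per node.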
Abstract
Lattice Quantum Chromodynamics (lattice QCD) is a quantum field theory formulated on a finite, discretized space-time box, which allows the dynamics of quarks and gluons to be computed numerically in order to explore the nature of the subatomic world. Solving the equation of motion of the quarks (the quark solver) is the most compute-intensive part of lattice QCD simulations and is one of the legacy HPC applications. We have developed a mixed-precision quark solver for a large Intel Xeon Phi (KNL) system named "Oakforest-PACS", employing O(a)-improved Wilson quarks as the discretized equation of motion. The nested-BiCGSTab algorithm for the solver was implemented and optimized using mixed precision, communication-computation overlapping with MPI offloading, SIMD vectorization, and thread stealing techniques. The solver achieved 2.6 PFLOPS in the single-precision part on a 400^3×800 lattice using 16000 MPI…
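The nested-BiCGSTab idea can be illustrated on a small dense toy problem: an outer BiCGSTab in double precision whose preconditioner is a loosely converged inner BiCGSTab run entirely in single precision, so most of the arithmetic happens in the cheaper precision. This is a minimal sketch, not the authors' implementation: the function names, the tolerances (1e-10 outer, 1e-4 inner), the iteration caps, and the random real-valued matrix standing in for the complex O(a)-improved Wilson-Dirac operator are all hypothetical choices.

```python
import numpy as np


def bicgstab(apply_A, b, tol, maxiter, dtype, precond=None):
    """Right-preconditioned BiCGSTab carried out in the given float dtype.

    apply_A(v) returns A @ v; precond(v), if given, approximates A^{-1} v.
    Returns (x, iterations_used).
    """
    M = precond if precond is not None else (lambda v: v)
    b = b.astype(dtype)
    x = np.zeros_like(b)
    r = b.copy()
    r0 = r.copy()                        # fixed shadow residual
    p = np.zeros_like(b)
    v = np.zeros_like(b)
    rho = alpha = omega = dtype(1.0)
    bnorm = np.linalg.norm(b)
    for k in range(1, maxiter + 1):
        rho_new = np.vdot(r0, r)
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)
        p_hat = M(p).astype(dtype)       # preconditioned search direction
        v = apply_A(p_hat).astype(dtype)
        alpha = rho / np.vdot(r0, v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * bnorm:   # converged at the half step
            return x + alpha * p_hat, k
        s_hat = M(s).astype(dtype)
        t = apply_A(s_hat).astype(dtype)
        omega = np.vdot(t, s) / np.vdot(t, t)
        x = x + alpha * p_hat + omega * s_hat
        r = s - omega * t                # equals b - A x up to rounding
        if np.linalg.norm(r) <= tol * bnorm:
            return x, k
    return x, maxiter


def nested_bicgstab(A, b, outer_tol=1e-10, inner_tol=1e-4):
    """Double-precision outer BiCGSTab preconditioned by a loose
    single-precision inner BiCGSTab solve (hypothetical parameters)."""
    A32 = A.astype(np.float32)

    def inner_solve(vec):
        y, _ = bicgstab(lambda u: A32 @ u, vec, inner_tol, 50, np.float32)
        return y.astype(np.float64)      # promote the cheap correction

    return bicgstab(lambda u: A @ u, b, outer_tol, 100, np.float64,
                    precond=inner_solve)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    # Toy well-conditioned nonsymmetric matrix (spectrum clustered near 4).
    A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
    b = rng.standard_normal(n)
    x, iters = nested_bicgstab(A, b)
    rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    print(f"outer iterations: {iters}, true relative residual: {rel_res:.1e}")
```

Because the outer update applies A in double precision to the promoted inner corrections, the outer residual tracks b - A x and can reach a double-precision tolerance even though each inner solve only converges to about 1e-4 in single precision. The inner solver changes slightly from call to call, so the preconditioner is variable; this sketch does not make the outer iteration formally flexible and relies on the inner solve being accurate enough for that not to matter.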
Taxonomy
Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Distributed and Parallel Computing Systems
