Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing
Issaku Kanamori, Hideo Matsufuru

TL;DR
This paper explores implementing lattice QCD simulations on Intel Xeon Phi Knights Landing, focusing on optimizing solver algorithms for SIMD architecture and parallel performance based on empirical measurements.
Contribution
It presents practical methods for optimizing lattice QCD code on KNL, including SIMD intrinsics and prefetching techniques, with performance tuning insights.
Findings
Optimized solver performance on KNL using SIMD intrinsics
Effective prefetching strategies for large sparse matrix operations
Performance tuning guidelines for SIMD and parallel architectures
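As a rough illustration of the manual prefetching idea listed above, the following C++ sketch multiplies a sparse matrix by a vector while requesting, a few nonzeros ahead, the vector entry that will be needed soon. Everything here is an assumption made for illustration: the CSR container `CsrMatrix`, the function `spmv_prefetch`, and the prefetch distance `dist` are hypothetical names, the paper's lattice operator is not stored in CSR form, and the GCC/Clang builtin `__builtin_prefetch` stands in for whatever prefetch intrinsics the authors used on KNL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR (compressed sparse row) container, for illustration only.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz, entries must index into x
    std::vector<double> values;        // size nnz
};

// Computes y = A * x. While accumulating nonzero k, it issues a software
// prefetch for the x entry that nonzero k + dist will read.
std::vector<double> spmv_prefetch(const CsrMatrix& a,
                                  const std::vector<double>& x,
                                  std::size_t dist = 8) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(a.col_idx.size() == a.values.size());
    std::vector<double> y(a.rows, 0.0);
    const std::size_t nnz = a.values.size();
    for (std::size_t r = 0; r < a.rows; ++r) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            if (k + dist < nnz) {
#if defined(__GNUC__) || defined(__clang__)
                // Read prefetch (rw = 0) with high temporal locality (3).
                __builtin_prefetch(&x[a.col_idx[k + dist]], 0, 3);
#endif
            }
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

The prefetch is only a hint and does not change the result; a useful value of `dist` depends on the memory latency and the work per nonzero, so it would normally be chosen by measurement.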
Abstract
We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time-consuming part of numerical lattice QCD simulations is solving a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of KNL, such as using intrinsics and manual prefetching, applied to the matrix multiplication and iterative solver algorithms. Based on the performance measured on the Oakforest-PACS system, we discuss performance tuning on KNL as well as code design that facilitates such tuning on SIMD architectures and massively parallel machines.
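To make the phrase "iterative solver for a large sparse matrix" concrete, here is a minimal C++ sketch of a conjugate gradient (CG) solver that only needs the matrix through a matrix-vector product. It is a toy under stated assumptions, not the authors' code: the abstract does not name the solver algorithm, and the operator `apply_op` below is a hypothetical one-dimensional periodic nearest-neighbour stencil with a parameter `mass2`, chosen because it is symmetric positive definite for `mass2 > 0`. The names `Vec`, `dot`, and `cg_solve` are likewise illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Toy operator (assumption): y_i = (mass2 + 2) x_i - x_{i+1} - x_{i-1},
// with periodic boundaries. Symmetric positive definite for mass2 > 0.
Vec apply_op(const Vec& x, double mass2) {
    const std::size_t n = x.size();
    Vec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t up = (i + 1) % n;
        const std::size_t dn = (i + n - 1) % n;
        y[i] = (mass2 + 2.0) * x[i] - x[up] - x[dn];
    }
    return y;
}

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solves A x = b by conjugate gradient, starting from the x passed in.
// Stops when |r| <= tol * |b|. Returns the number of iterations performed,
// or -1 if the tolerance was not reached within max_iter iterations.
int cg_solve(const Vec& b, Vec& x, double mass2, double tol, int max_iter) {
    assert(x.size() == b.size());
    const std::size_t n = b.size();
    const double bnorm2 = dot(b, b);
    if (bnorm2 == 0.0) { x.assign(n, 0.0); return 0; }

    Vec r(n);
    const Vec ax = apply_op(x, mass2);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - ax[i];
    Vec p = r;
    double rr = dot(r, r);

    for (int it = 0; it < max_iter; ++it) {
        if (rr <= tol * tol * bnorm2) return it;
        const Vec ap = apply_op(p, mass2);
        const double alpha = rr / dot(p, ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return (rr <= tol * tol * bnorm2) ? max_iter : -1;
}
```

Almost all of the run time in such a solver goes into `apply_op` and the vector updates, which is why those loops are the natural targets for SIMD intrinsics and prefetching. A call such as `cg_solve(b, x, 0.5, 1e-10, 100)` with `x` initialised to zero would solve the toy system for a right-hand side `b`.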
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Algorithms and Data Compression
