Accelerating Twisted Mass LQCD with QPhiX
Mario Schröck, Silvano Simula, Alexei Strelchenko

TL;DR
This paper details the implementation and performance analysis of twisted mass fermion operators in the QPhiX library, reporting 80% of peak bandwidth on the Intel Xeon Phi coprocessor and a ~5x single-precision speedup over tmLQCD on Intel Xeon Haswell CPUs for lattice QCD computations.
Contribution
The paper introduces optimized twisted mass fermion operators in QPhiX and demonstrates substantial performance improvements on modern CPU architectures.
Findings
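The Dslash kernel reaches 80% of the theoretical peak bandwidth on the Xeon Phi 7120P.
The generated AVX2 code for the Dslash operator outperforms the corresponding tmLQCD implementation by ~5x in single precision on a Xeon Haswell E5-2630 CPU.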
The code strong-scales to 6.8 Tflops in single precision and 14.1 Tflops in half precision on 64 Xeon Haswell CPUs.
Abstract
We present the implementation of twisted mass fermion operators for the QPhiX library. We analyze the performance on the Intel Xeon Phi (Knights Corner) coprocessor as well as on Intel Xeon Haswell CPUs. In particular, we demonstrate that on the Xeon Phi 7120P the Dslash kernel is able to reach 80% of the theoretical peak bandwidth, while on a Xeon Haswell E5-2630 CPU our generated code for the Dslash operator with AVX2 instructions outperforms the corresponding implementation in the tmLQCD library by a factor of ~5 in single precision. We strong-scale the code up to 6.8 (14.1) Tflops in single (half) precision on 64 Xeon Haswell CPUs.
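As a rough reading aid, the aggregate strong-scaling figures quoted in the abstract can be converted to an average per-CPU throughput. The sketch below uses only the three numbers stated above (64 CPUs, 6.8 Tflops, 14.1 Tflops); the function and variable names are illustrative and are not part of QPhiX or tmLQCD. The ratio of roughly two between half and single precision is what one would expect from a kernel limited by memory bandwidth, but that interpretation is an assumption here, not a statement made in the abstract.

```python
# Figures quoted in the abstract: aggregate sustained throughput on 64 Haswell CPUs.
N_CPUS = 64
TFLOPS_SINGLE = 6.8   # single precision
TFLOPS_HALF = 14.1    # half precision


def per_cpu_gflops(total_tflops, n_cpus):
    """Average sustained Gflops per CPU for a given aggregate Tflops figure."""
    return total_tflops * 1000.0 / n_cpus


single = per_cpu_gflops(TFLOPS_SINGLE, N_CPUS)
half = per_cpu_gflops(TFLOPS_HALF, N_CPUS)
ratio = TFLOPS_HALF / TFLOPS_SINGLE  # gain from halving the data size

print(f"single precision: {single:.2f} Gflops per CPU")
print(f"half precision:   {half:.2f} Gflops per CPU")
print(f"half / single:    {ratio:.2f}")
```

These are averages over the full 64-CPU run and say nothing about single-node performance or parallel efficiency, which the abstract does not report.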
Taxonomy
Topics: Quantum Chromodynamics and Particle Interactions · Particle physics theoretical and experimental studies · Black Holes and Theoretical Physics
