Accelerating Twisted Mass LQCD with QPhiX

Mario Schröck; Silvano Simula; Alexei Strelchenko

arXiv:1510.08879·hep-lat·November 2, 2015·1 cites

TL;DR

This paper describes the implementation of twisted mass fermion operators in the QPhiX library and analyzes their performance on the Intel Xeon Phi (Knights Corner) coprocessor and on Intel Xeon Haswell CPUs, where the generated Dslash code runs about 5 times faster than the tmLQCD implementation in single precision.

Contribution

The paper adds twisted mass fermion operators to the QPhiX library and benchmarks them on the Intel Xeon Phi (Knights Corner) coprocessor and on Intel Xeon Haswell CPUs.

Findings

01

The Dslash kernel reaches 80% of the theoretical peak bandwidth on the Xeon Phi 7120P.

02

On a Xeon Haswell E5-2630 CPU, the generated AVX2 Dslash code outperforms the corresponding tmLQCD implementation by a factor of ~5 in single precision.

03

The code strong-scales to 6.8 Tflops in single precision and 14.1 Tflops in half precision on 64 Xeon Haswell CPUs.

Abstract

We present the implementation of twisted mass fermion operators for the QPhiX library. We analyze the performance on the Intel Xeon Phi (Knights Corner) coprocessor as well as on Intel Xeon Haswell CPUs. In particular, we demonstrate that on the Xeon Phi 7120P the Dslash kernel is able to reach 80% of the theoretical peak bandwidth, while on a Xeon Haswell E5-2630 CPU our generated code for the Dslash operator with AVX2 instructions outperforms the corresponding implementation in the tmLQCD library by a factor of ~5 in single precision. We strong-scale the code up to 6.8 (14.1) Tflops in single (half) precision on 64 Xeon Haswell CPUs.
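
As a rough reading of the scaling figures, the sketch below converts the quoted totals into per-CPU averages. It assumes the 6.8 and 14.1 Tflops are aggregate sustained rates across all 64 CPUs, which the abstract does not state explicitly. The function and variable names are illustrative only and do not come from QPhiX or tmLQCD.

```python
# Aggregate rates quoted in the abstract, in Tflops.
# Assumption: these are totals over all CPUs in the strong-scaling run.
REPORTED_TFLOPS = {"single": 6.8, "half": 14.1}
NUM_CPUS = 64  # Xeon Haswell CPUs, as quoted in the abstract


def per_cpu_gflops(total_tflops: float, num_cpus: int) -> float:
    """Average Gflops per CPU for a given aggregate rate (hypothetical helper)."""
    return total_tflops * 1000.0 / num_cpus


per_cpu = {p: per_cpu_gflops(t, NUM_CPUS) for p, t in REPORTED_TFLOPS.items()}
half_over_single = REPORTED_TFLOPS["half"] / REPORTED_TFLOPS["single"]

for precision, gflops in per_cpu.items():
    print(f"{precision:>6} precision: {gflops:.1f} Gflops per CPU")
print(f"half/single ratio: {half_over_single:.2f}")
```

Under that assumption the averages come to roughly 106 Gflops per CPU in single precision and 220 Gflops per CPU in half precision, a ratio of about 2.1.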

Taxonomy

Topics: Quantum Chromodynamics and Particle Interactions · Particle physics theoretical and experimental studies · Black Holes and Theoretical Physics