Three Dirac operators on two architectures with one piece of code and no hassle

Stephan Durr

arXiv:1808.05506 · hep-lat · November 21, 2018

TL;DR

This paper presents a straightforward implementation of three Dirac operator discretizations (staggered, Wilson, Brillouin) on two computer architectures (KNL and Core i7). It uses a high-level compiler with OpenMP and SIMD pragmas and reaches high performance without cache-line optimization or assembly tuning.

Contribution

It introduces a simple, portable way to implement several Dirac operators efficiently on different architectures, relying on OpenMP and SIMD compiler directives rather than hand-tuned code.

Findings

01

Reached 475, 345, and 790 Gflop/s on one KNL node for the staggered, Wilson, and Brillouin discretizations, respectively (single precision, N_c=3, N_v=12).

02

Implemented three discretizations (staggered, Wilson, Brillouin) with a single high-level code base.

03

Demonstrated portability across KNL and Core i7, with high performance and no cache-line optimization or assembly tuning.

Abstract

A simple-minded approach to implementing three discretizations of the Dirac operator (staggered, Wilson, Brillouin) on two architectures (KNL and Core i7) is presented. The idea is to use a high-level compiler along with OpenMP parallelization and SIMD pragmas, but to stay away from cache-line optimization and/or assembly tuning. The implementation is for N_v right-hand sides, and this extra index is used to fill the SIMD pipeline. On one KNL node, single-precision performance figures for N_c=3, N_v=12 read 475 Gflop/s, 345 Gflop/s, and 790 Gflop/s for the three discretization schemes, respectively.


Taxonomy

Topics: Quantum Chromodynamics and Particle Interactions · Particle physics theoretical and experimental studies