Optimization of the Brillouin operator on the KNL architecture

Stephan Durr

arXiv:1709.01828 · hep-lat · July 10, 2018 · 1 citation

TL;DR

This paper reports on optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL architecture, reaching high performance without adjustments to the memory layout. It also notes that the same routine performs well on Intel Core i7 architectures and adds observations on the harder Wilson fermion matrix-times-vector problem.

Contribution

It presents an optimization of the Brillouin operator for the KNL architecture that relies exclusively on OMP pragmas and reaches high performance without adjustments to the memory layout.

Findings

1. Achieved 360 Gflop/s in single precision on KNL
2. Achieved 270 Gflop/s in double precision on KNL
3. Routine performs well on Intel Core i7 architectures

Abstract

Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with N_c=3 colors, N_v=12 right-hand-sides, N_{thr}=256 threads, on lattices of size 32^3*64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
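To make the setup in the abstract more concrete, the following is a minimal sketch in C of a stencil applied to several right-hand sides at once, parallelized only with OMP pragmas. It is not the paper's routine: the uniform-weight 81-point stencil, the omission of gauge links and spin/color indices, the data layout with the right-hand-side index running fastest, and the names `site_index` and `toy_stencil` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NV 12 /* number of right-hand sides, matching N_v=12 in the abstract */

/* Hypothetical helper: periodic site index on an L^3 x T lattice, x fastest. */
static size_t site_index(int x, int y, int z, int t, int L, int T) {
    x = (x + L) % L;
    y = (y + L) % L;
    z = (z + L) % L;
    t = (t + T) % T;
    return (((size_t)t * L + z) * L + y) * L + x;
}

/* Toy operator (assumed, not the Brillouin operator itself):
 * out[site][v] = sum of in[nbr][v] over the 3^4 = 81 sites of the
 * surrounding hypercube, with unit weights and no gauge links.
 * Layout assumption: the NV right-hand sides are contiguous per site. */
static void toy_stencil(const float *in, float *out, int L, int T) {
    /* Extents of at least 3 keep the 81 neighbours distinct sites. */
    assert(L >= 3 && T >= 3);

    /* Threads share the outer two loops; ignored if OpenMP is disabled. */
    #pragma omp parallel for collapse(2)
    for (int t = 0; t < T; t++) {
        for (int z = 0; z < L; z++) {
            for (int y = 0; y < L; y++) {
                for (int x = 0; x < L; x++) {
                    float acc[NV] = {0.0f};
                    for (int dt = -1; dt <= 1; dt++)
                    for (int dz = -1; dz <= 1; dz++)
                    for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        const float *src =
                            in + NV * site_index(x + dx, y + dy, z + dz, t + dt, L, T);
                        /* The right-hand-side loop is the vectorizable one. */
                        #pragma omp simd
                        for (int v = 0; v < NV; v++)
                            acc[v] += src[v];
                    }
                    float *dst = out + NV * site_index(x, y, z, t, L, T);
                    for (int v = 0; v < NV; v++)
                        dst[v] = acc[v];
                }
            }
        }
    }
}
```

A caller would allocate two arrays of `L*L*L*T*NV` floats and invoke `toy_stencil(in, out, L, T)`. Placing the right-hand-side index innermost is one plausible way to give the compiler a contiguous loop to vectorize while leaving the site ordering untouched; whether this matches the layout used in the paper is not stated in the abstract.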

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Magneto-Optical Properties and Applications · Advanced Electrical Measurement Techniques