HISQ inverter on Intel Xeon Phi and NVIDIA GPUs

O. Kaczmarek, C. Schmidt, P. Steinbrecher, Swagato Mukherjee, and M. Wagner

arXiv:1409.1510·cs.DC·September 5, 2014

Open Access

TL;DR

This paper compares the performance of a HISQ conjugate gradient inverter on the Intel Xeon Phi and Kepler-based NVIDIA Tesla GPUs, showing that inverting multiple vectors at the same time exposes more parallelism and more than doubles inversion performance in Lattice QCD simulations.

Contribution

It provides a performance comparison and implementation insights for the HISQ inverter on both architectures, reaching 250 GFlop/s by inverting multiple vectors at the same time.

Findings

01

Performance of 250 GFlop/s on both architectures

02

More than doubling of inversion performance when multiple vectors are inverted at the same time

03

Implementation details and effort required for optimization

Abstract

The runtime of a Lattice QCD simulation is dominated by a small kernel, which calculates the product of a vector by a sparse matrix known as the "Dslash" operator. Therefore, this kernel is frequently optimized for various HPC architectures. In this contribution we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance of 250 GFlop/s on both architectures. This more than doubles the performance of the inversions. We give a short overview of both architectures and discuss some details of the implementation and the effort required to obtain the achieved performance.
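The idea of inverting multiple vectors at the same time can be illustrated with a minimal sketch. It is not the authors' implementation: the operator below is a toy one-dimensional periodic stencil standing in for the HISQ Dslash, and the names `apply_stencil`, `multi_rhs_cg`, and `coeff` are hypothetical. The point it shows is that the per-site operator data (here `coeff`, playing the role of the gauge links) is read once per site and reused for every right-hand side, while each system still runs its own conjugate gradient iteration.

```python
# Toy illustration only: a 1D periodic stencil replaces the HISQ Dslash.
# With coeff[i] > 2 the matrix is symmetric and strictly diagonally
# dominant, hence positive definite, so plain CG applies.

def apply_stencil(coeff, vecs):
    """Apply the stencil to several vectors in one sweep over the sites."""
    n = len(coeff)
    out = [[0.0] * n for _ in vecs]
    for i in range(n):
        c = coeff[i]                      # operator data: loaded once per site
        left, right = (i - 1) % n, (i + 1) % n
        for k, v in enumerate(vecs):      # ... and reused for every vector
            out[k][i] = c * v[i] - v[left] - v[right]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def multi_rhs_cg(coeff, rhs, tol=1e-10, max_iter=1000):
    """Run independent CG solves in lockstep, sharing each operator sweep.

    Returns the list of solutions and the number of shared sweeps used.
    """
    n_rhs = len(rhs)
    x = [[0.0] * len(b) for b in rhs]
    r = [list(b) for b in rhs]            # residuals (initial guess is zero)
    p = [list(b) for b in rhs]            # search directions
    rr = [dot(ri, ri) for ri in r]
    target = [tol * tol * dot(b, b) for b in rhs]
    for it in range(max_iter):
        if all(rr[k] <= target[k] for k in range(n_rhs)):
            return x, it
        ap = apply_stencil(coeff, p)      # one sweep serves all systems
        for k in range(n_rhs):
            if rr[k] <= target[k]:
                continue                  # this system has already converged
            alpha = rr[k] / dot(p[k], ap[k])
            x[k] = [xi + alpha * pi for xi, pi in zip(x[k], p[k])]
            r[k] = [ri - alpha * qi for ri, qi in zip(r[k], ap[k])]
            rr_new = dot(r[k], r[k])
            beta = rr_new / rr[k]
            p[k] = [ri + beta * pi for ri, pi in zip(r[k], p[k])]
            rr[k] = rr_new
    return x, max_iter


if __name__ == "__main__":
    n = 16
    coeff = [2.5 + 0.1 * (i % 3) for i in range(n)]
    rhs = [[1.0 if i == 0 else 0.0 for i in range(n)],
           [float(i % 4) for i in range(n)]]
    solutions, sweeps = multi_rhs_cg(coeff, rhs)
    print("operator sweeps shared by", len(rhs), "systems:", sweeps)
```

In this sketch the systems are solved independently and merely share the operator sweep; it is not a block Krylov method. On real hardware the benefit would come from the higher ratio of arithmetic to memory traffic in the shared sweep, which this pure-Python toy does not measure.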

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Physics of Superconductivity and Magnetism