Cell processor implementation of a MILC lattice QCD application

Guochun Shi (1), Volodymyr Kindratenko (1), Steven Gottlieb (2) ((1) University of Illinois, (2) Indiana University)

arXiv:0910.0262·hep-lat·September 8, 2016

TL;DR

This paper reports on porting a MILC lattice QCD simulation to the Cell processor. Kernel performance is limited by the bandwidth between main memory and the SPEs, yet the port still achieves significant speedups over conventional CPUs.

Contribution

An implementation of a MILC lattice QCD application on the Cell processor, with 54 kernels covering 98.8% of execution time ported to the SPEs, together with an analysis of the resulting performance bottlenecks and the speedups over standard CPUs.

Findings

01

Kernel performance is limited by the bandwidth between main memory and the SPEs: kernels rarely exceed 10 GFLOPS, just over 4% of the Cell's peak.

02

Achieved up to 9.6x speedup on a single Cell processor.

03

Many kernels sustain close to 20 GB/s of memory bandwidth, 78% of the Cell's peak.

Abstract

We present results of the implementation of one MILC lattice QCD application (simulation with dynamical clover fermions using the hybrid-molecular dynamics R algorithm) on the Cell Broadband Engine processor. Fifty-four individual computational kernels responsible for 98.8% of the overall execution time were ported to the Cell's Synergistic Processing Elements (SPEs). The remaining application framework, including MPI-based distributed code execution, was left to the Cell's PowerPC processor. We observe that we only infrequently achieve more than 10 GFLOPS with any of the kernels, which is just over 4% of the Cell's peak performance. At the same time, many of the kernels are sustaining a bandwidth close to 20 GB/s, which is 78% of the Cell's peak. This indicates that the application performance is limited by the bandwidth between the main memory and the SPEs. In spite of this limitation,…
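
The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper. The sustained figures (10 GFLOPS, 20 GB/s) come from the abstract; the peak values are assumptions, since the abstract does not state them: 25.6 GB/s is the commonly cited peak of the Cell's memory interface, and 230.4 GFLOPS is a commonly cited single-precision peak for a 3.2 GHz Cell, chosen here because it is consistent with "just over 4%". The function names are hypothetical.

```python
# Sustained figures quoted in the abstract.
SUSTAINED_GFLOPS = 10.0   # best-case kernel compute rate
SUSTAINED_GBPS = 20.0     # memory bandwidth many kernels sustain

# ASSUMED peaks (not stated in the abstract; see the lead-in).
ASSUMED_PEAK_GBPS = 25.6      # main-memory bandwidth of the Cell
ASSUMED_PEAK_GFLOPS = 230.4   # single-precision peak at 3.2 GHz


def fraction_of_peak(sustained, peak):
    """Return sustained/peak as a fraction between 0 and 1."""
    return sustained / peak


def flops_per_byte(gflops, gbps):
    """Arithmetic intensity implied by a compute rate and a data rate."""
    return gflops / gbps


def bandwidth_bound_gflops(intensity, peak_gbps):
    """Compute-rate ceiling for a kernel that is purely bandwidth bound."""
    return intensity * peak_gbps


if __name__ == "__main__":
    bw = fraction_of_peak(SUSTAINED_GBPS, ASSUMED_PEAK_GBPS)
    fl = fraction_of_peak(SUSTAINED_GFLOPS, ASSUMED_PEAK_GFLOPS)
    ai = flops_per_byte(SUSTAINED_GFLOPS, SUSTAINED_GBPS)
    print(f"bandwidth utilization: {bw:.1%}")
    print(f"compute utilization:   {fl:.1%}")
    print(f"implied intensity:     {ai} flops/byte")
    print(f"ceiling at peak bandwidth: "
          f"{bandwidth_bound_gflops(ai, ASSUMED_PEAK_GBPS)} GFLOPS")
```

Under these assumptions, a kernel performing about 0.5 floating-point operations per byte moved could not exceed roughly 12.8 GFLOPS even at full memory bandwidth, which is in line with the abstract's conclusion that bandwidth between main memory and the SPEs, not compute, is the limiting factor.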
