Performance Optimization of SU3_Bench on Xeon and Programmable Integrated Unified Memory Architecture
Jesmin Jahan Tithi, Fabio Checconi, Douglas Doerfler, Fabrizio Petrini

TL;DR
This paper explores optimizing the SU3_Bench microbenchmark across Xeon and Intel PIUMA architectures, addressing performance challenges and demonstrating a twofold performance boost on Xeon, while analyzing the impact of architecture-specific factors.
Contribution
It identifies key challenges in achieving peak performance of SU3_Bench on Xeon and PIUMA, and proposes optimization strategies tailored to each architecture's characteristics.
Findings
Performance on Xeon improved by 2x with optimizations.
Performance on PIUMA is influenced more by pipeline throughput than by bandwidth.
Comparison shows Xeon has about ten times more flops-per-byte than PIUMA.
Abstract
SU3_Bench is a microbenchmark developed to explore performance portability across multiple programming models/methodologies using a simple, but nontrivial, mathematical kernel. This kernel has been derived from the MILC lattice quantum chromodynamics (LQCD) code. SU3_Bench is bandwidth bound and generates regular compute and data access patterns. Therefore, on most traditional CPU and GPU-based systems, its performance is mainly determined by the achievable memory bandwidth. Although SU3_Bench is a simple kernel, experience says its subtleties require a certain amount of tweaking to achieve peak performance for a given programming model and hardware, making performance portability challenging. In this paper, we share some of the challenges in obtaining the peak performance for SU3_Bench on a state-of-the-art Intel Xeon machine, due to the nuances of variable definition, the nature…
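To make the "bandwidth bound" claim concrete, here is a minimal sketch of the flops-per-byte arithmetic for a single 3x3 complex matrix multiply, the kind of operation an SU(3) kernel is built from. It is an illustration under stated assumptions, not the benchmark's actual code: the helper names `su3_mult` and `arithmetic_intensity` are hypothetical, the matrices are generic 3x3 complex matrices, each complex multiply is counted as 6 flops and each complex add as 2 flops, and memory traffic is assumed to be two matrix reads plus one matrix write with no cache reuse.

```python
def su3_mult(a, b):
    """Multiply two 3x3 complex matrices stored as nested lists (hypothetical helper)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def arithmetic_intensity(bytes_per_real):
    """Flops per byte for one 3x3 complex multiply, under the assumptions above."""
    # Each of the 9 outputs needs 3 complex multiplies (6 flops each)
    # and 2 complex adds (2 flops each).
    flops = 9 * (3 * 6 + 2 * 2)
    # Assumed traffic: read two matrices, write one; each has 9 complex
    # entries, and each complex entry is 2 real numbers.
    bytes_moved = 3 * 9 * 2 * bytes_per_real
    return flops / bytes_moved


# Sanity check: multiplying by the identity returns the same matrix.
identity = [[1 + 0j if i == j else 0j for j in range(3)] for i in range(3)]
m = [[complex(i + 1, j) for j in range(3)] for i in range(3)]
product = su3_mult(identity, m)

ai_single = arithmetic_intensity(4)  # 4-byte reals (single precision)
ai_double = arithmetic_intensity(8)  # 8-byte reals (double precision)
```

Under these assumptions the intensity stays below one flop per byte in both precisions. That is consistent with the abstract's statement that performance on traditional CPU and GPU systems is mainly determined by achievable memory bandwidth. The benchmark's real data layout and flop accounting may differ.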
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
