Performance-Portable Optimization and Analysis of Multiple Right-Hand Sides in a Lattice QCD Solver
Shiting Long, Gustavo Ramirez-Hidalgo, Stepan Nassyr, Jose Jimenez-Merchan, Andreas Frommer, Dirk Pleiter

TL;DR
This paper enhances a lattice QCD solver to efficiently handle multiple right-hand sides across different architectures, optimizing data layouts and SIMD utilization, and analyzing performance portability and architecture-specific benefits.
Contribution
It introduces a flexible data layout and SIMD optimization for multi-RHS lattice QCD solvers, along with a comprehensive performance analysis across architectures.
Findings
Achieved similar speedups on x86 and Arm clusters.
Demonstrated performance portability of the optimizations.
Provided insights into architecture-specific performance factors.
Abstract
Managing the high computational cost of iterative solvers for sparse linear systems is a known challenge in scientific computing. Moreover, scientific applications often face memory bandwidth constraints, making it critical to optimize data locality and enhance the efficiency of data transport. We extend the lattice QCD solver DD-AMG to incorporate multiple right-hand sides (rhs) for both the Wilson-Dirac operator evaluation and the GMRES solver, with and without odd-even preconditioning. To optimize auto-vectorization, we introduce a flexible interface that supports various data layouts and implement a new data layout for better SIMD utilization. We evaluate our optimizations on both x86 and Arm clusters, demonstrating performance portability with similar speedups. A key contribution of this work is the performance analysis of our optimizations, which reveals the complexity…
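The abstract does not spell out the new data layout, so the following is only a minimal sketch of the general idea behind multi-rhs layouts that help auto-vectorization: store the rhs index as the fastest-running index so the innermost loop has unit stride. The names `N_RHS`, `N_DOF`, and `block_apply_multi_rhs` are hypothetical, and real arithmetic is used for brevity, whereas the Wilson-Dirac operator acts on complex fields. This is not the DD-AMG implementation.

```c
#include <stddef.h>

#define N_RHS 4  /* hypothetical number of right-hand sides processed together */
#define N_DOF 3  /* hypothetical degrees of freedom per site (illustrative only) */

/* Apply one dense N_DOF x N_DOF block to N_RHS vectors at once.
 * x and y use an rhs-fastest layout: element (j, k) is stored at x[j * N_RHS + k],
 * so all right-hand sides of one component are contiguous in memory. */
static void block_apply_multi_rhs(const float *restrict a,
                                  const float *restrict x,
                                  float *restrict y)
{
    for (size_t i = 0; i < N_DOF; ++i) {
        /* Clear the output row for every rhs. */
        for (size_t k = 0; k < N_RHS; ++k)
            y[i * N_RHS + k] = 0.0f;

        for (size_t j = 0; j < N_DOF; ++j) {
            /* Each matrix entry is loaded once and reused for all rhs. */
            const float a_ij = a[i * N_DOF + j];

            /* Unit-stride loop over rhs: a natural target for compiler
             * auto-vectorization. */
            for (size_t k = 0; k < N_RHS; ++k)
                y[i * N_RHS + k] += a_ij * x[j * N_RHS + k];
        }
    }
}
```

In this sketch each matrix entry is read once and applied to all `N_RHS` vectors, which illustrates why batching right-hand sides can reduce memory traffic per solved system; whether a compiler actually vectorizes the inner loop depends on the target architecture and flags.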
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Parallel Computing and Optimization Techniques · Quantum Chromodynamics and Particle Interactions
