Optimizing the domain wall fermion Dirac operator using the R-Stream source-to-source compiler
Meifeng Lin, Eric Papenhausen, M. Harper Langston, Benoit Meister, Muthu Baskaran, Taku Izubuchi, Chulwoo Jung

TL;DR
This paper describes optimizing the domain wall fermion Dirac operator (the Dslash kernel), the most computation-intensive part of lattice QCD simulations, with the R-Stream source-to-source compiler, and reports preliminary benchmark results on Intel PC clusters.
Contribution
It applies source-to-source compilation with R-Stream to the domain wall fermion Dirac operator in the Columbia Physics System (CPS) and presents preliminary benchmark results.
Findings
Preliminary benchmark results are presented for the R-Stream-optimized Dslash kernel.
Optimization strategies are applied both before and after automatic code generation with R-Stream.
The initial target platform is Intel PC clusters.
Abstract
The application of the Dirac operator on a spinor field, the Dslash operation, is the most computation-intensive part of lattice QCD simulations. It is often the key kernel to optimize to achieve maximum performance on various platforms. Here we report on a project to optimize the domain wall fermion Dirac operator in the Columbia Physics System (CPS) using the R-Stream source-to-source compiler. Our initial target platform is Intel PC clusters. We discuss the optimization strategies involved before and after the automatic code generation with R-Stream and present some preliminary benchmark results.
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Scientific Computing and Data Management
