Implementation of the conjugate gradient algorithm for heterogeneous systems
Salvatore Cali, William Detmold, Grzegorz Korcyl, Piotr Korcyl, Phiala Shanahan

TL;DR
This paper presents a portable implementation of the conjugate gradient algorithm for solving large sparse linear systems in lattice QCD, optimized for heterogeneous hardware including CPUs, GPUs, and FPGAs using SYCL/DPC++.
Contribution
It introduces a device-agnostic implementation of the CG algorithm leveraging SYCL/DPC++, enabling efficient computations across diverse hardware platforms.
Findings
Achieved cross-device compatibility with a single codebase.
Demonstrated performance on CPUs, GPUs, and FPGAs.
Targets the numerical inversion of the Dirac operator, which dominates the cost of lattice QCD calculations.
Abstract
Lattice QCD calculations require significant computational effort, with the dominant fraction of resources typically spent in the numerical inversion of the Dirac operator. One of the simplest methods to solve such large and sparse linear systems is the conjugate gradient (CG) approach. In this work we present an implementation of CG that can be executed on different devices, including CPUs, GPUs, and FPGAs. This is achieved by using the SYCL/DPC++ framework, which allows the execution of the same source code on heterogeneous systems.
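To make the CG iteration mentioned in the abstract concrete, the following is a minimal sketch in plain standard C++. It is not the authors' SYCL/DPC++ code and uses no device offloading. As an assumption for illustration, the sparse operator is a small symmetric positive-definite 1D Laplacian rather than the Dirac operator, and the names `apply_laplacian`, `dot`, `CgResult`, and `conjugate_gradient` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in operator (assumption, not the Dirac operator):
// tridiagonal 1D Laplacian with 2 on the diagonal and -1 off the diagonal.
// It is symmetric positive definite, which is what CG requires.
Vec apply_laplacian(const Vec& x) {
    const std::size_t n = x.size();
    Vec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        double v = 2.0 * x[i];
        if (i > 0) v -= x[i - 1];
        if (i + 1 < n) v -= x[i + 1];
        y[i] = v;
    }
    return y;
}

// Euclidean inner product of two vectors of equal length.
double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

struct CgResult {
    Vec x;                 // approximate solution of A x = b
    int iterations;        // number of CG iterations performed
    double residual_norm;  // norm of the recursively updated residual
};

// Textbook conjugate gradient for A x = b, starting from x = 0.
// `apply` computes A*p; the matrix is never stored explicitly.
template <class Op>
CgResult conjugate_gradient(Op apply, const Vec& b, double tol, int max_iter) {
    const std::size_t n = b.size();
    Vec x(n, 0.0);
    Vec r = b;  // residual r = b - A*x with x = 0
    Vec p = r;  // initial search direction
    double rr = dot(r, r);
    int k = 0;
    while (k < max_iter && std::sqrt(rr) > tol) {
        const Vec Ap = apply(p);
        const double alpha = rr / dot(p, Ap);  // step length along p
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;  // makes the new p A-conjugate to the old
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++k;
    }
    return CgResult{x, k, std::sqrt(rr)};
}
```

A call such as `conjugate_gradient(apply_laplacian, Vec(16, 1.0), 1e-10, 100)` returns the approximate solution together with the iteration count. Each iteration consists of one operator application, two inner products, and three vector updates, and every one of those loops is data-parallel over the vector index. That structure is what makes CG a natural fit for a single-source framework that can target CPUs, GPUs, and FPGAs.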
Taxonomy
Topics: Particle physics theoretical and experimental studies · Quantum Chromodynamics and Particle Interactions · Distributed and Parallel Computing Systems
