Accelerating QDP++/Chroma on GPUs
Frank Winter

TL;DR
This paper presents a GPU acceleration framework for the QDP++/Chroma lattice QCD software. It uses JIT compilation and automated memory management to speed up expression evaluation, including non-kernel routines that are rarely hand-tuned.
Contribution
It extends QDP++ so that single expressions are off-loaded to NVIDIA GPUs through JIT compilation, with GPU memory managed automatically by a software cache, improving the performance of lattice QCD computations in Chroma.
Findings
Accelerates non-kernel routines in lattice QCD calculations.
Demonstrates interoperability with Krylov space solvers.
Can reduce the Amdahl's Law bottleneck that arises when non-kernel code is left unaccelerated.
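The Amdahl's Law point can be made concrete with a short worked equation. The symbols p (fraction of run time that is accelerated) and s (speedup of that fraction), and the numbers used, are illustrative assumptions and are not taken from the paper.

```latex
% Amdahl's Law: overall speedup when a fraction p of the run time
% is accelerated by a factor s and the remainder is unchanged.
\[
  S(p, s) = \frac{1}{(1 - p) + \dfrac{p}{s}}
\]
% Illustrative case (assumed numbers): only the solver, p = 0.8, is
% accelerated by s = 10.
\[
  S(0.8, 10) = \frac{1}{0.2 + 0.08} \approx 3.57,
  \qquad
  \lim_{s \to \infty} S(0.8, s) = \frac{1}{0.2} = 5
\]
% If the remaining 20% is accelerated by the same factor, then p = 1
% and the overall speedup equals s.
\[
  S(1, 10) = 10
\]
```

Under these assumed numbers, no solver speedup can push the overall gain past 5 while the other 20% of the run time stays on the host, which is why accelerating the non-kernel routines matters.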
Abstract
Extensions to the C++ implementation of the QCD Data Parallel Interface are provided, enabling acceleration of expression evaluation on NVIDIA GPUs. Single expressions are off-loaded to the device memory and execution domain, leveraging the Portable Expression Template Engine and using Just-in-Time compilation techniques. Memory management is automated by a software implementation of a cache controlling the GPU's memory. Interoperability with existing Krylov space solvers is demonstrated, and special attention is paid to 'Chroma readiness'. Non-kernel routines in lattice QCD calculations, which are typically not subject to hand-tuned optimisation, are accelerated, which can reduce the effects otherwise suffered from Amdahl's Law.
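The expression-template idea that the abstract builds on can be illustrated with a minimal CPU-only sketch. All names here (`Expr`, `BinaryExpr`, `Field`, `Add`, `Mul`) are hypothetical and are not the QDP++ or PETE API. The sketch only shows how a whole expression is captured as a type and evaluated in a single loop on assignment; this is the point at which a framework like the one described could instead generate and JIT-compile a GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the QDP++ or PETE API.

// CRTP base: tags a type as an expression node and recovers the derived type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node for an element-wise binary operation. Operands are held by reference,
// so the node must be consumed within the full expression that created it
// (do not store it in an `auto` variable).
template <typename L, typename R, typename Op>
struct BinaryExpr : Expr<BinaryExpr<L, R, Op>> {
    const L& lhs;
    const R& rhs;
    BinaryExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
    std::size_t size() const { return lhs.size(); }
};

struct Add { static double apply(double a, double b) { return a + b; } };
struct Mul { static double apply(double a, double b) { return a * b; } };

// Leaf node: a field with one double per site.
class Field : public Expr<Field> {
    std::vector<double> data_;
public:
    explicit Field(std::size_t n, double v = 0.0) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assignment from any expression: the whole tree is evaluated here in
    // one loop over sites, without temporaries for sub-expressions.
    template <typename E>
    Field& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }
};

// Operators build expression nodes instead of computing results.
template <typename L, typename R>
BinaryExpr<L, R, Add> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, R, Add>(l.self(), r.self());
}

template <typename L, typename R>
BinaryExpr<L, R, Mul> operator*(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<L, R, Mul>(l.self(), r.self());
}
```

With three fields `a`, `b`, `c` and a result field `out` of equal size, the statement `out = a + b * c;` builds the type `BinaryExpr<Field, BinaryExpr<Field, Field, Mul>, Add>` and evaluates it in one pass. Because the full expression is known as a type at the assignment, it is a natural unit to off-load as a single kernel.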
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Particle physics theoretical and experimental studies
