Generating SU(Nc) pure gauge lattice QCD configurations on GPUs with CUDA
Nuno Cardoso, Pedro Bicudo

TL;DR
This paper develops and optimizes CUDA-based GPU codes for generating SU(Nc) lattice QCD configurations, significantly accelerating the computational process for various Nc values, and provides publicly available tools for the community.
Contribution
It introduces optimized CUDA codes for generating SU(2), SU(3), and SU(4) lattice QCD pure gauge configurations, plus a generic SU(Nc) implementation, on single and multiple NVIDIA GPUs.
Findings
Optimized CUDA codes significantly improve performance.
The generic SU(Nc) code is benchmarked against the optimized SU(4) version.
Codes are publicly available for community use.
Abstract
The starting point of any lattice QCD computation is the generation of a Markov chain of gauge field configurations. Because of the large number of lattice links and the matrix multiplications involved, generating SU(Nc) lattice QCD configurations is a highly demanding computational task, requiring advanced parallel computer architectures such as clusters of several Central Processing Units (CPUs) or Graphics Processing Units (GPUs). In this paper we present and explore the performance of CUDA codes for NVIDIA GPUs to generate SU(Nc) lattice QCD pure gauge configurations. On a single GPU our implementation uses CUDA; on multiple GPUs it uses OpenMP and CUDA. We present optimized CUDA codes for SU(2), SU(3) and SU(4). We also show a generic SU(Nc) code for Nc ≥ 4 and compare it with the optimized version of SU(4). Our codes are publicly available for free use by the lattice QCD community.
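To make concrete why the number of links and the matrix multiplications dominate the cost, the following is a minimal CPU sketch of one Markov chain step for SU(2) pure gauge theory on a small periodic 4D lattice. It is not the authors' CUDA code. It assumes the Wilson plaquette action and uses a simple Metropolis update as a stand-in, since this summary does not name the update algorithm. All names (`Lattice`, `staple_sum`, `metropolis_sweep`, the step size `eps`) are hypothetical. Each SU(2) matrix is stored as four reals (a0, a1, a2, a3) with unit norm, so that half its trace equals a0.

```python
import itertools
import math
import random

def su2_mul(a, b):
    """Product of a0 + i a.sigma and b0 + i b.sigma in the 4-real form."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + b0 * a1 - (a2 * b3 - a3 * b2),
            a0 * b2 + b0 * a2 - (a3 * b1 - a1 * b3),
            a0 * b3 + b0 * a3 - (a1 * b2 - a2 * b1))

def dagger(a):
    """Hermitian conjugate (the inverse for a unit-norm element)."""
    return (a[0], -a[1], -a[2], -a[3])

class Lattice:
    """Periodic hypercubic lattice; every link starts at the identity (cold start)."""
    def __init__(self, size, dims=4):
        self.size, self.dims = size, dims
        self.sites = list(itertools.product(range(size), repeat=dims))
        self.links = {(x, mu): (1.0, 0.0, 0.0, 0.0)
                      for x in self.sites for mu in range(dims)}

    def shift(self, x, mu, step):
        y = list(x)
        y[mu] = (y[mu] + step) % self.size
        return tuple(y)

def staple_sum(lat, x, mu):
    """Sum of the forward and backward staples attached to link (x, mu)."""
    U, total = lat.links, [0.0] * 4
    xp = lat.shift(x, mu, 1)
    for nu in range(lat.dims):
        if nu == mu:
            continue
        fwd = su2_mul(su2_mul(U[xp, nu], dagger(U[lat.shift(x, nu, 1), mu])),
                      dagger(U[x, nu]))
        xm = lat.shift(x, nu, -1)
        bwd = su2_mul(su2_mul(dagger(U[lat.shift(xp, nu, -1), nu]),
                              dagger(U[xm, mu])), U[xm, nu])
        total = [t + f + b for t, f, b in zip(total, fwd, bwd)]
    return tuple(total)

def average_plaquette(lat):
    """Mean of (1/2) Re Tr U_plaquette over all sites and planes."""
    U, acc, count = lat.links, 0.0, 0
    for x in lat.sites:
        for mu in range(lat.dims):
            for nu in range(mu + 1, lat.dims):
                p = su2_mul(su2_mul(U[x, mu], U[lat.shift(x, mu, 1), nu]),
                            su2_mul(dagger(U[lat.shift(x, nu, 1), mu]),
                                    dagger(U[x, nu])))
                acc += p[0]
                count += 1
    return acc / count

def metropolis_sweep(lat, beta, eps, rng):
    """One Metropolis pass over all links (assumed stand-in update)."""
    for key in lat.links:
        old = lat.links[key]
        staples = staple_sum(lat, *key)
        # Random SU(2) element near the identity; symmetric under r -> r^dagger.
        g = [eps * rng.uniform(-1.0, 1.0) for _ in range(3)]
        norm = math.sqrt(1.0 + sum(c * c for c in g))
        new = su2_mul((1.0 / norm, g[0] / norm, g[1] / norm, g[2] / norm), old)
        # Local Wilson action: S = -beta * (1/2) Tr(U * staples) + const.
        d_action = -beta * (su2_mul(new, staples)[0] - su2_mul(old, staples)[0])
        if d_action <= 0.0 or rng.random() < math.exp(-d_action):
            lat.links[key] = new

if __name__ == "__main__":
    lattice = Lattice(4)
    rng = random.Random(1)
    for _ in range(2):
        metropolis_sweep(lattice, beta=2.3, eps=0.3, rng=rng)
    print("average plaquette:", average_plaquette(lattice))
```

Even in this toy version, updating a single link requires twelve matrix products for the six staples, repeated for four links per site at every sweep. That per-link independence (up to the usual checkerboard ordering) is what makes the problem a natural fit for GPU threads.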
