SU(2) Lattice Gauge Theory Simulations on Fermi GPUs
Nuno Cardoso, Pedro Bicudo

TL;DR
This paper evaluates CUDA GPU performance for quenched SU(2) lattice gauge theory simulations, reporting large speedups over a CPU and comparing single and double precision on the G200 and Fermi architectures.
Contribution
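The paper provides GPU codes, optimized for the memory hierarchy of the CUDA programming model, that generate SU(2) lattice gauge configurations and measure the mean plaquette, the Polyakov loop and the Wilson loop. It compares their performance across architectures, precisions and numbers of GPUs.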
Findings
200x speedup with two Fermi GPUs over one CPU in single precision
Double precision computations are less than twice as slow as single precision on the Fermi architecture
GPU implementations outperform the CPU implementation in lattice gauge theory simulations
Abstract
In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU(2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations () without smearing and…
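To make the mean plaquette mentioned in the abstract concrete, here is a minimal CPU reference sketch in pure Python. It is not the authors' CUDA code. It assumes each SU(2) link is stored as a unit quaternion (a0, a1, a2, a3) standing for U = a0·1 + i(a1σ1 + a2σ2 + a3σ3), so that (1/2)Tr U = a0. The storage layout (a dictionary keyed by site and direction) and all function names are hypothetical and chosen only for readability.

```python
import math
import random
from itertools import product


def qmul(a, b):
    """Product of two SU(2) elements in the quaternion form a0 + i a.sigma."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + b0 * a1 - (a2 * b3 - a3 * b2),
        a0 * b2 + b0 * a2 - (a3 * b1 - a1 * b3),
        a0 * b3 + b0 * a3 - (a1 * b2 - a2 * b1),
    )


def dagger(a):
    """Hermitian conjugate: flip the sign of the vector part."""
    return (a[0], -a[1], -a[2], -a[3])


def shift(x, mu, dims):
    """Neighbour of site x one step along direction mu, periodic boundaries."""
    y = list(x)
    y[mu] = (y[mu] + 1) % dims[mu]
    return tuple(y)


def sites(dims):
    """Iterate over all lattice sites as coordinate tuples."""
    return product(*(range(n) for n in dims))


def cold_start(dims):
    """All links set to the identity."""
    return {(x, mu): (1.0, 0.0, 0.0, 0.0) for x in sites(dims) for mu in range(4)}


def random_su2(rng):
    """Uniformly random SU(2) element: a normalized 4D Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def hot_start(dims, rng):
    """All links set to independent random SU(2) elements."""
    return {(x, mu): random_su2(rng) for x in sites(dims) for mu in range(4)}


def mean_plaquette(links, dims):
    """Average of (1/2) Tr U_mu(x) U_nu(x+mu) U_mu(x+nu)^dagger U_nu(x)^dagger."""
    total, count = 0.0, 0
    for x in sites(dims):
        for mu in range(4):
            for nu in range(mu + 1, 4):
                u = qmul(links[(x, mu)], links[(shift(x, mu, dims), nu)])
                u = qmul(u, dagger(links[(shift(x, nu, dims), mu)]))
                u = qmul(u, dagger(links[(x, nu)]))
                total += u[0]  # (1/2) Tr U is the scalar component
                count += 1
    return total / count


if __name__ == "__main__":
    dims = (4, 4, 4, 4)
    print(mean_plaquette(cold_start(dims), dims))
    print(mean_plaquette(hot_start(dims, random.Random(1)), dims))
```

A cold start gives a mean plaquette of exactly 1, while a hot start averages close to 0. A GPU version would typically assign one lattice site (or one link) to each thread and choose the memory layout so that neighbouring threads read neighbouring addresses. That mapping is the kind of architecture-specific optimization the abstract refers to, and it is not attempted in this sketch.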
