An implementation of hybrid parallel CUDA code for the hyperonic nuclear forces
Hidekatsu Nemura, for HAL QCD Collaboration

TL;DR
This paper details the development and performance evaluation of a hybrid parallel CUDA implementation for calculating baryon interactions via lattice QCD, utilizing multi-GPU systems with MPI and OpenMP for efficient computation.
Contribution
It introduces a novel multi-GPU CUDA implementation for calculating NBS wave functions in baryon interactions, optimizing parallelization across time and spatial dimensions.
Findings
Achieved efficient multi-GPU parallelization with MPI and OpenMP.
Measured strong and weak scaling on the HA-PACS supercomputer.
Identified performance differences between NVIDIA M2090 and K20X GPUs.
Abstract
We present our recent effort to develop a GPGPU program to calculate 52 channels of the Nambu-Bethe-Salpeter (NBS) wave functions in order to study the baryon interactions, from nucleon-nucleon to ΞΞ, from lattice QCD. We adopt CUDA to perform multi-GPU execution within a hybrid parallel programming model with MPI and OpenMP. The effective baryon block algorithm, which efficiently calculates a large number of NBS wave functions at a time, is briefly outlined, and three CUDA kernel programs are implemented to realize the algorithm on GPUs in the single-program multiple-data (SPMD) programming model. To parallelize over multiple GPUs, we take two approaches: dividing the time dimension and dividing the spatial dimensions. Performance is measured on the HA-PACS supercomputer at the University of Tsukuba, which includes NVIDIA M2090 and…
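As an illustration of the two decomposition approaches mentioned in the abstract, the following is a minimal host-side sketch in C++ of how a one-dimensional index range (time slices, or flattened spatial sites) might be split evenly across processes, each of which would drive one GPU. It is not taken from the paper's code: the names `Range` and `block_range` are hypothetical, and the simple block partition is an assumption about how such a division could be done.

```cpp
#include <cassert>

// Half-open index range [begin, end) owned by one process.
// (Hypothetical helper type, not from the paper.)
struct Range {
    int begin;
    int end;
};

// Split n items (e.g. time slices, or flattened spatial sites) as evenly
// as possible over nprocs processes. The first (n % nprocs) ranks receive
// one extra item, so range sizes differ by at most one.
Range block_range(int n, int nprocs, int rank) {
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    const int base = n / nprocs;   // minimum number of items per rank
    const int extra = n % nprocs;  // number of ranks that get one more
    const int begin = rank * base + (rank < extra ? rank : extra);
    const int size = base + (rank < extra ? 1 : 0);
    return Range{begin, begin + size};
}
```

In an MPI program, `rank` and `nprocs` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`; calling `block_range(T, nprocs, rank)` with a temporal extent `T` would correspond to dividing the time dimension, while passing the number of spatial sites would correspond to dividing the spatial dimensions.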
