An implementation of hybrid parallel CUDA code for the hyperonic nuclear forces

Hidekatsu Nemura, for HAL QCD Collaboration

arXiv:1604.07983·hep-lat·June 28, 2016

TL;DR

This paper details the development and performance evaluation of a hybrid parallel CUDA implementation for calculating baryon interactions via lattice QCD, utilizing multi-GPU systems with MPI and OpenMP for efficient computation.

Contribution

It introduces a novel multi-GPU CUDA implementation for calculating NBS wave functions in baryon interactions, optimizing parallelization across time and spatial dimensions.

Findings

01

Achieved efficient multi-GPU parallelization with MPI and OpenMP.

02

Measured strong and weak scaling on the HA-PACS supercomputer.

03

Identified performance differences between NVIDIA M2090 and K20X GPUs.

Abstract

We present our recent effort to develop a GPGPU program that calculates 52 channels of the Nambu-Bethe-Salpeter (NBS) wave functions in order to study the baryon interactions, from nucleon-nucleon to $\Xi$-$\Xi$, from lattice QCD. We adopt CUDA programming to perform multi-GPU execution within a hybrid parallel programming model with MPI and OpenMP. The effective baryon block algorithm, which efficiently calculates a large number of NBS wave functions at once, is briefly outlined, and three CUDA kernel programs are implemented to realize this algorithm on GPUs in the single-program multiple-data (SPMD) programming model. To parallelize across multiple GPUs, we take two approaches: dividing the time dimension and dividing the spatial dimensions. Performance is measured on the HA-PACS supercomputer at the University of Tsukuba, which includes NVIDIA M2090 and…
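The two ways of dividing the lattice mentioned in the abstract can be illustrated with a small host-side sketch. This is not the author's code: the names `Range`, `Tile`, `block_range`, and `tile_for_rank` are hypothetical, and the layout (workers arranged as a grid of time groups by spatial groups, with only one spatial axis split) is an assumption made for illustration. It only shows how a time extent and a spatial extent might each be split into contiguous blocks, one block per MPI rank or GPU.

```cpp
#include <cassert>

// Half-open range [begin, end) of lattice sites owned by one worker.
struct Range {
    int begin;
    int end;
};

// Split `extent` sites among `workers` as evenly as possible.
// The first (extent % workers) workers each receive one extra site.
Range block_range(int extent, int workers, int rank) {
    assert(workers > 0 && rank >= 0 && rank < workers);
    int base = extent / workers;
    int extra = extent % workers;
    int begin = rank * base + (rank < extra ? rank : extra);
    int size = base + (rank < extra ? 1 : 0);
    return Range{begin, begin + size};
}

// Portion of the lattice assigned to one worker: a block of time
// slices and a block along one spatial axis (assumed here to be z).
struct Tile {
    Range t;
    Range z;
};

// Hypothetical two-level layout: workers form a (t_groups x z_groups)
// grid, so the time dimension and one spatial dimension are both divided.
Tile tile_for_rank(int nt, int nz, int t_groups, int z_groups, int rank) {
    assert(rank >= 0 && rank < t_groups * z_groups);
    int t_idx = rank / z_groups;  // which block of time slices
    int z_idx = rank % z_groups;  // which block along z
    return Tile{block_range(nt, t_groups, t_idx),
                block_range(nz, z_groups, z_idx)};
}
```

For example, under these assumptions a lattice with 64 time slices and 32 sites along z, split over 4 time groups and 2 spatial groups, would give worker 5 the time slices 32 to 47 and the z sites 16 to 31. In a real multi-GPU code, each worker would then launch its kernels only on its own tile and exchange or reduce results with the others.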
