Accelerating QDP++ using GPUs

Frank Winter

arXiv:1105.2279·hep-lat·May 12, 2011

TL;DR

This paper demonstrates how to accelerate the QDP++ library for quantum field theory computations on GPUs using CUDA, expression templates, and JIT compilation, resulting in significant speed-ups.

Contribution

It introduces a method for evaluating QDP++ expressions on GPUs that combines CUDA, the Expression Template (ET) technique, and Just-In-Time (JIT) compilation, enabling GPU execution of routines that had not previously been optimized for GPUs.

Findings

01. Significant speed-up of QDP++ routines on GPU

02. Successful implementation of GPU-accelerated smearing routine

03. Effective bridging between host and device memory domains

Abstract

Graphics Processing Units (GPUs) are becoming increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture for controlling and exploiting the compute power of GPUs. CUDA provides sufficient support for C++ language elements to enable the Expression Template (ET) technique in the device memory domain. QDP++ is a C++ vector class library suited for quantum field theory; it provides vector data types and expressions and forms the basis of the lattice QCD software suite Chroma. In this work, offloading QDP++ expression evaluation to a GPU was successfully implemented by leveraging the ET technique and Just-In-Time (JIT) compilation. The Portable Expression Template Engine (PETE) and the C API for CUDA kernel arguments were used to build the bridge between host and device memory domains. This…
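For readers unfamiliar with the Expression Template technique named in the abstract, the following is a minimal host-only C++ sketch of the general idea. It is not QDP++ or PETE code: the names `Expr`, `Field`, `Add`, `Scale`, and `demo_first_site` are hypothetical and chosen only for illustration. The sketch shows how an expression such as `a + 2.0 * b` is encoded as a type and evaluated in a single loop at assignment, without intermediate temporaries. That single loop is the kind of per-site evaluation one could imagine offloading to a GPU kernel; the sketch itself makes no attempt at CUDA or JIT compilation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: marks a type as an expression node (hypothetical, not PETE).
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Hypothetical stand-in for a lattice field: one double per site.
struct Field : Expr<Field> {
    std::vector<double> data;
    explicit Field(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }

    // Assignment from any expression: the whole expression tree is
    // evaluated site by site in this one loop.
    template <class E>
    Field& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Node for l + r; stores references only, so it must be consumed
// within the same full expression that created it.
template <class L, class R>
struct Add : Expr<Add<L, R>> {
    const L& l;
    const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Node for s * r with a scalar s.
template <class R>
struct Scale : Expr<Scale<R>> {
    double s;
    const R& r;
    Scale(double s_, const R& r_) : s(s_), r(r_) {}
    double operator[](std::size_t i) const { return s * r[i]; }
};

// Operators build the expression type; no arithmetic happens here.
template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Add<L, R>(l.self(), r.self());
}

template <class R>
Scale<R> operator*(double s, const Expr<R>& r) {
    return Scale<R>(s, r.self());
}

// Evaluates r = a + 2*b on n sites (a = 1, b = 3) and returns r[0].
double demo_first_site(std::size_t n) {
    Field a(n, 1.0), b(n, 3.0), r(n);
    r = a + 2.0 * b;  // type: Add<Field, Scale<Field>>, one fused loop
    return r[0];
}
```

In this sketch the right-hand side of `r = a + 2.0 * b` has the type `Add<Field, Scale<Field>>`, so the structure of the expression is known at compile time. Under that assumption, a per-expression kernel could in principle be generated from the type, which is one way to read the role of JIT compilation described in the abstract.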
