Data-parallel leading-order event generation in MadGraph5_aMC@NLO
Stephan Hageböck, Daniele Massaro, Olivier Mattelaer, Stefan Roiser, Andrea Valassi, Zenny Wettersten

TL;DR
The paper introduces CUDACPP, a plugin for MadGraph5_aMC@NLO that accelerates event generation using SIMD instructions and GPU offloading. With proper GPU utilisation, high-multiplicity QCD processes run about an order of magnitude faster than with optimal use of server-grade CPUs.
Contribution
It presents a novel data-parallel implementation of helicity amplitudes in C++ and CUDA, enabling substantial speed-ups in event generation on CPUs and GPUs.
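The data-parallel idea can be illustrated with a minimal C++ sketch. It assumes GCC/Clang vector extensions (`__attribute__((vector_size(...)))`) and a structure-of-arrays event layout. The names `vdouble`, `kLanes`, `mass2` and `mass2_all` are hypothetical, and the toy invariant-mass kernel only stands in for a real helicity-amplitude routine. The point is that a single templated function body compiles both for a scalar `double` and for a SIMD vector of doubles, so several events are evaluated per instruction. This is not the CUDACPP source.

```cpp
#include <cstddef>
#include <cstring>

// Assumed SIMD width: 4 doubles = 256 bits, i.e. one AVX2 register
// (2 would match SSE4's 128-bit registers, 8 would match AVX-512).
constexpr std::size_t kLanes = 4;

// GCC/Clang vector extension: arithmetic operators act lane by lane.
typedef double vdouble __attribute__((vector_size(kLanes * sizeof(double))));

// Toy stand-in for an amplitude routine (hypothetical), written once as a
// template so the same source compiles for double and for vdouble.
// Computes the invariant mass squared m^2 = E^2 - |p|^2.
template <typename T>
T mass2(T E, T px, T py, T pz) {
  return E * E - (px * px + py * py + pz * pz);
}

// Evaluate mass2 for n events stored as separate arrays (SoA layout).
void mass2_all(const double* E, const double* px, const double* py,
               const double* pz, double* out, std::size_t n) {
  std::size_t i = 0;
  // Vector loop: each call processes kLanes events at once.
  for (; i + kLanes <= n; i += kLanes) {
    vdouble vE, vx, vy, vz;
    std::memcpy(&vE, E + i, sizeof(vdouble));
    std::memcpy(&vx, px + i, sizeof(vdouble));
    std::memcpy(&vy, py + i, sizeof(vdouble));
    std::memcpy(&vz, pz + i, sizeof(vdouble));
    vdouble r = mass2(vE, vx, vy, vz);
    std::memcpy(out + i, &r, sizeof(vdouble));
  }
  // Scalar remainder loop for the last n % kLanes events.
  for (; i < n; ++i) out[i] = mass2(E[i], px[i], py[i], pz[i]);
}
```

Calling `mass2_all` with six events evaluates the first four with one vector call and the last two in the scalar remainder loop. Compiling with, for example, `-O2 -mavx2` lets each `vdouble` map onto a single 256-bit register. Without such a flag, the compiler splits the vector into narrower operations but the results are the same.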
Findings
The speed-up of the amplitude routines scales linearly with SIMD register size.
GPU offloading provides additional acceleration beyond SIMD.
With proper GPU utilisation, high-multiplicity QCD processes are sped up by an order of magnitude compared with optimal use of server-grade CPUs.
Abstract
The CUDACPP plugin for MadGraph5_aMC@NLO aims to accelerate leading-order tree-level event generation by providing the MadEvent event generator with data-parallel helicity amplitudes. These amplitudes are written in templated C++ and CUDA, so they can be compiled for CPUs supporting the SSE4, AVX2, and AVX-512 instruction sets as well as for CUDA- and HIP-enabled GPUs. With SIMD instruction sets, CUDACPP-generated amplitude routines are shown to speed up linearly with SIMD register size, and GPU offloading is shown to provide acceleration beyond that of SIMD instructions. Additionally, the resulting speed-up in event generation perfectly aligns with predictions from measured runtime fractions spent in amplitude routines. Proper GPU utilisation can speed up high-multiplicity QCD processes by an order of magnitude compared with optimal CPU usage on server-grade CPUs.
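The abstract's two quantitative statements, linear scaling with register size and agreement with predictions from runtime fractions, can be connected with simple arithmetic. The sketch below assumes that the prediction takes the form of Amdahl's law. It also assumes that the amplitude routines speed up ideally, by exactly the number of double-precision lanes per register. The function names `simd_lanes`, `amdahl` and `predicted_simd_speedup` are hypothetical.

```cpp
#include <cassert>

// Lanes per SIMD register: register width in bits divided by element width.
// For double precision (8 bytes): SSE4 128-bit -> 2, AVX2 256-bit -> 4,
// AVX-512 512-bit -> 8. Single precision (4 bytes) doubles each count.
constexpr int simd_lanes(int register_bits, int element_bytes) {
  return register_bits / (8 * element_bytes);
}

// Amdahl's-law estimate of the overall speed-up when a fraction f of the
// baseline runtime (assumed here: time spent in amplitude routines) is
// accelerated by a factor s and the remaining 1 - f is unchanged.
double amdahl(double f, double s) {
  assert(f >= 0.0 && f <= 1.0 && s > 0.0);
  return 1.0 / ((1.0 - f) + f / s);
}

// Predicted event-generation speed-up, assuming the amplitude routines
// scale ideally (linearly) with the number of double-precision lanes.
double predicted_simd_speedup(double f, int register_bits) {
  return amdahl(f, simd_lanes(register_bits, 8));
}
```

For a hypothetical runtime fraction f = 0.9, the predicted overall speed-ups for 2, 4 and 8 lanes are about 1.8, 3.1 and 4.7. However fast the amplitudes become, on CPU or GPU, the overall speed-up cannot exceed 1/(1 - f) = 10 in this model.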
