Pinpoint resource allocation for GPU batch applications

Tim Voigtländer; Manuel Giffels; Günter Quast; Matthias Schnepf; Roger Wolf

arXiv:2505.08562·hep-ex·May 14, 2025

TL;DR

This paper investigates resource allocation strategies for GPU batch applications in high energy physics, focusing on optimizing throughput and energy efficiency for low-intensity GPU workloads using NVIDIA's MPS and batch system integration.

Contribution

It introduces a flexible resource allocation approach combining NVIDIA's MPS with batch systems, improving efficiency for diverse GPU workloads in HEP.

Findings

01

NVIDIA's MPS enhances GPU resource utilization.

02

The approach improves throughput for low-intensity workloads.

03

Energy efficiency is increased with optimized resource sharing.

Abstract

With the increasing use of Machine Learning (ML) in high energy physics (HEP), there is a variety of new analyses with a large spread in compute resource requirements, especially for GPU resources. For institutes like the Karlsruhe Institute of Technology (KIT) that provide GPU compute resources to HEP via their batch systems or the Grid, high-throughput and energy-efficient use of their systems is essential. For low-intensity GPU analyses in particular, standard scheduling creates inefficiencies because it over-assigns resources to such workflows. An approach flexible enough to cover the entire spectrum, from multiple processes per GPU to multiple GPUs per process, is necessary. As a follow-up to the techniques presented at ACAT 2022, this time we study NVIDIA's Multi-Process Service (MPS), its ability to securely distribute device memory and its…
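
To make the multi-process-per-GPU case concrete, the following is a minimal sketch of how a batch job wrapper might translate a fractional GPU request into per-client MPS limits. It is not the authors' implementation. The function name `mps_client_env` and its parameters are hypothetical. The two environment variables are MPS client settings described in NVIDIA's MPS documentation, and the exact value syntax should be checked against that documentation for the installed driver version.

```python
def mps_client_env(gpu_fraction: float, device_mem_gib: float,
                   device_index: int = 0) -> dict[str, str]:
    """Build environment variables that cap one MPS client process.

    Hypothetical helper for illustration only: gpu_fraction is the share of
    the GPU's compute the job requested (0 < gpu_fraction <= 1) and
    device_mem_gib is the device memory the job may allocate, in GiB.
    """
    if not 0.0 < gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in (0, 1]")
    if device_mem_gib <= 0:
        raise ValueError("device_mem_gib must be positive")

    # Share of the GPU's streaming multiprocessors available to this
    # client, as a whole-number percentage (at least 1).
    thread_pct = max(1, round(gpu_fraction * 100))

    # Device memory cap, expressed in MiB with the "M" qualifier.
    mem_mib = int(device_mem_gib * 1024)

    return {
        "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": str(thread_pct),
        "CUDA_MPS_PINNED_DEVICE_MEM_LIMIT": f"{device_index}={mem_mib}M",
    }


if __name__ == "__main__":
    # Example: a low-intensity job asking for a quarter of a GPU and 4 GiB.
    for name, value in mps_client_env(0.25, 4.0).items():
        print(f"{name}={value}")
```

A wrapper could merge the returned dictionary into the job's environment before launching the payload, for example with `subprocess.run(cmd, env={**os.environ, **limits})`. This assumes an MPS control daemon is already running on the node; how the batch system derives the fraction and memory values is left open here.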
