Performance Isolation for Inference Processes in Edge GPU Systems

Juan José Martín; José Flich; Carles Hernández

arXiv:2601.07600 · cs.OS · January 28, 2026

Open Access

TL;DR

This paper evaluates three GPU isolation mechanisms on NVIDIA platforms (MPS, MIG, and Green Contexts) for keeping inference times predictable in safety-critical deep learning applications, and reports the strengths and limitations of each.

Contribution

The paper analyzes GPU partitioning and isolation techniques on the NVIDIA A100 and Jetson Orin, compares how effectively each isolates concurrent processes, and outlines research directions for improving temporal predictability.

Findings

01. MIG provides a high level of isolation between processes.

02. Green Contexts enable fine-grained SM allocation with low overhead, but without memory isolation.

03. The study identifies current limitations in temporal predictability on shared GPUs.

Abstract

This work analyzes the main isolation mechanisms available in modern NVIDIA GPUs: MPS, MIG, and the recent Green Contexts, to ensure predictable inference time in safety-critical applications using deep learning models. The experimental methodology includes performance tests, evaluation of partitioning impact, and analysis of temporal isolation between processes, considering both the NVIDIA A100 and Jetson Orin platforms. It is observed that MIG provides a high level of isolation. At the same time, Green Contexts represent a promising alternative for edge devices by enabling fine-grained SM allocation with low overhead, albeit without memory isolation. The study also identifies current limitations and outlines potential research directions to improve temporal predictability in shared GPUs.
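As an illustration of what "analysis of temporal isolation between processes" can mean in practice, the sketch below compares inference latencies measured with the process running alone against latencies measured while a co-runner shares the GPU. It is a minimal sketch under stated assumptions, not the authors' methodology: the function names (`percentile`, `isolation_report`), the choice of metrics, and the sample values are all hypothetical, and the latency lists are assumed to have been collected separately.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list, with q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def isolation_report(solo_ms, shared_ms, q=99):
    """Summarize how much a co-runner perturbs inference latency.

    solo_ms   -- latencies (ms) measured with the process alone on the GPU
    shared_ms -- latencies (ms) measured while another process shares the GPU
    Metric names are illustrative assumptions, not taken from the paper.
    """
    return {
        # Ratio of mean latencies; 1.0 means no average slowdown.
        "mean_slowdown": statistics.mean(shared_ms) / statistics.mean(solo_ms),
        # Ratio of tail latencies, which matters most for worst-case timing.
        "tail_slowdown": percentile(shared_ms, q) / percentile(solo_ms, q),
        # Spread of the shared-run latencies (max minus min).
        "jitter_ms": max(shared_ms) - min(shared_ms),
    }


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    solo = [10.0, 10.0, 10.0, 10.0]
    shared = [10.0, 12.0, 11.0, 15.0]
    print(isolation_report(solo, shared))
```

Under this framing, a mechanism with strong temporal isolation would keep both slowdown ratios close to 1.0 and the jitter small regardless of what the co-runner does.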

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Radiation Effects in Electronics