Guardian: Safe GPU Sharing in Multi-Tenant Environments
Manos Pavlidakis, Giorgos Vasiliadis, Stelios Mavridis, Anargyros Argyros, Antony Chazapis, and Angelos Bilas

TL;DR
Guardian introduces a PTX-level bounds checking system that enables safe, dynamic GPU sharing among multiple tenants, significantly improving resource utilization while maintaining memory safety and low overhead.
Contribution
The paper presents a transparent, PTX-level bounds checking approach that ensures memory safety in multi-tenant GPU sharing environments, supporting dynamic partitioning and real-world frameworks.
Findings
Overhead of 4%-12% compared to native execution
Supports frameworks like Caffe and PyTorch
Provides memory isolation and fault fencing
Abstract
Modern GPU applications, such as machine learning (ML), can only partially utilize GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different tenants can improve resource utilization and consequently cost, energy, and power efficiency. However, GPU sharing creates memory safety concerns because kernels must share a single GPU address space. Existing spatial-sharing mechanisms either lack fault isolation for memory accesses or require static partitioning, which leads to limited deployability or low utilization. In this paper, we present Guardian, a PTX-level bounds checking approach that provides memory isolation and supports dynamic GPU spatial-sharing. Guardian relies on three mechanisms: (1) It divides the common GPU address space into separate partitions for different applications. (2) It intercepts and checks all GPU…
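The first two mechanisms named in the abstract (partitioning a shared address space and checking each access against its owner's partition) can be illustrated with a small host-side model. This is only a sketch in Python, not Guardian's PTX-level instrumentation: the names `Partition`, `split_address_space`, and `checked_access` are hypothetical, and the equal-sized split is an assumption made for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A contiguous slice of a shared address space owned by one tenant (hypothetical model)."""
    base: int  # first valid byte address
    size: int  # partition length in bytes

    def contains(self, addr: int, nbytes: int = 1) -> bool:
        """True if the access [addr, addr + nbytes) stays inside this partition."""
        return nbytes > 0 and self.base <= addr and addr + nbytes <= self.base + self.size


def split_address_space(total_bytes: int, tenants: list[str]) -> dict[str, Partition]:
    """Divide one address space into equal, non-overlapping partitions (assumed policy)."""
    share = total_bytes // len(tenants)
    return {name: Partition(base=i * share, size=share) for i, name in enumerate(tenants)}


def checked_access(partition: Partition, addr: int, nbytes: int) -> int:
    """Return addr if the access is in bounds; raise instead of touching another partition."""
    if not partition.contains(addr, nbytes):
        raise MemoryError(f"out-of-bounds access at {addr:#x} ({nbytes} bytes)")
    return addr
```

In this model, a tenant's access that crosses its partition boundary is rejected before it can reach a neighbor's memory, which is the isolation property the abstract describes.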
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Cloud Data Security Solutions
