Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing

Zhenyuan Yang, Wenxin Zheng, Mingyu Li, and Haibo Chen

arXiv:2603.15042·cs.DC·April 6, 2026

TL;DR

CoGPU is a novel GPU spatial sharing system that achieves high resource utilization, strong performance isolation, and absolute semantic determinism through a new abstraction called GPU coroutine.

Contribution

CoGPU introduces the GPU coroutine, an abstraction that decouples logical resources from physical ones and thereby enables workload-aware scheduling without altering kernel semantics.
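To make the idea of logical-to-physical decoupling concrete, here is a toy sketch in Python. It is not CoGPU's implementation: the names `VirtualContext`, `PhysicalMapping`, `logical_blocks`, and `units`, and the round-robin placement, are all hypothetical and chosen only for illustration. The sketch shows one property only: the logical view a kernel runs against stays fixed while the physical share underneath it changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VirtualContext:
    """Hypothetical logical view of a kernel launch; frozen, so it cannot change."""
    name: str
    logical_blocks: int  # number of logical thread blocks the kernel expects


@dataclass
class PhysicalMapping:
    """Hypothetical mutable assignment of a virtual context to compute units."""
    context: VirtualContext
    units: list = field(default_factory=list)  # ids of physical compute units

    def schedule(self):
        """Place each logical block on a physical unit in round-robin order."""
        return {
            block: self.units[block % len(self.units)]
            for block in range(self.context.logical_blocks)
        }


ctx = VirtualContext("train_step", logical_blocks=8)
mapping = PhysicalMapping(ctx, units=[0, 1, 2, 3])
before = mapping.schedule()

# A scheduler shrinks the physical share; the logical view is untouched.
mapping.units = [0, 1]
after = mapping.schedule()
```

In both schedules the same eight logical blocks exist; only their placement differs. The migration mechanism that the paper describes is not modeled here.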

Findings

01. CoGPU improves training throughput by up to 79.2% over temporal sharing.

02. It reduces P99 inference tail latency by 15.1%.

03. A pluggable policy further reduces SLO violations by 21.2%.

Abstract

Existing GPU spatial sharing systems face a three-way tradeoff among resource utilization, performance isolation, and semantic determinism. Hardware partitioning under-utilizes the hardware. Hardware multiplexing fails to avoid performance interference. Recently proposed software-based GPU kernel slicing reshapes floating-point reduction orders, destroying semantic determinism and inducing catastrophic token drift in generative models. We present CoGPU, a transparent spatial sharing system that resolves this trilemma. CoGPU introduces the GPU coroutine, a novel abstraction that enables logical-to-physical resource decoupling. By dynamically mapping immutable virtual contexts to mutable physical resources via lightweight cooperative migration, CoGPU enables extensible, workload-aware scheduling without altering kernel semantics. Evaluations demonstrate CoGPU simultaneously…
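The concern about reshaped reduction orders rests on a general property of floating-point arithmetic: addition is not associative, so summing the same values in a different grouping can give a different result. The Python sketch below illustrates that property on the CPU. It does not reproduce any GPU kernel or slicing system; the helper names `sequential_sum` and `sliced_sum` and the input values are assumptions made for the example.

```python
def sequential_sum(values):
    """Reduce left to right in a single pass."""
    total = 0.0
    for v in values:
        total += v
    return total


def sliced_sum(values, num_slices):
    """Reduce contiguous slices separately, then combine the partial sums.

    This mimics, in spirit only, what happens when one reduction is split
    into several independently executed pieces.
    """
    size = (len(values) + num_slices - 1) // num_slices
    partials = [
        sequential_sum(values[i:i + size])
        for i in range(0, len(values), size)
    ]
    return sequential_sum(partials)


# Values chosen so that small terms are absorbed by large ones in one
# grouping but survive in the other.
values = [1e16, 1.0, -1e16, 1.0]

full = sequential_sum(values)
sliced = sliced_sum(values, num_slices=2)

print(full, sliced, full == sliced)
```

The two results differ even though the inputs are identical. In an autoregressive model, a small numerical difference of this kind can change which token is selected at one step, and later steps then build on a different prefix.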
