Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing
Zhenyuan Yang, Wenxin Zheng, Mingyu Li, and Haibo Chen

TL;DR
CoGPU is a GPU spatial sharing system that achieves high resource utilization, strong performance isolation, and absolute semantic determinism through a new abstraction, the GPU coroutine.
Contribution
It introduces the GPU coroutine, an abstraction that decouples logical resources from physical ones, enabling workload-aware scheduling without altering kernel semantics.
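To make the idea of logical-to-physical decoupling concrete, here is a minimal CPU-side sketch under loudly stated assumptions. Python generators stand in for GPU coroutines, and numbered "slots" stand in for physical GPU resources such as SMs. The names `logical_block` and `run`, the round-based scheduler, and the per-round slot budget are all hypothetical illustrations, not CoGPU's API or implementation. Each logical block reduces a fixed slice of the input in a fixed order and yields at cooperative points, so the scheduler can resize the physical allotment and move blocks between slots while the arithmetic, and therefore the result, stays bit-identical.

```python
from typing import Dict, Generator, List, Set, Tuple


# Hypothetical stand-in for a GPU coroutine: one logical block that reduces a
# fixed slice of the input in a fixed order. Its state (acc, loop index) lives
# inside the generator, not in any physical slot, so it can be suspended on one
# slot and resumed on another without changing its arithmetic.
def logical_block(block_id: int, data: List[float], chunk: int) -> Generator[None, None, float]:
    start = block_id * chunk
    acc = 0.0
    for i in range(start, min(start + chunk, len(data))):
        acc += data[i]
        yield  # cooperative yield point: the scheduler may migrate the block here
    return acc


def run(data: List[float], chunk: int, slots_per_round: List[int]) -> Tuple[float, Dict[int, Set[int]]]:
    """Run all logical blocks on a varying number of physical slots (hypothetical scheduler).

    slots_per_round[r] is how many slots are granted in round r; the last entry
    is reused once the list is exhausted. Returns the reduced total and, for
    each logical block, the set of slot ids it ran on.
    """
    num_blocks = (len(data) + chunk - 1) // chunk
    pending = [(b, logical_block(b, data, chunk)) for b in range(num_blocks)]
    partials = [0.0] * num_blocks
    placements: Dict[int, Set[int]] = {}
    r = 0
    while pending:
        slots = max(1, slots_per_round[min(r, len(slots_per_round) - 1)])
        running, rest = pending[:slots], pending[slots:]
        suspended = []
        for slot_id, (b, co) in enumerate(running):
            placements.setdefault(b, set()).add(slot_id)
            try:
                next(co)  # advance this block to its next yield point
                suspended.append((b, co))
            except StopIteration as stop:
                partials[b] = stop.value  # block finished; record its partial result
        # Round-robin requeue: a resumed block may land on a different slot next round.
        pending = rest + suspended
        r += 1
    # Final reduction over logical blocks in fixed logical order,
    # independent of how many slots were granted or where blocks ran.
    total = 0.0
    for p in partials:
        total += p
    return total, placements
```

Calling `run(data, 2, [1])`, `run(data, 2, [8])`, and `run(data, 2, [3, 1, 2])` on the same input returns the same total even though blocks are interleaved and moved across slots differently in each case. The design choice being illustrated is that only the mapping of logical blocks to slots changes; the partitioning and order of the floating-point work never do.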
Findings
CoGPU improves training throughput by up to 79.2% over temporal sharing.
It reduces P99 inference tail latency by 15.1%.
A pluggable policy further reduces SLO violations by 21.2%.
Abstract
Existing GPU spatial sharing systems face a three-way tradeoff among resource utilization, performance isolation, and semantic determinism. Hardware partitioning leaves hardware under-utilized. Hardware multiplexing fails to prevent performance interference. Recently proposed software-based GPU kernel slicing changes the order of floating-point reductions, destroying semantic determinism and inducing catastrophic token drift in generative models. We present CoGPU, a transparent spatial sharing system that resolves this trilemma. CoGPU introduces the GPU coroutine, a novel abstraction that decouples logical resources from physical ones. By dynamically mapping immutable virtual contexts to mutable physical resources via lightweight cooperative migration, CoGPU enables extensible, workload-aware scheduling without altering kernel semantics. Evaluations demonstrate CoGPU simultaneously…
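The determinism problem with kernel slicing comes from the fact that IEEE-754 floating-point addition is not associative, so re-partitioning a reduction can change its result. The sketch below is a CPU analogy in Python doubles, not GPU code; `sum_in_order` and `sliced_sum` are hypothetical helpers, and "slices" here only model how a slicing scheme might regroup a reduction. Python's builtin `sum()` is deliberately avoided because, since CPython 3.12, it uses a more accurate summation algorithm for floats that would hide the effect.

```python
from typing import List


def sum_in_order(values: List[float]) -> float:
    # Plain left-to-right accumulation, one rounding step per addition.
    acc = 0.0
    for v in values:
        acc += v
    return acc


def sliced_sum(values: List[float], num_slices: int) -> float:
    """Split values into contiguous chunks, reduce each chunk, then reduce the partials.

    Models a reduction whose grouping depends on how the work is sliced.
    """
    if not values:
        return 0.0
    size = max(1, -(-len(values) // num_slices))  # ceiling division
    partials = [sum_in_order(values[i:i + size]) for i in range(0, len(values), size)]
    return sum_in_order(partials)


values = [1e20, 1.0, -1e20, 1.0]
one_slice = sliced_sum(values, 1)   # ((1e20 + 1) - 1e20) + 1: the first 1.0 is absorbed
two_slices = sliced_sum(values, 2)  # (1e20 + 1) + (-1e20 + 1): both 1.0s are absorbed
```

Here `one_slice` evaluates to `1.0` and `two_slices` to `0.0`: the same inputs and the same operation, yet a different grouping yields a different answer. Small discrepancies like this, compounded across layers and sampling steps, are the kind of divergence that can make generated tokens drift.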
