GPUnion: Autonomous GPU Sharing on Campus
Yufang Li (The Hong Kong University of Science and Technology (Guangzhou)), Yuanbo Zhang (Sun Yat-sen University), Hanlong Liao (Sun Yat-sen University), Deke Guo (Sun Yat-sen University), Guoming Tang (The Hong Kong University of Science and Technology (Guangzhou))

TL;DR
GPUnion is a decentralized platform that enables autonomous GPU sharing among campus labs, significantly improving utilization and flexibility without centralized control.
Contribution
It introduces a novel campus-scale GPU sharing system that maintains provider autonomy and supports voluntary participation with resilient execution mechanisms.
Findings
30% increase in GPU utilization
40% more interactive sessions
94% successful workload migrations
Abstract
A pronounced imbalance in GPU resources exists on campus: some laboratories own underutilized servers while others lack the compute needed for AI research. GPU sharing can alleviate this disparity, but existing platforms typically rely on centralized oversight and persistent allocation models, which conflict with the voluntary and autonomous nature of academic resource ownership. We present GPUnion, a campus-scale GPU sharing platform that enables voluntary participation while preserving full provider autonomy. GPUnion incorporates three core mechanisms: i) container-based task dispatching and execution, ii) a resource provider-first architecture, and iii) resilient execution featuring automatic checkpointing and migration. Case studies across multiple campus scenarios demonstrate a 30% improvement in GPU utilization, a 40% increase in interactive sessions, and 94% successful workload…
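To make the three mechanisms in the abstract more concrete, the following is a minimal Python sketch of how provider-first reclamation, periodic checkpointing, and migration could fit together. It is an illustration only and not GPUnion's implementation: every name here (`Task`, `Provider`, `Pool`, `dispatch`, `run`, `migrate`, `checkpoint_every`) is hypothetical, containers are reduced to plain objects, and a checkpoint is just a saved step counter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a containerized workload."""
    name: str
    total_steps: int
    done_steps: int = 0  # progress, restored from the last checkpoint on dispatch


@dataclass
class Provider:
    """Hypothetical GPU server owned by a lab that may withdraw at any time."""
    name: str
    available: bool = True
    task: Optional[Task] = None

    def reclaim(self) -> Optional[Task]:
        # Provider-first: the owner takes the GPU back and the guest task is evicted.
        self.available = False
        evicted, self.task = self.task, None
        return evicted


class Pool:
    """Hypothetical coordinator that places tasks and stores checkpoints."""

    def __init__(self, providers: List[Provider]) -> None:
        self.providers = list(providers)
        self.checkpoints: Dict[str, int] = {}  # task name -> last saved step

    def dispatch(self, task: Task) -> Optional[Provider]:
        # Place the task on the first idle, willing provider and resume
        # from its last checkpoint (step 0 if none exists).
        for provider in self.providers:
            if provider.available and provider.task is None:
                task.done_steps = self.checkpoints.get(task.name, 0)
                provider.task = task
                return provider
        return None

    def run(self, provider: Provider, steps: int, checkpoint_every: int = 10) -> None:
        # Advance the task step by step, saving a checkpoint periodically.
        task = provider.task
        if task is None:
            return
        for _ in range(steps):
            if task.done_steps >= task.total_steps:
                break
            task.done_steps += 1
            if task.done_steps % checkpoint_every == 0:
                self.checkpoints[task.name] = task.done_steps

    def migrate(self, provider: Provider) -> Optional[Provider]:
        # The provider withdraws; the evicted task is re-dispatched elsewhere.
        evicted = provider.reclaim()
        if evicted is None:
            return None
        return self.dispatch(evicted)


# Usage: a job runs on lab-a, lab-a reclaims its GPU, and the job resumes on lab-b.
pool = Pool([Provider("lab-a"), Provider("lab-b")])
job = Task("train", total_steps=100)
first = pool.dispatch(job)
pool.run(first, steps=25)     # checkpoints are saved at steps 10 and 20
second = pool.migrate(first)  # work done after step 20 is lost and redone
```

In this sketch the provider's decision to withdraw is never blocked or delayed by the coordinator; the cost of that autonomy is that any progress made since the last checkpoint has to be repeated on the new host.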
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Cloud Computing and Resource Management
