Virgo: Cluster-level Matrix Unit Integration in GPUs for Scalability and Energy Efficiency
Hansung Kim, Ruohan Richard Yan, Joshua You, Tieliang Vamber Yang, Yakun Sophia Shao

TL;DR
Virgo is a GPU microarchitecture that moves matrix units out of the SIMT cores and integrates them at the cluster level. This raises the granularity of matrix operations and cuts the energy spent on core instruction processing, improving scalability and energy efficiency for deep learning workloads.
Contribution
The paper proposes Virgo, a GPU design that decouples matrix units from SIMT cores, enabling scalable, energy-efficient matrix operations at the cluster level.
Findings
Achieves 67.3% reduction in on-chip active power compared to Ampere-style cores.
Achieves 24.2% reduction in on-chip active power compared to Hopper-style cores.
Supports efficient concurrent execution of matrix units and SIMT cores.
Abstract
Modern GPUs incorporate specialized matrix units such as Tensor Cores to accelerate GEMM operations, which are central to deep learning workloads. However, existing matrix unit designs are tightly coupled to the SIMT core, restricting operation size due to register file capacity and bandwidth constraints. Such a limitation in scalability makes it difficult to simultaneously improve compute throughput and energy efficiency in GPUs. To address this challenge, we propose Virgo, a GPU microarchitecture that integrates dedicated matrix units at the SIMT core cluster level. By decoupling the matrix unit from the SIMT core, Virgo eliminates scalability constraints imposed by the core microarchitecture. Consequently, Virgo increases operation granularity at the hardware level, reducing energy overhead from core instruction processing. Physical disaggregation also enables a unified matrix unit…
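The abstract's point about operation granularity can be made concrete with a rough count of how many tile-level matrix operations a GEMM needs at different tile sizes. The following is a minimal sketch, not the paper's model: the function name `matrix_op_count` and both tile shapes are hypothetical values chosen only for illustration, and the count ignores everything except how many operations must be issued.

```python
from math import ceil


def matrix_op_count(M, N, K, tile_m, tile_n, tile_k):
    """Count tile-level matrix operations needed to cover an M x N x K GEMM.

    Each operation is assumed to compute one tile_m x tile_n x tile_k
    block of multiply-accumulates; partial tiles are rounded up.
    """
    return ceil(M / tile_m) * ceil(N / tile_n) * ceil(K / tile_k)


# Hypothetical tile shapes for illustration only (not taken from the paper).
# A small tile stands in for an operation sized to fit in a core's register file.
small_tile_ops = matrix_op_count(1024, 1024, 1024, 16, 8, 16)

# A larger tile stands in for an operation no longer bounded by the register file.
large_tile_ops = matrix_op_count(1024, 1024, 1024, 128, 64, 128)

print(f"small-tile operations: {small_tile_ops}")
print(f"large-tile operations: {large_tile_ops}")
print(f"reduction factor:      {small_tile_ops // large_tile_ops}x")
```

Under these assumed shapes, the same GEMM is covered by far fewer operations when each operation is larger, which is the sense in which coarser hardware-level granularity reduces the instruction-processing work the cores must do.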
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems
