Subgraph Stationary Hardware-Software Inference Co-Design
Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexey Tumanov

TL;DR
This paper introduces SubGraph Stationary (SGS), a hardware-software co-design that exploits temporal locality to optimize dynamic ML inference serving, reducing latency and improving energy efficiency.
Contribution
The paper proposes the SGS optimization for dynamic ML inference and integrates it into a real hardware-software stack, enabling adaptive query serving with lower latency and energy use.
Findings
Up to 25% latency reduction
0.98% increase in served accuracy
78.7% off-chip energy savings
Abstract
A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher-quality ML predictions and better timeliness (latency) at the same time. A growing body of research in the computer architecture, ML, and systems software literature focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-exit models, mixed DNN precision, as well as ML inference accelerator designs that minimize latency and energy while preserving delivered accuracy. All of them, however, yield improvements for a single static point in the latency-accuracy tradeoff space. We make a case for applications that operate in dynamically changing deployment scenarios, where no single static point is optimal. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that uses…
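The abstract's claim that no single static point is optimal can be illustrated with a minimal sketch. It is not the paper's method: the operating points, their latency and accuracy values, and the `pick_point` helper are all hypothetical, chosen only to show how the best choice shifts as a per-query latency budget changes.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    """One point in the latency-accuracy tradeoff space (hypothetical values)."""
    name: str
    latency_ms: float
    accuracy: float


def pick_point(points: Sequence[OperatingPoint],
               budget_ms: float) -> Optional[OperatingPoint]:
    """Return the most accurate point whose latency fits the budget, or None."""
    feasible = [p for p in points if p.latency_ms <= budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.accuracy)


# Illustrative numbers only; they are not taken from the paper.
POINTS = [
    OperatingPoint("small", latency_ms=5.0, accuracy=0.70),
    OperatingPoint("medium", latency_ms=10.0, accuracy=0.75),
    OperatingPoint("large", latency_ms=20.0, accuracy=0.80),
]

if __name__ == "__main__":
    # A stream of queries whose latency budgets change over time.
    for budget in [25.0, 8.0, 12.0, 3.0]:
        choice = pick_point(POINTS, budget)
        label = choice.name if choice else "no feasible point"
        print(f"budget={budget:5.1f} ms -> {label}")
```

Under these assumed numbers, any single fixed choice either misses the tight budgets or gives up accuracy on the loose ones, which is the motivation the abstract gives for serving from a range of operating points.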
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Advanced Neural Network Applications
