Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness
Zeyuan Tan, Xiulong Yuan, Congjie He, Man-Kit Sit, Guo Li, Xiaoze Liu, Baole Ai, Kai Zeng, Peter Pietzuch, Luo Mai

TL;DR
Quiver is a distributed GPU-based GNN serving system that uses workload metrics to predict the irregular computation of each inference request and to decide when GPUs should handle graph sampling and feature aggregation. It reports up to 35x lower latency and 8x higher throughput than existing GNN systems.
Contribution
Quiver introduces workload-aware metrics that govern when GPUs are used for graph sampling and feature aggregation, improving both the latency and the throughput of GNN serving.
Findings
Achieves up to 35x lower latency than state-of-the-art GNN systems
Attains 8x higher throughput than existing GNN systems
Effectively predicts computation workload to optimize GPU usage
Abstract
Systems for serving inference requests on graph neural networks (GNNs) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling, and aggregating many features incurs high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, which limits throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests, and governing the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver…
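The idea of using a workload metric to decide where sampling runs can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not Quiver's actual metric or API: the function names, the degree-based estimate, and the threshold value are all hypothetical. The sketch estimates how many nodes a request would sample and sends small requests to the CPU and large ones to the GPU.

```python
from typing import Dict, Sequence


def estimate_sampled_nodes(
    degrees: Dict[int, int],
    seeds: Sequence[int],
    fanouts: Sequence[int],
    avg_degree: float,
) -> float:
    """Hypothetical estimate of the nodes touched by layer-wise sampling.

    The first hop uses the real degrees of the seed nodes; later hops
    approximate each frontier node's degree with the graph average.
    This is an illustrative heuristic, not the metric used by Quiver.
    """
    if not fanouts:
        return float(len(seeds))
    # First hop: each seed contributes at most `fanout` neighbors.
    hop = float(sum(min(degrees.get(s, 0), fanouts[0]) for s in seeds))
    total = len(seeds) + hop
    # Later hops: scale the previous frontier by the capped average degree.
    for fanout in fanouts[1:]:
        hop *= min(avg_degree, fanout)
        total += hop
    return total


def choose_sampling_device(estimate: float, gpu_threshold: float) -> str:
    """Route small requests to the CPU and large ones to the GPU.

    `gpu_threshold` is an assumed tuning knob that would have to be
    calibrated on the target hardware.
    """
    return "gpu" if estimate >= gpu_threshold else "cpu"


if __name__ == "__main__":
    degrees = {0: 2, 1: 100}  # node 0 is low-degree, node 1 is a hub
    for seed in (0, 1):
        est = estimate_sampled_nodes(degrees, [seed], [10, 5], avg_degree=4.0)
        print(seed, est, choose_sampling_device(est, gpu_threshold=32))
```

With these assumed inputs, the low-degree seed stays on the CPU and the hub seed goes to the GPU, which mirrors the skew described above: requests that touch few nodes do not amortize GPU launch and transfer overhead.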
Taxonomy
Topics: Advanced Graph Neural Networks · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices
