StreamTGN: A GPU-Efficient Serving System for Streaming Temporal Graph Neural Networks
Lingling Zhang, Pengpeng Qiao, Zhiwei Zhang, Ye Yuan, Guoren Wang

TL;DR
StreamTGN is a GPU-efficient inference system for streaming temporal graph neural networks that leverages locality to significantly accelerate inference without accuracy loss.
Contribution
It introduces a novel streaming inference system for TGNs that exploits local updates, reducing complexity and increasing throughput while maintaining accuracy.
Findings
Achieves up to 739x speedup over existing systems.
Maintains identical accuracy despite acceleration.
Combined with training optimizations, it yields a 24x end-to-end speedup.
Abstract
Temporal Graph Neural Networks (TGNs) achieve state-of-the-art performance on dynamic graph tasks, yet existing systems focus exclusively on accelerating training. At inference time, every new edge triggers embedding updates even though only a small fraction of nodes are affected. We present StreamTGN, the first streaming TGN inference system exploiting the inherent locality of temporal graph updates: in an $L$-layer TGN, a new edge affects only nodes within $L$ hops of the endpoints, typically less than 0.2% of nodes on million-node graphs. StreamTGN maintains persistent GPU-resident node memory and uses dirty-flag propagation to identify the affected set $\mathcal{A}$, reducing per-batch complexity from $O(|V|)$ to $O(|\mathcal{A}|)$ with zero accuracy loss. Drift-aware adaptive rebuild scheduling and batched streaming with relaxed ordering further maximize throughput.…
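The locality argument in the abstract can be illustrated with a small CPU-side sketch. It is not StreamTGN's implementation: the function name `affected_set`, the dictionary-of-sets adjacency, and the choice to treat edges as undirected are all assumptions made for illustration. It marks the endpoints of each new edge as dirty and propagates the flag outward one hop per layer, so only the marked nodes would need their embeddings recomputed.

```python
def affected_set(adj, new_edges, num_layers):
    """Return the nodes within `num_layers` hops of any new-edge endpoint.

    adj        -- dict mapping node -> set of neighbours (hypothetical layout,
                  treated as undirected for this sketch)
    new_edges  -- iterable of (u, v) pairs arriving in the current batch
    num_layers -- number of TGN layers, i.e. the hop radius of the update
    """
    new_edges = list(new_edges)

    # Insert the new edges first so propagation sees the updated graph.
    for u, v in new_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Hop 0: the endpoints themselves are dirty.
    frontier = {node for edge in new_edges for node in edge}
    dirty = set(frontier)

    # Propagate the dirty flag one hop per layer.
    for _ in range(num_layers):
        next_frontier = set()
        for node in frontier:
            for nbr in adj.get(node, ()):
                if nbr not in dirty:
                    dirty.add(nbr)
                    next_frontier.add(nbr)
        if not next_frontier:
            break
        frontier = next_frontier

    return dirty


# Two chains, 0-1-2-3 and 4-5-6-7, joined by the new edge (3, 4).
chain = {
    0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2},
    4: {5}, 5: {4, 6}, 6: {5, 7}, 7: {6},
}
dirty_nodes = affected_set(chain, [(3, 4)], num_layers=2)
```

In this toy graph, nodes 0 and 7 lie three hops from the nearest endpoint, so a 2-layer model would leave their embeddings untouched. The work done per batch scales with the size of the returned set rather than with the total number of nodes.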
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Multimodal Machine Learning Applications
