STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
Yongliang Ding, Qigong Bi, Peng Pu

TL;DR
STLGT is a scalable, trace-based linear graph Transformer that forecasts p95 tail latency in microservices with efficient inference, modeling both cross-service dependencies and workload dynamics.
Contribution
The paper introduces STLGT, a novel linear graph transformer that encodes traces for efficient multi-step tail-latency forecasting in microservices.
Findings
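STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average across a personalized education microservice application, DeathStarBench, and Alibaba traces.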
STLGT achieves up to 12x faster CPU inference at N=32.
Ablation studies confirm the effectiveness of each component, especially under bursty traffic.
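The findings above are stated in terms of p95 tail latency and MAPE. As a minimal sketch of how these two quantities are commonly computed, the snippet below uses a nearest-rank percentile and the standard mean absolute percentage error. The function names, the sample latencies, and the forecasts are all hypothetical, and the paper's exact windowing and percentile definition are not given in this summary.

```python
import math

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest-rank index
    return ordered[rank - 1]

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical per-window request latencies (ms) for one API (assumed data).
windows = [
    [10, 12, 11, 13, 50, 12, 11, 10, 12, 200],
    [11, 12, 14, 13, 15, 12, 11, 10, 12, 100],
]
actual_p95 = [p95(w) for w in windows]
predicted_p95 = [180.0, 110.0]  # made-up forecasts, not results from the paper
error = mape(actual_p95, predicted_p95)
```

With only ten samples per window, the nearest-rank p95 is the window maximum, which illustrates why tail metrics are sensitive to bursts and outliers.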
Abstract
Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at N=32, matching the maximum span graph size…
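The abstract states that inference time is linear in span graph size. One common way to obtain that property is kernelized linear attention, sketched below as an illustration only: this is a generic technique, not the STLGT layer, and it omits the structure-aware component and the temporal module entirely. The feature map `phi` (elu(x) + 1), the function names, and the toy 3-span input are assumptions made for the example.

```python
import math

def phi(x):
    """Positive feature map elu(x) + 1, applied elementwise."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention(Q, K, V):
    """O(N) attention: summarize keys and values once, then reuse per query."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T
    z = [0.0] * d_k                        # sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = phi(q)
        norm = dot(fq, z)  # strictly positive because phi is positive
        out.append([sum(fq[a] * S[a][b] for a in range(d_k)) / norm
                    for b in range(d_v)])
    return out

def quadratic_attention(Q, K, V):
    """Reference O(N^2) form with the same kernel, for comparison."""
    out = []
    for q in Q:
        w = [dot(phi(q), phi(k)) for k in K]
        total = sum(w)
        out.append([sum(wj * v[b] for wj, v in zip(w, V)) / total
                    for b in range(len(V[0]))])
    return out

# Hypothetical 3-span graph with 2-dimensional features (assumed data).
Q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.9]]
K = [[0.2, 0.1], [-0.3, 0.4], [0.7, -0.6]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast = linear_attention(Q, K, V)
slow = quadratic_attention(Q, K, V)
```

Both functions compute the same outputs up to floating-point rounding. The linear form avoids the N x N score matrix by accumulating `S` and `z` in a single pass over the spans, so its cost grows with N rather than N squared.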
