STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices

Yongliang Ding; Qigong Bi; Peng Pu

arXiv:2604.26422 · cs.LG · April 30, 2026

TL;DR

STLGT is a scalable, trace-based linear graph Transformer that forecasts p95 tail latency in microservices. It propagates cross-service dependencies with inference time linear in span graph size and handles workload dynamics in a separate temporal module.

Contribution

The paper introduces STLGT, a structure-aware linear graph Transformer that encodes traces as span graphs for efficient multi-step p95 tail-latency forecasting in microservices.

Findings

01. STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average across a personalized education microservice application, DeathStarBench, and Alibaba traces.

02. STLGT achieves up to 12x faster CPU inference at N=32.

03. Ablation studies confirm the effectiveness of each component, especially under bursty traffic.
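The findings are stated in terms of p95 tail latency and MAPE. As a minimal sketch of how those two quantities are commonly computed (not the paper's evaluation code), the snippet below derives a p95 value per time window and scores a forecast with MAPE. The function names, the linear-interpolation percentile rule, and the sample numbers are all illustrative assumptions.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics.

    The interpolation rule is an assumption; the paper does not state which one it uses.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty window is undefined")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical end-to-end latencies (ms) for one API, grouped into two time windows.
windows = [
    [12.0, 15.0, 11.0, 90.0, 14.0],
    [13.0, 16.0, 12.0, 140.0, 15.0],
]
observed_p95 = [percentile(w, 95) for w in windows]

# Hypothetical forecasts for the same two windows.
forecast_p95 = [70.0, 120.0]
error_percent = mape(observed_p95, forecast_p95)
```

A lower MAPE means the forecast p95 series tracks the observed one more closely. Because the error is relative to each observed value, windows with very small observed latency can dominate the average.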

Abstract

Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at N=32, matching the maximum span graph size…
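The abstract attributes the efficiency gain to attention whose cost grows linearly with span graph size. The sketch below shows one generic way to get that property, kernelized linear attention, and is not STLGT's structure-aware mechanism, which this page does not describe. The feature map `phi` (elu(x) + 1), the function names, and the toy inputs are assumptions made for illustration.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice, not taken from the paper)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Kernelized attention over N nodes in O(N * d * d_v) time.

    Two running sums over the keys are built once and then reused by every query,
    so no N x N attention matrix is ever formed.
    """
    d, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d)]  # sum over j of phi(k_j) v_j^T
    k_sum = [0.0] * d                     # sum over j of phi(k_j)
    for k, v in zip(keys, values):        # one pass over the N nodes
        fk = phi(k)
        for a in range(d):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in queries:                     # second pass over the N nodes
        fq = phi(q)
        norm = sum(fq[a] * k_sum[a] for a in range(d))
        out.append([sum(fq[a] * kv[a][b] for a in range(d)) / norm for b in range(d_v)])
    return out


def quadratic_attention(queries, keys, values):
    """Reference O(N^2) form of the same kernel attention, used only for checking."""
    d_v = len(values[0])
    out = []
    for q in queries:
        fq = phi(q)
        weights = [sum(a * b for a, b in zip(fq, phi(k))) for k in keys]
        total = sum(weights)
        out.append([sum(w * v[b] for w, v in zip(weights, values)) / total for b in range(d_v)])
    return out


# Toy span graph with three nodes and 2-dimensional embeddings (hypothetical values).
q = [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.9]]
k = [[0.2, 0.1], [-0.6, 0.4], [0.3, -0.2]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast = linear_attention(q, k, v)
slow = quadratic_attention(q, k, v)
```

Both functions compute the same result; the linear form simply reorders the sums so that the key and value aggregates are shared across queries. A structure-aware variant would additionally inject graph information such as edges or span positions, which this sketch omits.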
