LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference
Yin Du, Jiayi Ren, Xiayu Sun, Tianyao Zhou, Haizhu Zhou, Ruiyan Ma, Danyang Zhang

TL;DR
LatencyPrism is a zero-intrusion system that monitors and manages LLM inference latency in real time across diverse platforms, ensuring SLO compliance without service disruption.
Contribution
It introduces the first non-intrusive, multi-platform latency sculpting system for LLM inference that guarantees SLO adherence and detects anomalies with high accuracy.
Findings
Achieved 0.98 F1-score in anomaly detection
Deployed across thousands of XPUs for over six months
Enabled real-time latency monitoring with millisecond alerts
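For readers unfamiliar with the metric in the first finding, the F1 score is the harmonic mean of precision and recall. The sketch below shows only that arithmetic. The function name `f1_score` and the counts are hypothetical, chosen for illustration; they are not taken from the paper's evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from confusion counts.

    tp: anomalies correctly flagged
    fp: normal events wrongly flagged as anomalies
    fn: anomalies that were missed
    """
    if tp == 0:
        # No true positives means precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT from the paper: 90 detected, 10 false alarms, 30 missed.
example = f1_score(tp=90, fp=10, fn=30)
print(f"precision=0.90 recall=0.75 F1={example:.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a score near 1 requires both few false alarms and few missed anomalies.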
Abstract
LLM inference latency critically determines user experience and operational costs, directly impacting throughput under SLO constraints. Even brief latency spikes degrade service quality despite acceptable average performance. However, distributed inference environments, which combine diverse software frameworks and XPU architectures with dynamic workloads, make latency analysis challenging. Existing AI profiling methods are often inadequate for real-time production analysis: intrusive designs necessitate service restarts or even suspension, and hardware-bound implementations fail to adapt to heterogeneous inference environments. We present LatencyPrism, the first zero-intrusion multi-platform latency sculpting system. It aims to break down the inference latency across the pipeline, proactively alert on inference latency anomalies, and guarantee adherence to…
Taxonomy
Topics: Software System Performance and Reliability · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
