PCS: Predictive Component-level Scheduling for Reducing Tail Latency in Cloud Online Services
Rui Han, Junwei Wang, Siguang Huang, Chenrong Shao, Shulin Zhan, Jianfeng Zhan, Jose Luis Vazquez-Poletti

TL;DR
PCS is a predictive, component-level scheduling framework designed to reduce tail latency in cloud online services by modeling component performance and dynamically allocating resources.
Contribution
It introduces an analytical performance model for predicting component and service latency, enabling adaptive scheduling to mitigate interference effects.
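The paper's analytical model is not reproduced on this page. The sketch below only shows how latency predictions could drive a scheduling decision, and it rests on stated assumptions. It assumes the service fans out to all components in parallel, so the service waits for its slowest component. It also assumes a predicted tail latency is already available for each component on each candidate node. The names `predicted`, `capacity`, and `best_placement` are hypothetical, and exhaustive search is used purely for illustration, not as the PCS algorithm.

```python
from itertools import product


def service_latency(component_latencies):
    # Assumption: parallel fan-out, so the service is as slow as its slowest component.
    return max(component_latencies)


def best_placement(predicted, capacity):
    """Exhaustively search component-to-node placements (illustrative only).

    predicted[c][n]: hypothetical predicted tail latency (ms) of component c on node n.
    capacity[n]: hypothetical maximum number of components node n can host.
    Returns (placement dict, predicted service latency) for the best placement.
    """
    components = list(predicted)
    nodes = list(capacity)
    best = None
    for assignment in product(nodes, repeat=len(components)):
        # Skip placements that overload any node.
        if any(assignment.count(n) > capacity[n] for n in nodes):
            continue
        latency = service_latency(
            predicted[c][n] for c, n in zip(components, assignment)
        )
        if best is None or latency < best[1]:
            best = (dict(zip(components, assignment)), latency)
    return best


# Hypothetical predictions: node n2 suffers interference that hurts c1 badly.
predicted = {
    "c1": {"n1": 12.0, "n2": 30.0},
    "c2": {"n1": 15.0, "n2": 18.0},
}
capacity = {"n1": 1, "n2": 1}
print(best_placement(predicted, capacity))  # ({'c1': 'n1', 'c2': 'n2'}, 18.0)
```

Here c1 is kept away from the node where it is predicted to suffer the most interference. A real scheduler would replace the brute-force loop with a heuristic, because the search space grows exponentially with the number of components.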
Findings
Reduces component tail latency by 67.05% on average.
Decreases overall service latency by 64.16%.
Outperforms existing tail latency reduction techniques.
Abstract
Modern latency-critical online services often compose results from a large number of server components. Hence the tail latency (e.g., the 99th percentile of response time) of these components, rather than their average latency, determines the overall service performance. When hosted in a cloud environment, the components of a service typically co-locate with short batch jobs to increase machine utilization, and share and contend for resources such as caches and I/O bandwidth with them. The highly dynamic nature of batch jobs, in terms of their workload types and input sizes, causes continuously changing performance interference to individual components, leading to latency variability and high tail latency. However, existing techniques either ignore such fine-grained component latency variability when managing service performance, or rely on executing redundant requests to reduce…
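The following sketch shows why the component tail, rather than the average, governs a service that composes many component results. It assumes that component latencies are independent and that a request must wait for every component it fans out to. Under those assumptions, the chance that a request touches at least one component slower than that component's own p-th percentile is 1 - (p/100)^N for N components. The function names are hypothetical, and the percentile uses the simple nearest-rank definition.

```python
import math


def percentile(samples, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of a list of latencies."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def prob_service_hits_tail(num_components, p=99.0):
    """Probability that at least one of num_components independent
    components responds slower than its own p-th percentile."""
    return 1 - (p / 100) ** num_components


latencies_ms = list(range(1, 101))  # toy samples: 1 ms .. 100 ms
print(percentile(latencies_ms, 99))  # 99
for n in (1, 10, 100):
    print(n, round(prob_service_hits_tail(n), 3))
```

With one component, only about 1% of requests see that component's tail. With 100 components, roughly 63% of requests (1 - 0.99^100 ≈ 0.634) wait on at least one component that is slower than its own 99th percentile. This amplification is why reducing each component's tail latency matters.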
