Efficient Replication of Queued Tasks for Latency Reduction in Cloud Systems
Gauri Joshi, Emina Soljanin, Gregory Wornell

TL;DR
This paper analyzes how different redundancy strategies in cloud systems affect latency and computing cost, showing that their effectiveness depends on whether the service time distribution is log-concave or log-convex.
Contribution
It provides a theoretical framework for optimizing redundancy strategies based on the service time distribution's properties, balancing latency and cost.
Findings
Whether redundancy reduces latency and cost depends on whether the service time distribution is log-concave or log-convex.
Maximum redundancy reduces latency and cost for log-convex distributions.
Fewer replicas and early cancellation are optimal for log-concave distributions.
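The contrast between the two cases can be illustrated with a minimal sketch. It is not the paper's model: it ignores queueing entirely, assumes every job is replicated on r idle servers with the remaining copies canceled as soon as the first one finishes, and uses two illustrative distributions chosen here (a shifted exponential as the log-concave case, a two-branch hyperexponential as the log-convex case). Under those assumptions, latency is the expected minimum of r service times and computing cost is r times that value. All function names and parameter values are hypothetical.

```python
from math import comb

def shifted_exp_latency(r, delta=1.0, mu=1.0):
    """Expected minimum of r i.i.d. service times delta + Exp(mu).

    Log-concave example: the minimum is delta + Exp(r * mu).
    """
    return delta + 1.0 / (r * mu)

def hyperexp_latency(r, p=0.5, mu1=1.0, mu2=0.1):
    """Expected minimum of r i.i.d. hyperexponential service times.

    Log-convex example: Exp(mu1) with probability p, else Exp(mu2).
    The tail is p*exp(-mu1*x) + (1-p)*exp(-mu2*x); integrating its r-th
    power term by term (binomial expansion) gives the sum below.
    """
    return sum(
        comb(r, k) * p**k * (1 - p) ** (r - k) / (k * mu1 + (r - k) * mu2)
        for k in range(r + 1)
    )

def cost(r, latency):
    """Total server time per job: r servers each busy until the first copy finishes."""
    return r * latency(r)

if __name__ == "__main__":
    for name, latency in [("shifted exponential (log-concave)", shifted_exp_latency),
                          ("hyperexponential (log-convex)", hyperexp_latency)]:
        print(name)
        for r in range(1, 5):
            print(f"  r={r}: latency={latency(r):.3f}  cost={cost(r, latency):.3f}")
```

In this sketch, latency falls with r for both distributions, but cost rises with r in the shifted exponential case and falls with r in the hyperexponential case, which matches the direction of the findings listed above.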
Abstract
In cloud computing systems, assigning a job to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers. Although adding redundant replicas always reduces service time, the total computing time spent per job may be higher, thus increasing waiting time in queue. The total time spent per job is also proportional to the cost of computing resources. We analyze how different redundancy strategies, e.g., the number of replicas and the times at which they are issued and canceled, affect the latency and computing cost. We find that the log-concavity of the service time distribution is a key factor in determining whether adding redundancy reduces latency and cost. If the service distribution is log-convex, then adding maximum redundancy reduces both latency and cost. And if it is log-concave, then…
