Efficient Redundancy Techniques for Latency Reduction in Cloud Systems
Gauri Joshi, Emina Soljanin, and Gregory Wornell

TL;DR
This paper analyzes how different redundancy strategies in cloud systems affect latency and cost, showing that their effectiveness depends on whether the service time distribution is log-concave or log-convex.
Contribution
It provides a comprehensive comparison of redundancy strategies based on service time distribution properties and proposes a general strategy for latency-cost optimization.
Findings
Log-convex distributions favor maximum redundancy for latency and cost reduction.
Log-concave distributions benefit from early cancellation and less redundancy.
The work extends analysis of fork-join queues to broader redundancy strategies.
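Illustration
The dichotomy in the findings above can be checked in closed form for a deliberately simplified setting. The sketch below is not the paper's model: it assumes no queueing, r copies of one task started at the same time on r idle servers, and all remaining copies canceled the moment the first one finishes, so latency is the expected minimum of r service times and cost is r times that latency. The shifted exponential (log-concave tail) and the two-phase hyperexponential (log-convex tail) are standard textbook examples chosen for illustration, and all function names and parameter values are hypothetical.

```python
from math import comb


def shifted_exp_latency(r, delta, mu):
    """E[min of r i.i.d. copies] when each service time is delta + Exp(mu).

    The minimum of r Exp(mu) variables is Exp(r * mu), and the constant
    shift delta is unaffected by replication.
    """
    return delta + 1.0 / (r * mu)


def hyperexp_latency(r, p, mu1, mu2):
    """E[min of r i.i.d. copies] for a two-phase hyperexponential.

    Each copy is Exp(mu1) with probability p, else Exp(mu2). Conditioned on
    k copies being of the first type, the minimum is Exp(k*mu1 + (r-k)*mu2).
    """
    return sum(
        comb(r, k) * p**k * (1 - p) ** (r - k) / (k * mu1 + (r - k) * mu2)
        for k in range(r + 1)
    )


def cost(r, latency):
    """Total server time used: r servers busy until the first copy finishes.

    Assumption of this sketch: cancellation is instantaneous and free.
    """
    return r * latency


if __name__ == "__main__":
    # Parameter values are illustrative assumptions, not from the paper.
    for r in (1, 2, 4, 8):
        lc = shifted_exp_latency(r, delta=1.0, mu=1.0)
        lx = hyperexp_latency(r, p=0.5, mu1=1.0, mu2=0.1)
        print(
            f"r={r}  log-concave: latency={lc:.3f} cost={cost(r, lc):.3f}  "
            f"log-convex: latency={lx:.3f} cost={cost(r, lx):.3f}"
        )
```

Under these assumptions, replication lowers latency in both cases, but cost rises with r for the shifted exponential and falls with r for the hyperexponential, which matches the direction of the findings listed above.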
Abstract
In cloud computing systems, assigning a task to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers, and reduce latency. But adding redundancy may result in higher cost of computing resources, as well as an increase in queueing delay due to higher traffic load. This work helps understand when and how redundancy gives a cost-efficient reduction in latency. For a general task service time distribution, we compare different redundancy strategies in terms of the number of redundant tasks, and time when they are issued and canceled. We get the insight that the log-concavity of the task service time creates a dichotomy of when adding redundancy helps. If the service time distribution is log-convex (i.e. log of the tail probability is convex) then adding maximum redundancy reduces both latency and…
