Asymptotic Behavior of Total Times For Jobs That Must Start Over If a Failure Occurs
Soeren Asmussen, Pierre Fiorini, Lester Lipsky, Tomasz Rolski, Robert Sheahan

TL;DR
This paper analyzes the asymptotic behavior of total completion times in systems where tasks restart after failures, providing relations between task and total time distributions and showing heavy-tailed properties under certain conditions.
Contribution
It derives tight asymptotic relations between the task time distribution (without failures) and the total time distribution under RESTART, valid for any failure distribution.
Findings
Total time distribution is heavy-tailed if task times have unbounded support.
Asymptotic expressions for tail behavior are provided under various failure scenarios.
The analysis employs Cramér–Lundberg asymptotics, Tauberian theorems, and integral asymptotics.
Abstract
Many processes must complete in the presence of failures. Different systems respond to task failure in different ways. The system may resume a failed task from the failure point (or a saved checkpoint shortly before the failure point), it may give up on the task and select a replacement task from the ready queue, or it may restart the task. The behavior of systems under the first two scenarios is well documented, but the third (RESTART) has resisted detailed analysis. In this paper we derive tight asymptotic relations between the distribution of task times without failures and the distribution of the total time when including failures, for any failure distribution. In particular, we show that if the task time distribution has unbounded support then the total time distribution is always heavy-tailed. Asymptotic expressions are given for the tail of the total time distribution in various scenarios. The key…
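To make the RESTART policy concrete, here is a minimal simulation sketch. It is an illustration, not the paper's method. It assumes failures arrive as a Poisson process with rate `failure_rate`, so times to failure are exponential, and it holds the task time fixed at `task_time`. The function names are hypothetical. Under these assumptions the mean total time has the standard closed form (e^{βt} − 1)/β, where β is the failure rate and t is the task time. The simulation can be checked against that formula.

```python
import math
import random


def restart_total_time(task_time, failure_rate, rng):
    """Simulate one job under RESTART.

    Assumption: times to failure are i.i.d. exponential with the given rate.
    Each failure discards all progress; the job needs the same task_time again.
    """
    total = 0.0
    while True:
        time_to_failure = rng.expovariate(failure_rate)
        if time_to_failure >= task_time:
            # The attempt finishes before the next failure.
            return total + task_time
        # The attempt is lost; only the wasted time is accumulated.
        total += time_to_failure


def mean_total_time(task_time, failure_rate, n_jobs=100_000, seed=0):
    """Monte Carlo estimate of the mean total time for a fixed task time."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_jobs):
        acc += restart_total_time(task_time, failure_rate, rng)
    return acc / n_jobs


def expected_total_time(task_time, failure_rate):
    """Closed-form mean for a fixed task time and exponential failures."""
    return (math.exp(failure_rate * task_time) - 1.0) / failure_rate


if __name__ == "__main__":
    t, beta = 1.0, 1.0
    print("simulated mean :", mean_total_time(t, beta))
    print("closed-form mean:", expected_total_time(t, beta))
```

The conditional mean grows exponentially in the task time. This gives some intuition for the heavy-tail result: if the task time T is itself random, for example exponential with rate μ (an assumption made here only for illustration), then E[e^{βT}] is finite only when β < μ, so the mean total time is already infinite once the failure rate reaches the task completion rate.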
Taxonomy
Topics: Probability and Risk Models · Advanced Queuing Theory Analysis · Insurance, Mortality, Demography, Risk Management
