Checkpointing strategies with prediction windows
Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni

TL;DR
This paper introduces a checkpointing strategy for fault predictors that report prediction windows rather than exact fault dates. It derives the best checkpointing period analytically and validates the model through simulations.
Contribution
It proposes an approach built on two periodic checkpointing modes: a regular mode outside prediction windows and a proactive mode inside windows that are large enough. The periods are chosen to minimize platform waste.
Findings
Best checkpointing period computed for any prediction window size
Model and approach validated by a comprehensive set of simulations
Resulting scheduling strategy minimizes platform waste
Abstract
This paper deals with the impact of fault prediction techniques on checkpointing strategies. We suppose that the fault-prediction system provides prediction windows instead of exact predictions, which dramatically complicates the analysis of the checkpointing strategies. We propose a new approach based upon two periodic modes, a regular mode outside prediction windows, and a proactive mode inside prediction windows, whenever the size of these windows is large enough. We are able to compute the best period for any size of the prediction windows, thereby deriving the scheduling strategy that minimizes platform waste. In addition, the results of this analytical evaluation are nicely corroborated by a comprehensive set of simulations, which demonstrate the validity of the model and the accuracy of the approach.
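As background for the notion of platform waste, the sketch below shows the classic periodic checkpointing baseline without any fault prediction. It is not the paper's derivation: it uses Young's well-known first-order approximation, in which the waste of a period T is roughly C/T + T/(2μ) for checkpoint cost C and platform MTBF μ, with downtime and recovery ignored. The function names and the numeric values are illustrative assumptions and do not come from the paper.

```python
import math


def young_period(mtbf: float, checkpoint_cost: float) -> float:
    """Young's first-order checkpointing period, sqrt(2 * mtbf * C).

    Assumption: classic no-prediction baseline, not the paper's formula.
    """
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


def first_order_waste(period: float, mtbf: float, checkpoint_cost: float) -> float:
    """First-order fraction of time wasted with a given period.

    C / T is the checkpointing overhead in fault-free execution;
    T / (2 * mtbf) is the expected work lost to faults (half a period
    per fault on average). Downtime and recovery are ignored here.
    """
    return checkpoint_cost / period + period / (2.0 * mtbf)


if __name__ == "__main__":
    mtbf = 7200.0            # hypothetical platform MTBF, in seconds
    checkpoint_cost = 100.0  # hypothetical checkpoint duration, in seconds
    best = young_period(mtbf, checkpoint_cost)
    for period in (best / 2.0, best, best * 2.0):
        waste = first_order_waste(period, mtbf, checkpoint_cost)
        print(f"period={period:8.1f}s  waste={waste:.4f}")
```

The paper's approach extends this kind of analysis with a second, proactive period used inside prediction windows; the formulas for that mode are given in the paper itself and are not reproduced here.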
Taxonomy
Topics: Distributed systems and fault tolerance · Radiation Effects in Electronics · Real-Time Systems Scheduling
