Checkpointing vs. Migration for Post-Petascale Machines
Franck Cappello (LRI), Henri Casanova (ICS), Yves Robert (INRIA Rhône-Alpes / LIP, Laboratoire de l'Informatique du Parallélisme)

TL;DR
This paper uses a few scenarios to examine whether checkpointing or migration is the better technique for executing jobs on future post-petascale machines.
Contribution
It compares checkpointing and migration strategies for next-generation high-performance computing platforms.
Findings
Migration may outperform checkpointing in some scenarios.
Checkpointing may be the more effective fault-tolerance technique in others.
The choice depends on the job type (sequential or parallel) and on machine characteristics.
Abstract
We craft a few scenarios for the execution of sequential and parallel jobs on future generation machines. Checkpointing or migration, which technique to choose?
Taxonomy
Topics: Distributed systems and fault tolerance · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
