CRIU -- Checkpoint Restore in Userspace for computational simulations and scientific applications
Fabio Andrijauskas, Igor Sfiligoi, Diego Davila, Aashay Arora, Jonathan Guiang, Brian Bockelman, Greg Thain, Frank Wurthwein

TL;DR
This paper evaluates CRIU, a Linux tool for checkpointing and restoring processes, for managing long-running scientific computations and applications, highlighting its capabilities and limitations in scientific and containerized environments.
Contribution
It demonstrates the feasibility of using CRIU for checkpointing scientific applications and containerized workloads in a high-throughput computing environment, assessing its practical utility.
Findings
CRIU can checkpoint and restore Linux processes and containers effectively.
It supports open files and network connections during checkpointing.
CRIU has limitations that keep it from being applicable in every scenario.
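As a rough illustration of the checkpoint and restore cycle these findings refer to, the sketch below assembles `criu dump` and `criu restore` command lines in Python. It is not taken from the paper: the helper names, the PID, and the image directory are hypothetical, and the choice of options is an assumption about a typical invocation. The sketch only builds the argument lists; actually running them requires CRIU to be installed and usually root privileges.

```python
from typing import List


def criu_dump_cmd(pid: int, images_dir: str, tcp: bool = False,
                  shell_job: bool = False, keep_running: bool = False) -> List[str]:
    """Build a `criu dump` command for the process tree rooted at `pid`.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if tcp:
        # Ask CRIU to checkpoint established TCP connections.
        cmd.append("--tcp-established")
    if shell_job:
        # Needed when the process was started from an interactive shell.
        cmd.append("--shell-job")
    if keep_running:
        # Leave the process running after the dump instead of killing it.
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir: str, tcp: bool = False,
                     shell_job: bool = False) -> List[str]:
    """Build the matching `criu restore` command (hypothetical helper)."""
    cmd = ["criu", "restore", "--images-dir", images_dir]
    if tcp:
        cmd.append("--tcp-established")
    if shell_job:
        cmd.append("--shell-job")
    return cmd


if __name__ == "__main__":
    # Assumed values: PID 1234 and /tmp/ckpt are placeholders.
    print(" ".join(criu_dump_cmd(1234, "/tmp/ckpt", tcp=True)))
    print(" ".join(criu_restore_cmd("/tmp/ckpt", tcp=True)))
```

The resulting lists could be passed to `subprocess.run` on a host where CRIU is available; restoring on a different machine additionally assumes the image directory and any files the process had open are reachable there.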
Abstract
Creating new materials, discovering new drugs, and simulating systems are essential processes for research and innovation and require substantial computational power. While many applications can be split into many smaller independent tasks, some cannot and may take hours or weeks to run to completion. To better manage those longer-running jobs, it would be desirable to stop them at any arbitrary point in time and later continue their computation on another compute resource; this is usually referred to as checkpointing. While some applications can manage checkpointing programmatically, it would be preferable if the batch scheduling system could do that independently. This paper evaluates the feasibility of using CRIU (Checkpoint Restore in Userspace), an open-source tool for GNU/Linux environments, with an emphasis on the OSG's OSPool HTCondor setup. CRIU allows checkpointing the process…
Taxonomy
Topics: Radiation Effects in Electronics · Advancements in Semiconductor Devices and Circuit Design
