Checkpoint/Restart for Lagrangian particle mesh with AMR in community code FLASH-X
Rajeev Jain, Klaus Weide, Saurabh Chawdhary, Thomas Klostermann

TL;DR
This paper discusses the implementation of checkpoint-restart capabilities for Lagrangian particle mesh simulations with adaptive mesh refinement in FLASH-X, emphasizing cross-format compatibility and scalable I/O performance on heterogeneous architectures.
Contribution
It introduces design strategies for cross-format checkpointing in FLASH-X and presents scaling studies and new I/O optimization ideas for heterogeneous systems.
Findings
Successful cross-format checkpoint-restart implementation
Demonstrated strong and weak scaling of checkpoint I/O
Proposed new methods for improved I/O performance on heterogeneous architectures
Abstract
In this work we present the design decisions behind, and the advantages of, cross-mesh-format checkpoint-restart in the community code FLASH-X. AMReX and Paramesh are the two AMR mesh formats developed and supported by FLASH-X. We also highlight a strong and weak scaling study of the existing HDF5 checkpoint writing, along with new ideas and results (presented during the talk) for utilizing heterogeneous compute architectures to improve I/O performance.
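The strong and weak scaling measurements mentioned in the abstract are usually summarized as parallel efficiencies. The sketch below uses the standard textbook definitions, not anything taken from FLASH-X; the function names and the timing values are hypothetical and serve only to illustrate the arithmetic.

```python
def strong_scaling_efficiency(base_ranks, base_time, ranks, time):
    """Fixed total problem size: ideal write time falls as 1/ranks."""
    return (base_time * base_ranks) / (time * ranks)


def weak_scaling_efficiency(base_time, time):
    """Fixed work per rank: ideal write time stays constant."""
    return base_time / time


# Hypothetical checkpoint write times in seconds, keyed by MPI rank count.
# These numbers are illustrative assumptions, not results from the paper.
strong_times = {64: 40.0, 128: 25.0, 256: 20.0}
weak_times = {64: 40.0, 128: 44.0, 256: 50.0}

base = 64
strong_eff = {
    n: strong_scaling_efficiency(base, strong_times[base], n, t)
    for n, t in strong_times.items()
}
weak_eff = {
    n: weak_scaling_efficiency(weak_times[base], t)
    for n, t in weak_times.items()
}

for n in sorted(strong_eff):
    print(f"{n:4d} ranks  strong={strong_eff[n]:.2f}  weak={weak_eff[n]:.2f}")
```

An efficiency of 1.0 means ideal scaling relative to the baseline run. For checkpoint I/O, values well below 1.0 at higher rank counts typically point to contention in the file system or in collective metadata operations rather than in the computation itself.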
Taxonomy
Topics: Advanced Data Storage Technologies · Scientific Computing and Data Management · Distributed and Parallel Computing Systems
