Rapid Recovery of Program Execution Under Power Failures for Embedded Systems with NVM
Min Jia (1), Edwin H.-M. Sha (1), Qingfeng Zhuge (1), Rui Xu (1), and Shouzhen Gu (2) ((1) School of Computer Science and Technology, East China Normal University, China; (2) Software Engineering Institute, East China Normal University, China)

TL;DR
This paper introduces a checkpointing technique triggered by function calls to rapidly recover embedded systems with NVM after power failures, reducing write overhead and backup size.
Contribution
It proposes a novel function call-triggered checkpointing method with pseudo-function calls and exponential backup strategies to optimize recovery and NVM lifespan.
Findings
99.8% average reduction in NVM stack backup size compared to the log-based method
80.5% average reduction in NVM stack backup size compared to the step-based method
Effective recovery under various power failure scenarios
Abstract
After power is restored, restarting an interrupted program from its initial state can have a negative impact, and some programs cannot be recovered at all. To recover program execution rapidly after power failures, embedded systems with non-volatile memory (NVM) back up the execution state at checkpoints to NVM. However, frequent checkpoints shorten the lifetime of the NVM and incur significant write overhead. This paper proposes a technique that sets checkpoints triggered by function calls to reduce writes to NVM. The evaluation results show average reductions of 99.8% and 80.5% in NVM backup size for stack backup, compared to the log-based method and the step-based method, respectively. To better achieve this, we also propose pseudo-function calls, which add backup points to reduce recovery costs, and exponential incremental call-based backup methods to reduce…
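The idea of checkpoints triggered by function calls can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: `SimulatedNVM`, `PowerFailure`, and `run_with_call_checkpoints` are hypothetical names, the program is modeled as a list of functions applied to one integer state, and stack backup, pseudo-function calls, and the exponential backup strategy are not modeled.

```python
class PowerFailure(Exception):
    """Simulated loss of power (hypothetical, for illustration only)."""


class SimulatedNVM:
    """Hypothetical stand-in for NVM: its contents survive a PowerFailure."""

    def __init__(self):
        self.checkpoint = None  # (index of next call, state before that call)
        self.writes = 0         # number of checkpoint writes, a proxy for NVM wear

    def save(self, index, state):
        self.checkpoint = (index, state)
        self.writes += 1


def run_with_call_checkpoints(functions, nvm, fail_inside=None):
    """Run `functions` in order, writing one checkpoint per function call.

    If the NVM already holds a checkpoint, execution resumes from it instead
    of from the initial state. `fail_inside` is the index of the call during
    which power is lost (None means no failure).
    """
    if nvm.checkpoint is not None:
        index, state = nvm.checkpoint   # recover after a power failure
    else:
        index, state = 0, 0             # initial state of the program
    while index < len(functions):
        nvm.save(index, state)          # checkpoint triggered by the call
        if index == fail_inside:
            # Volatile state (the local variables here) is lost.
            raise PowerFailure(f"power lost inside call {index}")
        state = functions[index](state)
        index += 1
    return state
```

In this sketch, a run that fails inside the third call and is then restarted with the same `SimulatedNVM` resumes at that call rather than at the first one, and produces the same result as an uninterrupted run. The `writes` counter grows once per call, whereas a scheme that checkpoints after every execution step would write far more often.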
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed Systems and Fault Tolerance · Parallel Computing and Optimization Techniques
