Checkpointing with Minimal Recover in Adhocnet based TMR
Sarmistha Neogy

TL;DR
This paper introduces a distributed checkpointing and recovery protocol for Wireless Adhoc Networks using Triple Modular Redundancy, focusing on minimal recovery overhead and dependency-based checkpoint selection.
Contribution
It proposes a novel protocol that eliminates unnecessary checkpoints and ensures dependency-aware recovery in TMR-enabled AdhocNet environments.
Findings
Reduces unnecessary checkpoints in TMR systems.
Ensures dependency-based recovery to improve fault tolerance.
Prevents missing or orphan messages during recovery.
Abstract
This paper describes a two-fold approach to utilizing Triple Modular Redundancy (TMR) in a Wireless Adhoc Network (AdocNet). A distributed checkpointing and recovery protocol is proposed. The protocol eliminates useless checkpoints and selects for recovery only those processes that are dependent in the concerned checkpointing interval. A process starts recovery from its last checkpoint only if it finds that it is dependent (directly or indirectly) on the faulty process. The recovery protocol also prevents the occurrence of missing or orphan messages. In AdocNet, a set of three nodes (connected to each other) is considered to form a TMR set, with the nodes designated as main, primary and secondary. A main node in one set may serve as primary or secondary in another. Computation is not triplicated, but the checkpoint taken by main is duplicated in its primary so that primary can continue if main fails.…
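The dependency-based selection described in the abstract can be illustrated with a small sketch. It is not the paper's protocol. It assumes that a process becomes dependent on another by receiving a message from it during the current checkpointing interval, and that this information is available as a simple mapping. The name `received_from` and the function `processes_to_roll_back` are hypothetical, introduced only for illustration.

```python
from collections import deque


def processes_to_roll_back(faulty, received_from):
    """Return the faulty process plus every process that depends on it.

    received_from (hypothetical structure) maps each process to the set of
    processes it received messages from in the current checkpointing interval.
    Dependency is assumed to follow message receipt; the paper may track it
    differently.
    """
    # Invert the mapping: sender -> set of receivers that depend on it.
    dependents = {}
    for receiver, senders in received_from.items():
        for sender in senders:
            dependents.setdefault(sender, set()).add(receiver)

    # Breadth-first search collects direct and indirect dependents.
    to_recover = {faulty}
    queue = deque([faulty])
    while queue:
        current = queue.popleft()
        for process in dependents.get(current, ()):
            if process not in to_recover:
                to_recover.add(process)
                queue.append(process)
    return to_recover


# P1 received from P0, P2 received from P1, P3 received nothing.
received_from = {"P0": set(), "P1": {"P0"}, "P2": {"P1"}, "P3": set()}
print(sorted(processes_to_roll_back("P0", received_from)))
```

In this example P1 depends directly on P0 and P2 depends on it indirectly through P1, so both would restart from their last checkpoints, while P3 continues undisturbed.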
