An Adaptive Checkpointing Scheme for Peer-to-Peer Based Volunteer Computing Work Flows
Lei Ni, Aaron Harwood

TL;DR
This paper proposes an adaptive, decentralized checkpointing scheme for peer-to-peer volunteer computing systems, reducing runtime by dynamically adjusting checkpoints based on network and peer conditions.
Contribution
It introduces a novel adaptive checkpointing method that uses runtime statistical data to improve robustness and efficiency in large-scale peer-to-peer volunteer computing.
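The summary does not spell out the paper's decision rule, so the following is only an illustrative sketch of how runtime statistics could drive a checkpoint decision. It substitutes Young's well-known first-order approximation for the optimal checkpoint interval, sqrt(2 * C * MTBF), which is not the authors' method. The function names, the use of mean observed peer uptime as the failure statistic, and the units (seconds) are all assumptions made for illustration.

```python
import math


def estimate_mtbf(observed_uptimes):
    """Estimate mean time between failures as the mean of observed peer
    uptimes in seconds. Hypothetical statistic, assumed for this sketch."""
    if not observed_uptimes:
        raise ValueError("need at least one uptime observation")
    return sum(observed_uptimes) / len(observed_uptimes)


def checkpoint_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the checkpoint interval:
    sqrt(2 * C * MTBF). A stand-in rule, not the paper's scheme."""
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def should_checkpoint(elapsed_since_last, checkpoint_cost, observed_uptimes):
    """Return True once the time since the last checkpoint reaches the
    interval implied by the current uptime observations."""
    mtbf = estimate_mtbf(observed_uptimes)
    return elapsed_since_last >= checkpoint_interval(checkpoint_cost, mtbf)
```

Under this rule, a worker that observes shorter peer uptimes computes a shorter interval and checkpoints more often, while a stable neighbourhood lets it checkpoint less often and spend less time on checkpoint overhead.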
Findings
Reduced overall runtime in simulations
Effective dynamic checkpoint decisions based on network conditions
Enhanced robustness of peer-to-peer volunteer workflows
Abstract
Volunteer Computing, sometimes called Public Resource Computing, is an emerging computational model that is very suitable for work-pooled parallel processing. As more complex grid applications make use of work flows in their design and deployment, it is reasonable to consider the impact of work flow deployment over a Volunteer Computing infrastructure. In this case, the inter-work-flow I/O can lead to a significant increase in I/O demands at the work pool server. A possible solution is the use of a Peer-to-Peer based parallel computing architecture to off-load this I/O demand to the workers, where the workers can fulfill some aspects of work flow coordination and I/O checking, etc. However, achieving robustness in such a large-scale system is a challenging hurdle towards the decentralized execution of work flows and general parallel processes. To increase robustness, we propose and show…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Peer-to-Peer Network Technologies · Cloud Computing and Resource Management
