An Adaptive Checkpointing Scheme for Peer-to-Peer Based Volunteer Computing Work Flows

Lei Ni; Aaron Harwood

arXiv:0711.3949 · cs.DC · November 27, 2007

TL;DR

This paper proposes an adaptive, decentralized checkpointing scheme for peer-to-peer volunteer computing systems, reducing runtime by dynamically adjusting checkpoints based on network and peer conditions.

Contribution

It introduces a novel adaptive checkpointing method that uses runtime statistical data to improve robustness and efficiency in large-scale peer-to-peer volunteer computing.

Findings

1. Reduced overall runtime in simulations
2. Effective dynamic checkpoint decisions based on network conditions
3. Enhanced robustness of peer-to-peer volunteer workflows

Abstract

Volunteer Computing, sometimes called Public Resource Computing, is an emerging computational model that is well suited to work-pooled parallel processing. As more complex grid applications make use of work flows in their design and deployment, it is reasonable to consider the impact of work flow deployment over a Volunteer Computing infrastructure. In this case, the inter-work-flow I/O can lead to a significant increase in I/O demands at the work pool server. A possible solution is to use a Peer-to-Peer based parallel computing architecture to off-load this I/O demand to the workers, which can then fulfill some aspects of work flow coordination, I/O checking, and so on. However, achieving robustness in such a large-scale system is a challenging hurdle on the way to decentralized execution of work flows and general parallel processes. To increase robustness, we propose and show…

Taxonomy

Topics: Distributed and Parallel Computing Systems · Peer-to-Peer Network Technologies · Cloud Computing and Resource Management