Fully distributed and fault tolerant task management based on diffusions
Alain Bui, Olivier Flauzac, Cyril Rabat

TL;DR
This paper introduces three improved, fully distributed, fault-tolerant task management methods for computational grids, utilizing circulating words and diffusions to reduce task replication and enhance efficiency.
Contribution
It proposes three novel diffusion-based methods that improve the active task management approach, reducing task replication and increasing efficiency in distributed grid environments.
Findings
The proposed methods produce fewer replicated tasks than the earlier active method.
They are fully distributed and fault tolerant.
Efficiency is improved over previous approaches.
Abstract
Task management is a critical component of computational grids. The aim is to assign tasks to nodes according to a global scheduling policy and a view of each node's local resources. A peer-to-peer approach to task management gives the grid better scalability and higher fault tolerance. However, mechanisms are needed to avoid computing replicated tasks, which reduces efficiency and increases the load on nodes. Likewise, these mechanisms must limit the number of exchanged messages to avoid overloading the network. In a previous paper, we proposed two task management methods, called active and passive. These methods are based on a random walk: they are fully distributed and fault tolerant. Each node owns a local set of task states that is updated by a random walk, and each node is in charge of its own local assignment. Here,…
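The random-walk idea in the abstract (a token visits nodes, each node keeps a local set of task states and handles its own assignment) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the names `merge` and `random_walk_assign`, the two-state task model, the single token, and the rule that an idle node takes the first pending task are all hypothetical choices made for illustration. It does not model the diffusion-based methods or node failures.

```python
import random

PENDING, ASSIGNED = "pending", "assigned"


def merge(local, token):
    """Reconcile two task-state maps in place.

    Assumption for this sketch: an ASSIGNED entry always overrides a
    PENDING one, and unknown tasks are copied across.
    """
    for task, entry in token.items():
        if entry[0] == ASSIGNED or task not in local:
            local[task] = entry
    for task, entry in local.items():
        if entry[0] == ASSIGNED or task not in token:
            token[task] = entry


def random_walk_assign(neighbors, tasks, steps, seed=0):
    """Circulate one token by random walk and let idle nodes claim tasks.

    neighbors: dict mapping each node to a list of adjacent nodes.
    tasks: iterable of task identifiers.
    Returns the final token and the per-node local task-state sets.
    """
    rng = random.Random(seed)
    local = {node: {} for node in neighbors}   # local task-state set per node
    busy = set()                               # nodes that already hold a task
    token = {t: (PENDING, None) for t in tasks}
    node = next(iter(neighbors))               # arbitrary starting node
    for _ in range(steps):
        merge(local[node], token)              # node and token exchange states
        if node not in busy:
            # Local assignment: an idle node claims the first pending task.
            for task in sorted(token):
                if token[task][0] == PENDING:
                    token[task] = (ASSIGNED, node)
                    local[node][task] = (ASSIGNED, node)
                    busy.add(node)
                    break
        node = rng.choice(neighbors[node])     # token moves to a random neighbor
    return token, local


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    final_token, _ = random_walk_assign(ring, ["t0", "t1", "t2"], steps=200)
    for task, (state, owner) in sorted(final_token.items()):
        print(task, state, owner)
```

Because a single token serializes every claim in this sketch, no task is ever assigned twice. The replication problem described in the abstract arises when nodes decide on assignments from local views that may be out of date, which this simplified model deliberately leaves out.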
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Peer-to-Peer Network Technologies
