Performing work efficiently in the presence of faults
Cynthia Dwork, Joseph Y. Halpern, and O. Waarts

TL;DR
This paper introduces three work-optimal protocols that let a distributed system of crash-prone processes complete a set of independent tasks, each offering a different trade-off between message complexity and running time.
Contribution
It presents three novel protocols that are work-optimal for fault-tolerant distributed work, with different trade-offs in message complexity and time efficiency.
Findings
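All three protocols are work-optimal, performing O(n+t) units of work, where n is the number of work units and t the number of processes.
One protocol has moderate message and time costs, sending O(t√t) messages and taking O(n+t) time, and can be adapted to asynchronous systems equipped with a failure detection mechanism.
Another sends only O(t log t) messages but has exponential running time.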
Abstract
We consider a system of t synchronous processes that communicate only by sending messages to one another, and that together must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are work-optimal, doing O(n+t) work. The first has moderate costs in the remaining two parameters, sending O(t√t) messages, and taking O(n+t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(t log t) messages, but its running time is large…
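To make the cost measures in the abstract concrete, the following is a minimal Python sketch of a naive checkpointing baseline. It is not one of the paper's three protocols; it only illustrates how work (counted with multiplicities) and messages are tallied under crash failures. The function name `simulate`, the `crash_budget` list, and the `checkpoint_every` parameter are assumptions made for this illustration, and crashes are assumed to happen only between units of work.

```python
def simulate(n, t, crash_budget, checkpoint_every):
    """Naive baseline (illustrative only, not a protocol from the paper).

    Processes 0..t-1 take turns being the single active worker.
    crash_budget[i] is the number of units process i performs while
    active before it crashes; None means it never crashes.
    After every `checkpoint_every` completed units (and at the end),
    the active process reports its progress to each later process.
    """
    checkpoint = 0       # progress known to every process that has not run yet
    work = 0             # units performed, counting repeats
    messages = 0         # checkpoint messages sent
    done = set()         # distinct units performed at least once

    for i in range(t):
        pos = checkpoint  # resume from the last progress this process heard of
        performed = 0
        while pos < n:
            budget = crash_budget[i]
            if budget is not None and performed >= budget:
                break     # process i crashes; the next process takes over
            done.add(pos)
            pos += 1
            performed += 1
            work += 1
            if pos % checkpoint_every == 0 or pos == n:
                messages += t - 1 - i  # one message per later process
                checkpoint = pos
        if pos == n:
            break         # all units done; remaining processes stay idle

    return {"work": work, "messages": messages, "completed": len(done) == n}


if __name__ == "__main__":
    # 12 units, 4 processes; process 0 crashes after 4 units, process 1 after 2.
    print(simulate(12, 4, [4, 2, None, None], checkpoint_every=3))
```

In this sketch each takeover repeats at most `checkpoint_every - 1` units, so total work is at most n + (t-1)(checkpoint_every - 1). Choosing `checkpoint_every` near n/t keeps work at O(n+t), but the number of messages can grow to roughly t², which is the kind of cost the paper's protocols are designed to reduce.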
Taxonomy
Topics: Distributed systems and fault tolerance · Interconnection Networks and Systems · Optimization and Search Problems
