# Fault Tolerant Gradient Clock Synchronization

**Authors:** Johannes Bund, Christoph Lenzen, Will Rosenbaum

arXiv: 1902.08042 · 2019-02-22

## TL;DR

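This paper achieves fault-tolerant gradient clock synchronization on arbitrary network topologies by combining the Lynch-Welch algorithm with the gradient clock synchronization (GCS) algorithm of Lenzen et al. Each node of the input graph is replaced by a fully connected cluster of $3f+1$ copies, which yields asymptotically optimal local skew as long as no cluster contains more than $f$ Byzantine nodes.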

## Contribution

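The paper combines the Lynch-Welch algorithm, which synchronizes a clique of $n$ nodes despite up to $f<n/3$ Byzantine faults, with the GCS algorithm of Lenzen et al. Each node of the input graph is replaced by a cluster of $3f+1$ fully connected copies running Lynch-Welch, and these clusters act as supernodes executing GCS. This makes gradient clock synchronization fault-tolerant on general graphs.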
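To give a feel for how a cluster can agree on a clock value despite Byzantine members, the following is a minimal sketch of the trimmed-midpoint step commonly associated with Lynch-Welch style approximate agreement: sort the received estimates, discard the $f$ lowest and $f$ highest, and take the midpoint of what remains. The function name `trimmed_midpoint` and the sample values are hypothetical, and the exact variant used in the paper may differ in its details.

```python
def trimmed_midpoint(estimates, f):
    """Discard the f smallest and f largest estimates, then return the
    midpoint of the remaining range.

    Illustrative assumption: `estimates` holds one clock estimate per
    cluster member, and at most f of them come from faulty nodes.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if len(estimates) < 3 * f + 1:
        raise ValueError("need at least 3f+1 estimates to tolerate f faults")
    ordered = sorted(estimates)
    kept = ordered[f:len(ordered) - f]  # drop f values from each end
    return (kept[0] + kept[-1]) / 2


# Four estimates, one of which (10000) is wildly off, with f = 1.
print(trimmed_midpoint([100, 102, 99, 10000], f=1))  # prints 101.0
```

With at most $f$ faulty values among at least $3f+1$ estimates, every value that survives trimming lies within the range spanned by correct nodes' estimates, which is why a single outlier cannot drag the result away.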

## Key findings

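- Achieves asymptotically optimal local skew despite Byzantine faults, provided that no cluster contains more than $f$ faulty nodes.
- Incurs overhead factors of $O(f)$ in the number of nodes and $O(f^2)$ in the number of edges relative to the input graph.
- Applies to arbitrary input topologies. Because tolerating $f$ faulty neighbors requires degree larger than $f$, these overheads are asymptotically optimal as well.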

## Abstract

Synchronizing clocks in distributed systems is well-understood, both in terms of fault-tolerance in fully connected systems and the dependence of local and global worst-case skews (i.e., maximum clock difference between neighbors and arbitrary pairs of nodes, respectively) on the diameter of fault-free systems. However, so far nothing non-trivial is known about the local skew that can be achieved in topologies that are not fully connected even under a single Byzantine fault. Put simply, in this work we show that the most powerful known techniques for fault-tolerant and gradient clock synchronization are compatible, in the sense that the best of both worlds can be achieved simultaneously.

Concretely, we combine the Lynch-Welch algorithm [Welch1988] for synchronizing a clique of $n$ nodes despite up to $f<n/3$ Byzantine faults with the gradient clock synchronization (GCS) algorithm by Lenzen et al. [Lenzen2010] in order to render the latter resilient to faults. As this is not possible on general graphs, we augment an input graph $\mathcal{G}$ by replacing each node by $3f+1$ fully connected copies, which execute an instance of the Lynch-Welch algorithm. We then interpret these clusters as supernodes executing the GCS algorithm, where for each cluster its correct nodes' Lynch-Welch clocks provide estimates of the logical clock of the supernode in the GCS algorithm. By connecting clusters corresponding to neighbors in $\mathcal{G}$ in a fully bipartite manner, supernodes can inform each other about (estimates of) their logical clock values. This way, we achieve asymptotically optimal local skew, granted that no cluster contains more than $f$ faulty nodes, at factor $O(f)$ and $O(f^2)$ overheads in terms of nodes and edges, respectively. Note that tolerating $f$ faulty neighbors trivially requires degree larger than $f$, so this is asymptotically optimal as well.
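The graph augmentation described in the abstract can be sketched in a few lines. The code below is an illustration only, not the authors' implementation: the function name `augment`, the `(node, index)` labeling of copies, and the example path graph are all assumptions made for this sketch. It replaces each node by $3f+1$ copies forming a clique and joins the clusters of neighboring nodes in a complete bipartite manner, which makes the node and edge overheads easy to count.

```python
from itertools import combinations, product


def augment(nodes, edges, f):
    """Build the augmented graph for fault parameter f.

    Assumptions for this sketch: `nodes` is a list of hashable labels and
    `edges` is a list of undirected pairs (u, v) with u != v.
    """
    k = 3 * f + 1  # copies per original node
    cluster = {v: [(v, i) for i in range(k)] for v in nodes}

    aug_nodes = [copy for v in nodes for copy in cluster[v]]
    aug_edges = set()

    # Each cluster is fully connected (a clique on 3f+1 copies).
    for v in nodes:
        for a, b in combinations(cluster[v], 2):
            aug_edges.add(frozenset((a, b)))

    # Clusters of neighboring nodes are connected in a fully bipartite manner.
    for u, v in edges:
        for a, b in product(cluster[u], cluster[v]):
            aug_edges.add(frozenset((a, b)))

    return aug_nodes, aug_edges


# Hypothetical input: a path a - b - c, tolerating f = 1 fault per cluster.
aug_nodes, aug_edges = augment(["a", "b", "c"], [("a", "b"), ("b", "c")], f=1)
print(len(aug_nodes))  # prints 12
print(len(aug_edges))  # prints 50
```

In this example the node count grows by the factor $3f+1 = 4$, matching the $O(f)$ node overhead. Each original edge becomes $(3f+1)^2 = 16$ edges and each cluster adds $\binom{3f+1}{2} = 6$ internal edges, which is the source of the $O(f^2)$ edge overhead.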

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.08042/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1902.08042/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1902.08042/full.md

---
Source: https://tomesphere.com/paper/1902.08042