Topological Characterization of Consensus in Distributed Systems

Thomas Nowak; Ulrich Schmid; Kyrill Winkler

arXiv:1905.09590·cs.DC·August 22, 2024


TL;DR

This paper characterizes the solvability of consensus in distributed systems with faults using novel topological methods, extending classical topology to analyze execution spaces and explain algorithmic possibilities.

Contribution

It introduces fault-aware topologies on execution spaces, providing a unified topological framework for understanding consensus solvability and existing impossibility results.

Findings

01. Consensus solvability corresponds to disconnected sets in the new topologies.

02. The approach explains existing algorithms and impossibility results topologically.

03. Develops a new equivalence between strong and weak validity conditions.

Abstract

We provide a complete characterization of both uniform and non-uniform deterministic consensus solvability in distributed systems with benign process and communication faults using point-set topology. More specifically, we non-trivially extend the approach introduced by Alpern and Schneider in 1985, by introducing novel fault-aware topologies on the space of infinite executions: the process-view topology, induced by a distance function that relies on the local view of a given process in an execution, and the minimum topology, which is induced by a distance function that focuses on the local view of the process that is the last to distinguish two executions. Consensus is solvable in a given model if and only if the sets of admissible executions leading to different decision values are disconnected in these topologies. By applying our approach to a wide range of different applications, we…

Equations

$$B_{\varepsilon}(x)\subseteq U\subseteq\bigcup\mathcal{U},$$

$$B_{\varepsilon}(x)\subseteq B_{\varepsilon_{\ell}}(x)\subseteq U_{\ell}$$

$$X^{\omega}\times X^{\omega}\to\mathbb{R},\quad(\gamma,\delta)\mapsto 2^{-\inf\{t\geq 0\mid d(C^{t},D^{t})>0\}}$$

$$\begin{split}B_{\varepsilon}(\gamma)&=\bigl\{\delta=(D^{t})_{t\geq 0}\in X^{\omega}\mid\forall 0\leq s\leq t\colon d(C^{s},D^{s})=0\bigr\}\\ &\subseteq\bigl\{\delta=(D^{t})_{t\geq 0}\in X^{\omega}\mid d(C^{t},D^{t})=0\bigr\}\\ &=(\pi^{t})^{-1}\bigl[\{D\in X\mid d(C^{t},D)=0\}\bigr]\subseteq(\pi^{t})^{-1}[U],\end{split}$$

$$\begin{split}F&=\bigl(\prod_{s=0}^{t}B_{1}(C^{s})\bigr)\times X^{\omega}=\bigcap_{s=0}^{t}(\pi^{s})^{-1}\bigl[B_{1}(C^{s})\bigr]\\ &\subseteq\bigl\{\delta=(D^{t})_{t\geq 0}\in X^{\omega}\mid\forall 0\leq s\leq t\colon d(C^{s},D^{s})=0\bigr\}=B_{\varepsilon}(\gamma).\end{split}$$

$$d_{\max}:\Sigma\times\Sigma\to\mathbb{R}_{+},\quad d_{\max}(\gamma,\delta)=2^{-\inf\{t\geq 0\mid C^{t}\neq D^{t}\}},$$

$$d_{p}(C,D)=\begin{cases}0&\text{if }C\sim_{p}D\text{ and }p\in Ob(C)\cap Ob(D),\\ 1&\text{otherwise}.\end{cases}$$

$$d_{p}:\Sigma\times\Sigma\to\mathbb{R}_{+},\quad d_{p}(\gamma,\delta)=2^{-\inf\{t\geq 0\mid d_{p}(C^{t},D^{t})>0\}}$$

$$d_{p}(\alpha,\beta)=d_{p}(\beta,\alpha)$$

$$d_{p}(\alpha,\gamma)\leq d_{p}(\alpha,\beta)+d_{p}(\beta,\gamma)$$

$$d_{\mathrm{u}}(\gamma,\delta)=\min_{p\in\Pi}d_{p}(\gamma,\delta).$$

$$\begin{split}\mathcal{N}&=\bigl\{\delta\in\Sigma\mid d_{\mathrm{u}}(\gamma,\delta)<2^{-T}\bigr\}\\ &=\bigl\{\delta\in\Sigma\mid\exists p\in\Pi\colon d_{p}(\gamma,\delta)<2^{-T}\bigr\}\\ &=\bigl\{\delta\in\Sigma\mid\exists p\in\Pi\ \forall t\leq T\colon C^{t}\sim_{p}D^{t}\wedge p\in Ob(C^{t})\cap Ob(D^{t})\bigr\}\\ &=\bigl\{\delta\in\Sigma\mid\exists p\in\Pi\colon C^{T}\sim_{p}D^{T}\wedge p\in Ob(C^{T})\cap Ob(D^{T})\bigr\}\end{split}$$

$$d_{\mathrm{nu}}(\gamma,\delta)=\begin{cases}\min_{p\in Ob(\gamma)\cap Ob(\delta)}d_{p}(\gamma,\delta)&\text{if }Ob(\gamma)\cap Ob(\delta)\neq\emptyset,\\ 1&\text{if }Ob(\gamma)\cap Ob(\delta)=\emptyset.\end{cases}$$

$$\begin{split}\mathcal{N}&=\bigl\{\delta\in\Sigma\mid d_{\mathrm{nu}}(\gamma,\delta)<2^{-T}\bigr\}\\ &=\bigl\{\delta\in\Sigma\mid\exists p\in Ob(\gamma)\cap Ob(\delta)\colon d_{p}(\gamma,\delta)<2^{-T}\bigr\}\\ &=\bigl\{\delta\in\Sigma\mid\exists p\in Ob(\gamma)\cap Ob(\delta)\colon\forall t\leq T\colon C^{t}\sim_{p}D^{t}\bigr\}\end{split}$$

$$\Delta_{p}(C)=\begin{cases}v&\text{if }\{\delta\in\Sigma\mid\exists t\colon C\sim_{p}D^{t}\}\subseteq\Sigma_{v},\\ \perp&\text{otherwise},\end{cases}$$

$$\Delta_{p}(C)=\begin{cases}v&\text{if }\{\delta\in\Sigma\mid\exists t\colon C\sim_{p}D^{t}\land p\in Ob(\delta)\}\subseteq\Sigma_{v},\\ \perp&\text{otherwise},\end{cases}$$

$$\bigl(\mathrm{oldenough}(\sigma,r)=\mathrm{true}\land\mathrm{oldenough}(\rho,r)=\mathrm{true}\bigr)\Rightarrow p_{\sigma}=p_{\rho}.$$

$$\tau|_{r_{0}^{\prime}}\sim_{C(\tau|_{r^{\prime}})}\sigma|_{r_{0}^{\prime}}\sim_{C(\tau|_{r^{\prime}})}\rho|_{r_{0}^{\prime}}$$

$$\sigma|_{r_{0}}\sim_{C(\omega_{r})}\rho|_{r_{0}}\sim_{C(\omega_{r})}\omega_{r_{0}}$$


Taxonomy

Topics: Distributed systems and fault tolerance · Distributed Control Multi-Agent Systems · Cooperative Communication and Network Coding

Full text

Topological Characterization of Consensus in Distributed Systems

Dedicated to the 2018 Dijkstra Prize winners Bowen Alpern and Fred B. Schneider

Thomas Nowak

Université Paris-Saclay, CNRS, Orsay, France

Ulrich Schmid

TU Wien, Vienna, Austria

Kyrill Winkler

ITK Engineering, Vienna, Austria

Abstract.

We provide a complete characterization of both uniform and non-uniform deterministic consensus solvability in distributed systems with benign process and communication faults using point-set topology. More specifically, we non-trivially extend the approach introduced by Alpern and Schneider in 1985, by introducing novel fault-aware pseudo-(semi-)metric topologies on the space of infinite executions: the process-view topology, induced by a distance function that relies on the local view of a given process in an execution, and the minimum topology, which is induced by a distance function that focuses on the local view of the process that is the last to distinguish two executions. Consensus is solvable in a given model if and only if the sets of admissible executions leading to different decision values are disconnected in these topologies. We also provide two alternative characterizations, based on the broadcastability of connected components and on the exclusion of certain “fair” and “unfair” limit sequences (which coincide with forever bivalent runs). By applying our approach to a wide range of different applications, we provide a topological explanation of a number of existing algorithms and impossibility results and develop several new ones.

Topological characterization; point-set topology; consensus; distributed systems; benign faults


1. Introduction

We provide a complete characterization¹ of the solvability of deterministic non-uniform and uniform consensus in any distributed system with benign process and/or communication failures, using point-set topology as introduced in the Dijkstra Prize-winning paper by Alpern and Schneider (AS84). Our results hence precisely delimit the consensus solvability/impossibility border in very different distributed systems such as dynamic networks (KO11:SIGACT) controlled by a message adversary (AG13), synchronous distributed systems with processes that may crash or commit send and/or receive omission failures (PT86), or purely asynchronous systems with crash failures (FLP85), for example. Whereas we will primarily focus on message-passing architectures in our examples, our topological approach also covers shared-memory systems.

¹ This paper is a substantial generalization and extension of our PODC’19 paper (NSW19:PODC), and also covers its offspring (WSM19:OPODIS) and (WSN21:FCT) w.r.t. applications.

Deterministic consensus, where every process starts with some input value and has to irrevocably compute a common output value, is arguably the most well-studied problem in distributed computing. Both impossibility results and consensus algorithms are known for virtually all distributed computing models that have been proposed so far. However, they have been obtained primarily on a case-by-case basis, using classic combinatorial analysis techniques (FR03). Whereas there are also some reasonably model-independent characterizations (MR02; LM95:DC), we are not aware of any approach that allows one to precisely characterize the consensus solvability/impossibility border for arbitrary distributed systems with benign process and communication failures.

In this paper, we provide such a characterization based on point-set topology as introduced by Alpern and Schneider (AS84). Regarding topological methods in distributed computing, one has to distinguish point-set topology, which considers the space of infinite executions of a distributed algorithm, from combinatorial topology, which studies the topology of reachable states of prefixes of admissible executions using simplicial complexes. Fig. 1 illustrates the objects studied in combinatorial topology vs. point-set topology. As of today, combinatorial topology has been developed into a quite widely applicable tool for the analysis of distributed systems (HKR13). A celebrated result in this area is the Asynchronous Computability Theorem (HS99:ACT; GRS22:ITCS), for example, which characterizes solvable tasks in wait-free asynchronous shared memory systems with crashes.

By contrast, point-set topology has only rarely been used in distributed computing. The primary objects are the infinite executions of a distributed algorithm (AS84). By defining a suitable metric between two infinite executions $\alpha$ and $\beta$ , each considered as the corresponding infinite sequence of global states of the algorithm in the respective execution, they can be viewed as elements of a topological space. For example, according to the common-prefix metric $d_{\max}(\alpha,\beta)$ , the executions $\alpha$ and $\beta$ are close if the common prefix where no process can distinguish them is long. A celebrated general result (AS84) is that closed and dense sets in the resulting space precisely characterize safety and liveness properties, respectively.
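As a rough illustration of the common-prefix metric, the following Python sketch evaluates $2^{-\inf\{t\geq 0\mid C^{t}\neq D^{t}\}}$ on finite prefixes. The function name, the tuple encoding of global states, and the truncation to finite prefixes are assumptions made for illustration only; the paper itself works with infinite executions.

```python
from typing import Hashable, Sequence


def common_prefix_distance(alpha: Sequence[Hashable], beta: Sequence[Hashable]) -> float:
    """Return 2^(-t) for the first index t at which the two prefixes differ.

    If the finite prefixes agree on their whole common length, the data given
    cannot separate them, so 0.0 is returned.
    """
    for t, (c, d) in enumerate(zip(alpha, beta)):
        if c != d:
            return 2.0 ** (-t)
    return 0.0


# Hypothetical global states, encoded as tuples of per-process local states.
alpha = [("a0", "b0"), ("a1", "b1"), ("a2", "b2")]
beta = [("a0", "b0"), ("a1", "b1"), ("a2", "bX")]
gamma = [("a0", "bY"), ("a1", "b1"), ("a2", "b2")]

print(common_prefix_distance(alpha, beta))   # first difference at t = 2
print(common_prefix_distance(alpha, gamma))  # first difference at t = 0
```

The longer the common prefix, the smaller the returned value, which matches the intuition that executions with a long shared prefix are close.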

Prior to our paper (NSW19:PODC), however, point-set topology has only occasionally been used for establishing impossibility results. We are only aware of some early work by one of the authors of this paper on a generic topological impossibility proof for consensus in compact models (Now10:master), and a topological study of the strongly dependent decision problem (BR19:ICDCN). Lubitch and Moran (LM95:DC) introduced a construction for schedulers, which leads to limit-closed submodels² of classic non-closed distributed computing models (like asynchronous systems consisting of $|\Pi|=n$ processes, of which up to $t<n-1$ may crash). In a similar spirit, Kuznetsov, Rieutard and He showed (KRH18:PODC), in the setting of combinatorial topology, how to reason about non-closed models by considering equivalent affine tasks that are closed. A similar purpose is served by defining layerings, as introduced by Moses and Rajsbaum (MR02). Whereas such constructions of closed submodels greatly simplify impossibility proofs, they do not lead to a precise characterization of consensus solvability in non-closed models.

² Informally, a model is limit-closed if the limit of a sequence of growing prefixes of admissible executions is admissible. Note that the wait-free asynchronous model is limit-closed.

Contributions. Building on our PODC’19 paper (NSW19:PODC) devoted to consensus in dynamic networks under message adversaries (AG13), the present paper provides a complete topological characterization of both the non-uniform and uniform deterministic consensus solvability/impossibility border for general distributed systems with benign³ process and/or communication faults. To achieve this, we had to add several new topological ideas to the setting of Alpern and Schneider (AS84), as detailed below, which not only allowed us to deal with both closed and non-closed models, but also provided us with a topological explanation of bivalence (FLP85) and bipotence (MR02) impossibility proofs. In more detail:

³ Actually, our framework immediately generalizes to Byzantine faults as well. Since consensus with Byzantine faults needs a different validity condition, though, we restrict our attention to benign faults for consistency of the presentation.

(i) We introduce a simple generic system model for full-information protocols that covers all distributed system models with benign faults we are aware of. We define new topologies on the execution space of general distributed algorithms in this model, which allow us to reason about sequences of local views of (correct) processes, rather than about global configuration sequences. The $p$ -view topology is defined by a pseudo-metric $d_{p}(\alpha,\beta)$ based on the common prefix of $p$ ’s local views in the executions $\alpha$ and $\beta$ . The uniform and non-uniform minimum topology are induced by the last (correct) process to notice a difference between two executions, which leads to pseudo-semi-metrics.

(ii) We show that consensus can be modeled as a continuous decision function $\Delta$ in our topologies, which maps an admissible execution to its unique decision value. This allows us to prove that consensus is solvable if and only if all the decision sets, i.e., the pre-images $\Gamma_{v}=\Delta^{-1}[\{v\}]$ resp. $PS_{v}=\tau^{-1}\bigl[\Delta^{-1}[\{v\}]\bigr]$ for every decision value $v$ , are disconnected from each other. We also provide universal uniform and non-uniform consensus algorithms, which rely on this separation.

(iii) We introduce process-time graphs (BM14:JACM) as a succinct alternative to configuration sequences in executions, and show that they are equivalent w.r.t. our topological reasoning. This is accomplished by implementing our generic system model in an “operational” system model, based on the widely applicable system model by Moses and Rajsbaum (MR02).

(iv) We provide an alternative characterization of uniform and non-uniform consensus solvability based on the broadcastability of the connected components of the decision sets in the process-time graph topologies. Moreover, utilizing some properties of the pseudo-metric $d_{p}$ , we provide a characterization of consensus solvability based on the limits of two infinite sequences of admissible process-time graphs, taken from different decision sets. Consensus is impossible if there is just one pair of such limits with distance 0, which actually coincide with the forever bivalent/bipotent executions constructed in previous proofs (FLP85; MR02).

(v) We demonstrate the utility of our approach by applying our topological findings to several different distributed computing models. Apart from providing a topological explanation of well-known classic results like bivalence proofs, we give the first comprehensive characterization of consensus solvability for both compact and non-compact message adversaries. Moreover, our results also lead to new consensus algorithms for some models.

Paper organization. In Section 3, we define the elements of the spaces that will be endowed with our new topologies in Section 4. Section 5 introduces the consensus problem in topological terms and provides our abstract characterization result for uniform consensus (Theorem 5.1) and non-uniform consensus (Theorem 5.2), which also provide universal algorithms. Process-time graphs and our operational system model are introduced in Section 6. Alternative characterizations based on broadcastability and limit exclusion are provided in Section 7 and Section 8, respectively. Section 9 is devoted to applications. Some conclusions in Section 10 round off our paper.

2. Related Work

Besides the few point-set topology papers (AS84; Now10:master; BR19:ICDCN) and the closed model constructions (LM95:DC; MR02; KRH18:PODC) already mentioned in Section 1, there is an abundant literature on consensus algorithms and impossibility proofs.

Regarding combinatorial topology, it is worth mentioning that our study of the indistinguishability relation of prefixes of runs is closely connected to connectivity properties of the $r$ -round protocol complex. However, in non-limit-closed models, we need to go beyond a uniformly bounded prefix length. This is in sharp contrast to the models considered in combinatorial topology (CFPR19:SSS; ACR20:OPODIS), which are all limit-closed (typically, wait-free asynchronous).

A celebrated paper on the impossibility of consensus in asynchronous systems with crash failures is by Fischer, Lynch, and Paterson (FLP85), who also introduced the bivalence proof technique. Unreliable failure detectors for circumventing this impossibility exist (CT96). Consensus in synchronous systems with Byzantine-faulty processes was introduced by Lamport, Shostak, and Pease (LSP82). The seminal works by Dolev, Dwork, and Stockmeyer (DDS87) and Dwork, Lynch, and Stockmeyer (DLS88) on partially synchronous systems introduced important abstractions like eventual stabilization and eventually bounded message delays, and provided a characterization of consensus solvability under various combinations of synchrony and failure models. Consensus in systems with weak timely links and crash failures was considered in (ADGFT04; HMSZ08:TDSC). Algorithms for consensus in systems with general omission process failures were provided by Perry and Toueg (PT86).

Perhaps one of the earliest characterizations of consensus solvability in synchronous distributed systems prone to communication errors is the seminal work by Santoro and Widmayer (SW89), where it was shown that consensus is impossible if up to $n-1$ messages may be lost in each round. This classic result was refined in (SWK09; CBS09) and, more recently, by Coulouma, Godard, and Peters (CGP15), where a property of an equivalence relation on the sets of communication graphs was found that captures exactly the source of consensus impossibility. The authors also showed how this property can be exploited in order to develop a generic consensus algorithm.

Following Afek and Gafni (AG13), such distributed systems are nowadays known as dynamic networks, where the per-round directed communication graphs are controlled by a message adversary. Whereas Coulouma, Godard, and Peters (CGP15) studied oblivious message adversaries, where the communication graphs are picked arbitrarily from a set of candidate graphs, more recent papers (BRSSW18:TCS; WSS19:DC) studied eventually stabilizing message adversaries, which guarantee that some rounds with “good” communication graphs will eventually be generated. Note that oblivious message adversaries are limit-closed, which is not the case for general message adversaries like the eventually stabilizing ones. Raynal and Stainer explored the relation between message adversaries and failure detectors (RS13:PODC).

The first characterization of consensus solvability under general message adversaries was provided by Fevat and Godard (FG11), albeit only for systems that consist of two processes. A bivalence argument was used there to show that certain communication patterns, namely, a fair or a special pair of unfair communication patterns, must be excluded by the MA for consensus to become solvable. However, a complete characterization of consensus solvability for arbitrary system sizes did not exist until now.

3. Generic System Model

We consider distributed message passing or shared memory systems made up of a set of $n$ deterministic processes $\Pi$ with unique identifiers, taken from $[n]=\{1,\dots,n\}$ for simplicity.

We denote individual processes by letters $p$ , $q$ , etc.

For our characterization of consensus solvability, we restrict our attention to full-information executions, in which processes continuously relay all the information they gathered to all other processes, and eventually apply some local decision function. The exchanged information includes the process’s initial value, but also, more importantly, a record of all events (message receptions, shared memory readings, object invocations, …) witnessed by the process. Our generic system model is hence applicable whenever no constraints are placed on the size of the local memory and the size of values to be communicated (e.g., message/shared-register size). In particular, it is applicable to classical synchronous and asynchronous message-passing and shared-memory models, with benign (and even Byzantine) process and communication faults. In Section 6, we will provide a more “operational” system model, which is obtained by instantiating our generic system model in the model of Moses and Rajsbaum (MR02).

Formally, a (full-information) execution is a sequence of (full-information) configurations. For every process $p\in\Pi$ , there is an equivalence relation $\sim_{p}$ on the set $\mathcal{C}$ of configurations, the $p$-indistinguishability relation, which relates two configurations $C$ and $D$ whenever process $p$ cannot locally distinguish them, i.e., whenever it has the same view $V_{p}(C)=V_{p}(D)$ in $C$ and $D$ . In this case we write $C\sim_{p}D$ . Note that two configurations that are indistinguishable for all processes need not be equal. In fact, configurations usually include some state of the communication media that is not accessible to any process.

In addition to the indistinguishability relations, we assume the existence of a function $Ob:\mathcal{C}\to 2^{\Pi}$ that specifies the set of obedient processes in a given configuration. Obedient processes must follow the algorithm and satisfy the (consensus) specification; usually, $Ob(C)$ is the set of non-faulty processes. Again, this information is usually not accessible to the processes. We make the restriction that disobedient processes cannot recover and become obedient again, i.e., that $Ob(C)\supseteq Ob(C^{\prime})$ if $C^{\prime}$ is reachable from $C$ . We extend the obedience function to the set $\Sigma\subseteq\mathcal{C}^{\omega}$ of admissible executions in a given model by setting $Ob:\Sigma\to 2^{\Pi}$ , $Ob(\gamma)=\bigcap_{t\geq 0}Ob(C^{t})$ where $\gamma=(C^{t})_{t\geq 0}$ . Here, $t\in\mathbb{N}_{0}=\mathbb{N}\cup\{0\}$ denotes a notion of global time that is not accessible to the processes. Consequently, a process is obedient in an execution if it is obedient in all of its configurations. We further make the restriction that there is at least one obedient process in every execution, i.e., that $Ob(\gamma)\neq\emptyset$ for all $\gamma\in\Sigma$ .
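The obedience function can be sketched in Python as follows. The encoding of an execution prefix as a list of obedient-process sets, and both function names, are assumptions made for illustration. On a finite prefix the intersection is only an over-approximation of $Ob(\gamma)$ , since a process may still become disobedient later.

```python
from typing import List, Set


def obedient_on_prefix(ob_per_config: List[Set[int]]) -> Set[int]:
    """Intersection of Ob(C^t) over the given finite prefix of an execution."""
    result = set(ob_per_config[0])
    for ob in ob_per_config[1:]:
        result &= ob
    return result


def no_recovery(ob_per_config: List[Set[int]]) -> bool:
    """Check that Ob(C^t) contains Ob(C^(t+1)) for every step of the prefix."""
    return all(later <= earlier
               for earlier, later in zip(ob_per_config, ob_per_config[1:]))


# Hypothetical prefix: process 2 becomes disobedient at time 2.
prefix = [{1, 2, 3}, {1, 2, 3}, {1, 3}, {1, 3}]
print(obedient_on_prefix(prefix))  # processes obedient throughout the prefix
print(no_recovery(prefix))         # the no-recovery restriction holds here

# A sequence in which process 2 recovers violates the restriction.
print(no_recovery([{1, 2}, {1}, {1, 2}]))
```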

We also assume that every process has the possibility to weakly count the steps it has taken. Formally, we assume the existence of weak clock functions $\chi_{p}:\mathcal{C}\to\mathbb{N}_{0}$ such that for every execution $\delta=(D^{t})_{t\geq 0}\in\Sigma$ and every configuration $C\in\mathcal{C}$ , the relation $C\sim_{p}D^{t}$ implies $t\geq\chi_{p}(C)$ . Additionally, we assume that $\chi_{p}(D^{t})\to\infty$ as $t\to\infty$ for every execution $\delta\in\Sigma$ and every obedient process $p\in Ob(\delta)$ . This definition allows for non-lockstep, even asynchronous, executions.

For the discussion of decision problems, we need to introduce the notion of input values. Since we limit ourselves to the consensus problem, we need not distinguish between the sets of input values and output values. We thus just assume the existence of a set $\mathcal{V}$ of potential input values, and require that the potential output values are also in $\mathcal{V}$ .

Furthermore, each (initial) configuration is assumed to contain an initial value in $\mathcal{V}$ for each process. This information is locally accessible to the processes, i.e., each process can access its own initial value (and those it has heard from).

A decision algorithm is a collection of functions $\Delta_{p}:\mathcal{C}\to\mathcal{V}\cup\{\perp\}$ such that $\Delta_{p}(C)=\Delta_{p}(D)$ if $C\sim_{p}D$ and $\Delta_{p}(C^{\prime})=\Delta_{p}(C)$ if $C^{\prime}$ is reachable from $C$ and $\Delta_{p}(C)\neq\perp$ , where $\perp\not\in\mathcal{V}$ represents the fact that $p$ has not decided yet. That is, decisions depend on local information only and are irrevocable. Every process $p$ thus has at most one decision value in an execution. We can extend the decision function to executions by setting $\Delta_{p}:\Sigma\to\mathcal{V}\cup\{\perp\}$ , $\Delta_{p}(\gamma)=\lim_{t\to\infty}\Delta_{p}(C^{t})$ where $\gamma=(C^{t})_{t\geq 0}$ . We say that $p$ has decided value $v\neq\perp$ in configuration $C$ or execution $\gamma$ if $\Delta_{p}(C)=v$ or $\Delta_{p}(\gamma)=v$ , respectively.
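The irrevocability requirement and the extension of $\Delta_{p}$ to executions can be illustrated with a small Python sketch. It assumes that the values $\Delta_{p}(C^{t})$ along a finite prefix are given as a list, with `None` standing for $\perp$ ; the function names are hypothetical.

```python
from typing import List, Optional

BOTTOM = None  # stands for the undecided value (bottom)


def is_irrevocable(decisions: List[Optional[int]]) -> bool:
    """Once a value other than BOTTOM appears, every later entry must equal it."""
    decided = BOTTOM
    for value in decisions:
        if decided is not BOTTOM and value != decided:
            return False
        if value is not BOTTOM:
            decided = value
    return True


def decision_on_prefix(decisions: List[Optional[int]]) -> Optional[int]:
    """Limit of Delta_p(C^t) as far as the prefix determines it.

    An irrevocable sequence is constant after the first decision, so the last
    entry of the prefix equals the limit whenever a decision has been made.
    """
    if not is_irrevocable(decisions):
        raise ValueError("decision sequence is not irrevocable")
    return decisions[-1] if decisions else BOTTOM


print(decision_on_prefix([None, None, 1, 1]))  # decided 1 at time 2
print(decision_on_prefix([None, None]))        # not decided within the prefix
print(is_irrevocable([None, 0, 1]))            # changing a decision is not allowed
```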

We will consider both non-uniform and uniform consensus with weak⁴ validity, defined as follows:

⁴ We note that our results can be easily adapted to different validity conditions.

Definition 3.1 (Non-uniform and uniform consensus).

A non-uniform consensus algorithm $\mathcal{A}$ is a decision algorithm that ensures the following properties in all of its admissible executions:

(T) Eventually, every obedient process must irrevocably decide. (Termination)

(A) If two obedient processes have decided, then their decision values are equal. (Agreement)

(V) If the initial values of processes are all equal to $v$ , then $v$ is the only possible decision value. (Validity)

A uniform consensus algorithm $\mathcal{A}$ must ensure (T), (V), and

(UA) If two processes have decided, then their decision values are equal. (Uniform Agreement)

By Termination, Agreement, and the fact that every execution has at least one obedient process, for every consensus algorithm, we can define the consensus decision function $\Delta:\Sigma\to\mathcal{V}$ by setting $\Delta(\gamma)=\Delta_{p}(\gamma)$ where $p$ is any process that is obedient in execution $\gamma$ , i.e., $p\in Ob(\gamma)$ .

To illustrate the difference between uniform and non-uniform consensus, as well as to motivate the two topologies serving to characterize their solvability, consider the example of two synchronous non-communicating processes. The set of processes is $\Pi=\{1,2\}$ and the set of possible values is $\mathcal{V}=\{0,1\}$ . Processes proceed in lock-step synchronous rounds, but cannot communicate. Thus, the only information a process has access to is its own initial value and the current time. The set of executions $\Sigma$ and the obedience function $Ob$ are defined such that one of the processes eventually becomes disobedient in every execution, but not both processes. In this model, it is trivial to solve non-uniform consensus by immediately deciding on one’s own initial value, but uniform consensus is impossible.
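The example can be checked mechanically with a minimal Python sketch. It assumes that the faulty process becomes disobedient only after time 0, so that its time-0 decision counts for (UA), and it models an execution merely by the pair of initial values and the identity of the process that stays obedient. All names are hypothetical.

```python
from itertools import product

PROCESSES = (1, 2)
VALUES = (0, 1)


def decide_own_input(own_input: int, time: int) -> int:
    """Rule from the example: decide one's own initial value immediately.

    The local view consists only of the own input and the current time.
    """
    return own_input


def decisions(inputs: dict) -> dict:
    """Values decided at time 0 (assumed: both processes are still obedient then)."""
    return {p: decide_own_input(inputs[p], 0) for p in PROCESSES}


def agreement(inputs: dict, obedient: set) -> bool:
    """(A): all processes that stay obedient forever decide the same value."""
    return len({decisions(inputs)[p] for p in obedient}) <= 1


def uniform_agreement(inputs: dict) -> bool:
    """(UA): all processes that decide at all decide the same value."""
    return len(set(decisions(inputs).values())) <= 1


# Every execution has exactly one process that remains obedient.
executions = [
    ({1: x1, 2: x2}, {survivor})
    for x1, x2 in product(VALUES, repeat=2)
    for survivor in PROCESSES
]

print(all(agreement(inputs, ob) for inputs, ob in executions))
print(all(uniform_agreement(inputs) for inputs, _ in executions))
```

Agreement holds trivially because only one process is obedient in each execution, whereas uniform agreement fails as soon as the two initial values differ. The sketch only tests this one decision rule; it does not by itself prove that no uniform consensus algorithm exists.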

4. Topological Structure of Full-Information Executions

In this section, we will endow the various sets introduced in Section 3 with suitable topologies. We first recall briefly the basic topological notions that are needed for our exposition. For a more thorough introduction, however, the reader is advised to refer to a textbook (Munkres).

A topology on a set $X$ is a family $\mathcal{T}$ of subsets of $X$ such that $\emptyset\in\mathcal{T}$ , $X\in\mathcal{T}$ , and $\mathcal{T}$ contains all arbitrary unions as well as all finite intersections of its members. We call $X$ endowed with $\mathcal{T}$ , often written as $(X,\mathcal{T})$ , a topological space and the members of $\mathcal{T}$ open sets. The complement of an open set is called closed and sets that are both open and closed, such as $\emptyset$ and $X$ itself, are called clopen. A topological space is disconnected if it contains a nontrivial clopen set, which means that it can be partitioned into two disjoint nonempty open sets. It is connected if it is not disconnected.
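For finite sets, these definitions can be checked directly. The following Python sketch is only an illustration under that finiteness assumption, where closure under pairwise unions and intersections is enough; the function names and the example families are hypothetical.

```python
from itertools import combinations


def is_topology(X: frozenset, T: set) -> bool:
    """Check the topology axioms for a family T of subsets of a finite set X."""
    if frozenset() not in T or X not in T:
        return False
    return all((U | V) in T and (U & V) in T for U, V in combinations(T, 2))


def is_disconnected(X: frozenset, T: set) -> bool:
    """True if some open set other than the empty set and X has an open complement."""
    return any(U and U != X and (X - U) in T for U in T)


X = frozenset({1, 2, 3})
T1 = {frozenset(), frozenset({1}), frozenset({2, 3}), X}  # {1} is clopen
T2 = {frozenset(), frozenset({1}), X}                     # no nontrivial clopen set
T3 = {frozenset(), frozenset({1}), frozenset({2}), X}     # union {1, 2} is missing

print(is_topology(X, T1), is_disconnected(X, T1))
print(is_topology(X, T2), is_disconnected(X, T2))
print(is_topology(X, T3))
```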

A function from space $X$ to space $Y$ is continuous if the pre-image of every open set in $Y$ is open in $X$ . Given a space $(X,\mathcal{T})$ , $Y\subseteq X$ is called a subspace of $X$ if $Y$ is equipped with the subspace topology $\{Y\cap U\mid U\in\mathcal{T}\}$ . Given $A\subseteq X$ , the closure of $A$ is the intersection of all closed sets containing $A$ . For a space $X$ , if $A\subseteq X$ , we call $x$ a limit point of $A$ if it belongs to the closure of $A\setminus\{x\}$ . It can be shown that the closure of $A$ is the union of $A$ with all limit points of $A$ . Space $X$ is called compact if every family of open sets that covers $X$ contains a finite sub-family that covers $X$ .

If $X$ is a nonempty set, then we call any function $d:X\times X\to\mathbb{R}_{+}$ a distance function on $X$ . Define $\mathcal{T}_{d}\subseteq 2^{X}$ by setting $U\in\mathcal{T}_{d}$ if and only if for all $x\in U$ there exists some $\varepsilon>0$ such that $B_{\varepsilon}(x)=\{y\in X\mid d(x,y)<\varepsilon\}\subseteq U$ .

Many topological spaces are defined by metrics, i.e., symmetric definite distance functions for which the triangle inequality holds. For a distance function to define a (potentially non-metrizable) topology though, no additional assumptions are necessary:

Lemma 4.1.

If $d$ is a distance function on $X$ , then $\mathcal{T}_{d}$ is a topology on $X$ .

Proof.

Firstly, we show that $\mathcal{T}$ is closed under unions. So let $\mathcal{U}\subseteq\mathcal{T}$ . We will show that $\bigcup\mathcal{U}\in\mathcal{T}$ . Let $x\in\bigcup\mathcal{U}$ . Then, by definition of the set union, there exists some $U\in\mathcal{U}$ such that $x\in U$ . But since $U\in\mathcal{T}$ , there exists some $\varepsilon>0$ such that

$$B_{\varepsilon}(x)\subseteq U\subseteq\bigcup\mathcal{U},$$

which shows that $\bigcup\mathcal{U}\in\mathcal{T}$ .

Secondly, we show that $\mathcal{T}$ is closed under finite intersections. Let $U_{1},U_{2},\dots,U_{k}\in\mathcal{T}$ . We will show that $\bigcap_{\ell=1}^{k}U_{\ell}\in\mathcal{T}$ . Let $x\in\bigcap_{\ell=1}^{k}U_{\ell}$ . Then, by definition of the set intersection, $x\in U_{\ell}$ for all $1\leq\ell\leq k$ . Because all $U_{\ell}$ are in $\mathcal{T}$ , there exist $\varepsilon_{1},\varepsilon_{2},\dots,\varepsilon_{k}>0$ such that $B_{\varepsilon_{\ell}}(x)\subseteq U_{\ell}$ for all $1\leq\ell\leq k$ . If we set $\varepsilon=\min\{\varepsilon_{1},\varepsilon_{2},\dots,\varepsilon_{k}\}$ , then $\varepsilon>0$ . Since we have $B_{\gamma}(x)\subseteq B_{\delta}(x)$ whenever $\gamma\leq\delta$ , we also have

$$B_{\varepsilon}(x)\subseteq B_{\varepsilon_{\ell}}(x)\subseteq U_{\ell}$$

for all $1\leq\ell\leq k$ . But this shows that $B_{\varepsilon}(x)\subseteq\bigcap_{\ell=1}^{k}U_{\ell}$ , which means that $\bigcap_{\ell=1}^{k}U_{\ell}\in\mathcal{T}$ .

Since it is easy to check that $\emptyset,X\in\mathcal{T}$ as well, $\mathcal{T}$ is indeed a topology. ∎

We will henceforth refer to $\mathcal{T}_{d}$ as the topology induced by $d$ .
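Lemma 4.1 can be checked by brute force on a small finite set. The following is a minimal sketch, not part of the original development: the helper names (`zero_ball`, `induced_topology`, `is_topology`) and the toy distance `d` are assumptions made for illustration. It relies on the observation that, on a finite set, some ball $B_{\varepsilon}(x)$ lies inside $U$ if and only if the smallest ball $\{y\mid d(x,y)=0\}$ does.

```python
from itertools import combinations


def zero_ball(X, d, x):
    """Smallest ball around x on a finite set: {y | d(x, y) == 0}."""
    return frozenset(y for y in X if d(x, y) == 0)


def powerset(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def induced_topology(X, d):
    """All U such that every x in U has some ball B_eps(x) inside U."""
    return {U for U in powerset(X)
            if all(zero_ball(X, d, x) <= U for x in U)}


def is_topology(X, T):
    """Check the axioms; on a finite family, pairwise closure suffices."""
    whole = frozenset(X)
    if frozenset() not in T or whole not in T:
        return False
    return all((U | V) in T and (U & V) in T for U in T for V in T)


# Hypothetical toy distance: asymmetric, so certainly not a metric.
X = {"a", "b", "c"}


def d(x, y):
    if x == y or (x, y) == ("a", "b"):
        return 0
    return 1


T_d = induced_topology(X, d)
print(is_topology(X, T_d), len(T_d))
```

In this toy instance, a set is open exactly if it contains $b$ whenever it contains $a$, so $\{a\}$ is not open although $\{a,b\}$ is. No symmetry, definiteness, or triangle inequality was needed for the axioms to hold.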

An execution is a sequence of configurations, i.e., an element of the product space $\mathcal{C}^{\omega}$ . The product topology on a product space $\Pi_{\iota\in I}X_{\iota}$ of topological spaces is defined as the coarsest topology such that all projections $\pi_{i}:\Pi_{\iota\in I}X_{\iota}\to X_{i}$ are continuous. It turns out that the product topology on the space $\mathcal{C}^{\omega}$ is induced by a distance function whose form is known in a special case that covers our needs:

Lemma 4.2.

Let $d$ be a distance function on $X$ that only takes the values $0$ or $1$ . Then the product topology of $X^{\omega}$ , where every copy of $X$ is endowed with the topology induced by $d$ , is induced by the distance function

$$(\gamma,\delta)\mapsto 2^{-\inf\{t\geq 0\,\mid\,d(C^{t},D^{t})>0\}},$$

where $\gamma=(C^{t})_{t\geq 0}$ and $\delta=(D^{t})_{t\geq 0}$ .

Proof.

We first show that all projections $\pi^{t}:X^{\omega}\to X$ are continuous when endowing $X^{\omega}$ with the topology $\mathcal{T}^{\omega}$ induced by the distance function from the lemma statement: Let $U\subseteq X$ be open and $C\in U$ , i.e., $d(C,D)=0$ implies $D\in U$ . Let $\gamma=(C^{t})_{t\geq 0}\in(\pi^{t})^{-1}[U]$ and set $\varepsilon=2^{-t}$ . Then,

$$B_{\varepsilon}(\gamma)\subseteq\{\delta\in X^{\omega}\mid d(C^{t},D^{t})=0\}\subseteq(\pi^{t})^{-1}[U],$$

where the last inclusion follows from the openness of $U$ . Since $(\pi^{t})^{-1}[U]$ is hence open in $\mathcal{T}^{\omega}$ , the continuity of $\pi^{t}$ follows.

Let now $\mathcal{T}_{0}$ be an arbitrary topology on $X^{\omega}$ for which all projections $\pi^{t}$ are continuous. We will show that $\mathcal{T}^{\omega}\subseteq\mathcal{T}_{0}$ , which reveals that $\mathcal{T}^{\omega}$ is the coarsest topology with continuous projections, i.e., the product topology of $X^{\omega}$ where every copy of $X$ is endowed with $\mathcal{T}_{d}$ . This will establish our lemma.

So let $E\in\mathcal{T}^{\omega}$ and take any $\gamma=(C^{t})_{t\geq 0}\in E$ . There exists some $\varepsilon>0$ such that $B_{\varepsilon}(\gamma)\subseteq E$ . Choose $t\in\mathbb{N}_{0}$ such that $2^{-t}\leq\varepsilon$ , and set

$$F=\bigcap_{s=0}^{t}(\pi^{s})^{-1}\big[B_{1}(C^{s})\big].$$

Then, $F$ is open with respect to $\mathcal{T}_{0}$ as a finite intersection of open sets: After all, every $(\pi^{s})^{-1}\big[B_{1}(C^{s})\big]$ is open by the continuity of the projection $\pi^{s}$ . But since $F\subseteq B_{\varepsilon}(\gamma)\subseteq E$ , this shows that $E$ contains a $\mathcal{T}_{0}$ -open neighborhood for each of its points, i.e., $E\in\mathcal{T}_{0}$ . ∎

4.1. Uniform topology for executions

In previous work on point-set topology in distributed computing (Now10:master), the set of configurations $\mathcal{C}$ of some fixed algorithm $\mathcal{A}$ was endowed with the discrete topology, induced by the discrete metric $d_{\max}(C,D)=1$ if $C\neq D$ and $0$ otherwise (for configurations $C,D\in\mathcal{C}$ ). Moreover, $\mathcal{C}^{\omega}$ was endowed with the corresponding product topology, which is induced by the common-prefix metric

$$d_{\max}(\gamma,\delta)=2^{-\inf\{t\geq 0\,\mid\,C^{t}\neq D^{t}\}},$$

where $\gamma=(C^{t})_{t\geq 0}$ and $\delta=(D^{t})_{t\geq 0}$ , according to Lemma 4.2. Informally, $d_{\max}(\gamma,\delta)$ decreases with the length of the common prefix where no process can distinguish $\gamma$ and $\delta$ .
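As a small illustration, the common-prefix distance can be evaluated on finite prefixes of two executions. This is a sketch under the assumption that the metric has the usual form $2^{-t}$, where $t$ is the first index at which the two configuration sequences differ; the function name `common_prefix_distance` and the placeholder configurations are hypothetical.

```python
def common_prefix_distance(gamma, delta):
    """Return 2^-t for the first index t where the prefixes differ.

    If the two (equally long) prefixes agree everywhere, the executions are
    indistinguishable within this horizon and the sketch returns 0.0.
    """
    for t, (c, d) in enumerate(zip(gamma, delta)):
        if c != d:
            return 2.0 ** (-t)
    return 0.0


# Placeholder configurations; any hashable, comparable objects would do.
gamma = ["C0", "C1", "C2", "C3"]
delta = ["C0", "C1", "D2", "D3"]

print(common_prefix_distance(gamma, delta))
```

The longer the common prefix, the smaller the distance; two executions that differ already in their initial configuration are at distance $1$.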

By contrast, we define the $p$ -view distance function $d_{p}$ on the set $\mathcal{C}$ of configurations for every process $p\in\Pi$ by

[TABLE]

Extending this distance function from configurations to executions, we define the $p$ -view pseudo-metric by

[TABLE]

where $\gamma=(C^{t})_{t\geq 0}$ and $\delta=(D^{t})_{t\geq 0}$ . Note that two executions, where $p$ has the same local view in all configurations in $\gamma$ and $\delta$ before $p$ fails in some round $t\geq 1$ , satisfy $d_{p}(\gamma,\delta)=2^{-t}$ .

Figure 2 shows an example of different instances of the (pseudo-)metrics introduced so far.

The following theorem shows that the $p$ -view pseudo-metric indeed deserves its name:

Theorem 4.3 Properties of $p$ -view pseudo-metric.

The $p$ -view pseudo-metric $d_{p}(\alpha,\beta)$ on $\mathcal{C}^{\omega}$ satisfies

[TABLE]

Despite the lack of definiteness, most properties of metric spaces, including compactness, also hold in pseudo-metric spaces (Fre14). What is obviously lost, however, is the uniqueness of the limit of a convergent sequence of executions: if $\alpha_{k}\to\hat{\alpha}$ and $d_{p}(\hat{\alpha},\hat{\beta})=0$ , then $\alpha_{k}\to\hat{\beta}$ as well.

The uniform minimum topology (abbreviated uniform topology) on the set $\Sigma$ of executions is induced by the distance function

$$d_{\mathrm{u}}(\gamma,\delta)=\min_{p\in\Pi}d_{p}(\gamma,\delta).$$

Note that $d_{\mathrm{u}}$ is only a pseudo-semi-metric, i.e., it satisfies symmetry and nonnegativity but not the triangle inequality: There may be executions $\alpha,\beta,\gamma$ with $d_{p}(\alpha,\beta)=0$ and $d_{q}(\beta,\gamma)=0$ but $d_{r}(\alpha,\gamma)>0$ for all $r\in\Pi$ . Hence, the topology on $\mathcal{C}^{\omega}$ induced by $d_{\mathrm{u}}$ lacks many of the properties of (pseudo-)metric spaces, but will turn out to be sufficient for the characterization of the possibility/impossibility of uniform consensus (see Theorem 5.1).
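The failure of the triangle inequality can be made concrete with a toy computation. The sketch below uses a deliberately simplified stand-in for the $p$ -view distance (first index at which the local state of $p$ differs, ignoring obedience and the weak clock functions), so `view_distance`, `uniform_distance` and the three sample executions are assumptions for illustration only.

```python
def view_distance(p, alpha, beta):
    """Simplified p-view distance: 2^-t for the first t where p's local
    state differs in the two executions, and 0.0 if it never differs."""
    for t, (C, D) in enumerate(zip(alpha, beta)):
        if C[p] != D[p]:
            return 2.0 ** (-t)
    return 0.0


def uniform_distance(alpha, beta, processes):
    """Minimum of the p-view distances over all processes."""
    return min(view_distance(p, alpha, beta) for p in processes)


# Configurations are tuples (local state of process 0, local state of process 1).
alpha = [("a", "x"), ("a", "x")]
beta = [("a", "y"), ("a", "y")]   # same as alpha for process 0
gamma = [("b", "y"), ("b", "y")]  # same as beta for process 1

procs = (0, 1)
d_ab = uniform_distance(alpha, beta, procs)
d_bg = uniform_distance(beta, gamma, procs)
d_ag = uniform_distance(alpha, gamma, procs)
print(d_ab, d_bg, d_ag)
```

Here process $0$ cannot distinguish the first pair and process $1$ cannot distinguish the second pair, yet both processes distinguish `alpha` from `gamma` immediately, so the distance between the outer executions exceeds the sum of the two intermediate distances.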

The next lemma shows that the decision function of an algorithm that solves uniform consensus is always continuous with respect to both any $p$ -view and the uniform topology.

Lemma 4.4.

Let $\Delta:\Sigma\to\mathcal{V}$ be the consensus decision function of a uniform consensus algorithm. Then, $\Delta$ is continuous with respect to both the $p$ -view distance function $d_{p}$ , $p\in\Pi$ , and the uniform distance function $d_{\mathrm{u}}$ .

Proof.

We only prove the lemma for the uniform topology, by showing that $\Delta$ is locally constant, i.e., for every execution $\gamma\in\Sigma$ , there is some $d_{\mathrm{u}}$ -neighborhood $\mathcal{N}$ of $\gamma$ such that $\Delta$ is constant on $\mathcal{N}$ . The continuity for the $p$ -view topologies follows since the topology induced by $d_{\mathrm{u}}$ is coarser than the one induced by $d_{p}$ .

Let $T$ be a time greater than both the latest decision time of the processes in $Ob(\gamma)$ and the latest time any process becomes disobedient in execution $\gamma=(C^{t})_{t\geq 0}$ . By the Termination property and the fact that disobedient processes cannot become obedient again, we have $T<\infty$ . Because $T$ is larger than the latest time a process becomes disobedient, we have $Ob(\gamma)=Ob(C^{T})$ .

Using the notation $\gamma=(C^{t})_{t\geq 0}$ and $\delta=(D^{t})_{t\geq 0}$ , we choose the following neighborhood $\mathcal{N}$ of $\gamma$ :

[TABLE]

Let $\delta\in\mathcal{N}$ . Then $C^{T}\sim_{p}D^{T}$ for some $p\in Ob(C^{T})\cap Ob(D^{T})$ . Since $p$ has decided $\Delta(\gamma)$ at time $T$ in execution $\gamma$ and $p$ is obedient until time $T$ in execution $\delta$ , process $p$ has also decided $\Delta(\gamma)$ at time $T$ in execution $\delta$ . By Uniform Agreement and Termination, all processes in $Ob(\delta)$ decide $\Delta(\gamma)$ as well. In other words, $\Delta(\delta)=\Delta(\gamma)$ , which concludes the proof. ∎

For an illustration in our non-communicating two-process example, denote by $\gamma^{(T)}$ the execution in which process $1$ has initial value $0$ , process $2$ has initial value $1$ , and process $1$ becomes disobedient at time $T$ . Similarly, denote by $\delta^{(U)}$ the execution with the same initial values and in which process $2$ becomes disobedient at time $U$ . Since there is no means of communication between the two processes, by Validity, each obedient process necessarily has to eventually decide on its own initial value, i.e., $\Delta(\gamma^{(T)})=1$ and $\Delta(\delta^{(U)})=0$ . The uniform distance between these executions is equal to $d_{\mathrm{u}}(\gamma^{(T)},\delta^{(U)})=2^{-\max\{T,U\}}$ . Thus, every $\varepsilon$ -neighborhood of $\gamma^{(T)}$ contains execution $\delta^{(U)}$ if $2^{-U}<\varepsilon$ . The set of $1$ -deciding executions is thus not open in the uniform topology and, by symmetry, neither is the set of $0$ -deciding executions. But this means that the algorithm’s decision function $\Delta$ cannot be continuous. Lemma 4.4 hence implies that there is no uniform consensus algorithm in the non-communicating two-process model, which is in accordance with the application example in Section 9.2.
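The two-process example can be replayed numerically. The sketch below is an illustration under simplifying assumptions: the local state of a process is just its initial value until it becomes disobedient, after which it is the marker `FAILED`; the $p$ -view distance is approximated by the first time at which the local state of $p$ differs; and executions are cut off at a finite horizon. The names `gamma_run`, `delta_run` and `witness_in_ball` are hypothetical.

```python
FAILED = "failed"


def view_distance(p, alpha, beta):
    """Simplified p-view distance: 2^-t for the first differing local state."""
    for t, (C, D) in enumerate(zip(alpha, beta)):
        if C[p] != D[p]:
            return 2.0 ** (-t)
    return 0.0


def uniform_distance(alpha, beta, processes=(0, 1)):
    return min(view_distance(p, alpha, beta) for p in processes)


def gamma_run(T, horizon):
    """Index 0 is process 1 (initial value 0), disobedient from time T on;
    index 1 is process 2 (initial value 1), obedient throughout."""
    return [(FAILED if t >= T else 0, 1) for t in range(horizon)]


def delta_run(U, horizon):
    """Same initial values, but process 2 becomes disobedient at time U."""
    return [(0, FAILED if t >= U else 1) for t in range(horizon)]


def witness_in_ball(T, eps):
    """Find some U such that delta^(U) lies in the eps-ball around gamma^(T)."""
    U = 0
    while 2.0 ** (-U) >= eps:
        U += 1
    horizon = max(T, U) + 1
    dist = uniform_distance(gamma_run(T, horizon), delta_run(U, horizon))
    return U, dist


print(witness_in_ball(T=2, eps=0.1))
```

For every $\varepsilon>0$ the search terminates, so a $0$ -deciding execution $\delta^{(U)}$ sits inside every $\varepsilon$ -ball around the $1$ -deciding execution $\gamma^{(T)}$ , matching the distance $2^{-\max\{T,U\}}$ stated in the text.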

4.2. Non-uniform topology for executions

Whereas the $p$ -view pseudo-metric given by Eq. 8 is also adequate for non-uniform consensus, this is not the case for the uniform pseudo-semi-metric as defined in Eq. 9. The appropriate non-uniform minimum topology (abbreviated non-uniform topology) on the set $\Sigma$ of executions is induced by the distance function

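$$d_{\mathrm{nu}}(\gamma,\delta)=\begin{cases}\min_{p\in Ob(\gamma)\cap Ob(\delta)}d_{p}(\gamma,\delta)&\text{if }Ob(\gamma)\cap Ob(\delta)\neq\emptyset,\\ 1&\text{otherwise.}\end{cases}$$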

The non-uniform topology is finer than the uniform topology, since the minimum is taken over the smaller set $Ob(\gamma)\cap Ob(\delta)\subseteq\Pi$ , which means that $d_{\mathrm{u}}(\gamma,\delta)\leq d_{\mathrm{nu}}(\gamma,\delta)$ . In particular, this implies that every decision function that is continuous with respect to the uniform topology is also continuous with respect to the non-uniform topology. Of course, this also follows from Lemma 4.4 and the fact that every uniform consensus algorithm also solves non-uniform consensus.

The following Lemma 4.5 is the analog of Lemma 4.4:

Lemma 4.5.

Let $\Delta:\Sigma\to\mathcal{V}$ be the consensus decision function of a non-uniform consensus algorithm. Then $\Delta$ is continuous with respect to both the $p$ -view distance function $d_{p}$ , $p\in\Pi$ , and the non-uniform distance function $d_{\mathrm{nu}}$ .

Proof.

We again prove the lemma only for $d_{\mathrm{nu}}$ , by showing that $\Delta$ is locally constant, i.e., for every execution $\gamma\in\Sigma$ , there is some $d_{\mathrm{nu}}$ -neighborhood $\mathcal{N}$ of $\gamma$ such that $\Delta$ is constant on $\mathcal{N}$ ; the proof for $d_{p}$ is similar.

Let $T$ be the latest decision time of the processes in $Ob(\gamma)$ in execution $\gamma$ . By the Termination property, we have $T<\infty$ . Using the notation $\gamma=(C^{t})_{t\geq 0}$ and $\delta=(D^{t})_{t\geq 0}$ , we choose the following neighborhood $\mathcal{N}$ of $\gamma$ :

[TABLE]

If $\delta\in\mathcal{N}$ , then $C^{T}\sim_{p}D^{T}$ for some $p\in Ob(\gamma)\cap Ob(\delta)$ . Denote by $T_{p}$ the decision time of process $p$ in $\gamma$ . Since $T_{p}\leq T$ , we also have $C^{T_{p}}\sim_{p}D^{T_{p}}$ . But this means that process $p$ decides value $\Delta(\gamma)$ at time $T_{p}$ in both executions $\gamma$ and $\delta$ , hence $\Delta(\delta)=\Delta(\gamma)$ . ∎

For an illustration in the non-communicating two-process example used in Section 4.1, note that the trivial algorithm that immediately decides on its initial value satisfies $\Delta(\gamma^{(T)})=1$ and $\Delta(\delta^{(U)})=0$ . The algorithm does solve non-uniform consensus, since it is guaranteed that one of the processes eventually becomes disobedient. In contrast to the uniform distance function, the non-uniform distance function satisfies $d_{\mathrm{nu}}(\gamma^{(T)},\delta^{(U)})=1$ since $Ob(\gamma^{(T)})\cap Ob(\delta^{(U)})=\emptyset$ . This means that the minimum distance between a $0$ -deciding and a $1$ -deciding execution is at least $1$ . It is hence possible to separate the two sets of executions by sets that are open in the non-uniform topology, so consensus is solvable here, according to the considerations in the following section, see also Section 9.2.
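The contrast with the uniform case can be sketched in code as well. As before, this is only an illustration under stated assumptions: the $p$ -view distance is approximated by the first time at which the local state of $p$ differs, the sets of obedient processes are passed in explicitly, and the value $1$ for an empty intersection follows the statement $d_{\mathrm{nu}}(\gamma^{(T)},\delta^{(U)})=1$ above. All function and variable names are hypothetical.

```python
FAILED = "failed"


def view_distance(p, alpha, beta):
    """Simplified p-view distance: 2^-t for the first differing local state."""
    for t, (C, D) in enumerate(zip(alpha, beta)):
        if C[p] != D[p]:
            return 2.0 ** (-t)
    return 0.0


def non_uniform_distance(alpha, beta, ob_alpha, ob_beta):
    """Minimum over processes obedient in BOTH executions; 1.0 if none."""
    shared = set(ob_alpha) & set(ob_beta)
    if not shared:
        return 1.0
    return min(view_distance(p, alpha, beta) for p in shared)


# Index 0 is process 1 (initial value 0), index 1 is process 2 (initial value 1).
gamma_1 = [(0, 1), (FAILED, 1), (FAILED, 1)]   # process 1 disobedient at time 1
gamma_2 = [(0, 1), (0, 1), (FAILED, 1)]        # process 1 disobedient at time 2
delta_2 = [(0, 1), (0, 1), (0, FAILED)]        # process 2 disobedient at time 2

d_across = non_uniform_distance(gamma_1, delta_2, ob_alpha={1}, ob_beta={0})
d_within = non_uniform_distance(gamma_1, gamma_2, ob_alpha={1}, ob_beta={1})
print(d_across, d_within)
```

Executions with different decision values share no obedient process and therefore stay at distance $1$ , whereas executions with the same decision value may be arbitrarily close.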

5. General Consensus Characterization for Full-Information Executions

In this section, we will provide our topological conditions for uniform and non-uniform consensus solvability.

Call an execution $v$ -valent if all initial values in the execution are equal to $v$ .

Theorem 5.1 Characterization of uniform consensus.

Uniform consensus is solvable if and only if there exists a partition of the set $\Sigma$ of admissible executions into sets $\Sigma_{v}$ , $v\in\mathcal{V}$ , such that the following holds:

(1) Every $\Sigma_{v}$ is an open set in $\Sigma$ with respect to the uniform topology induced by $d_{\mathrm{u}}$ .

(2) If execution $\gamma\in\Sigma$ is $v$ -valent, then $\gamma\in\Sigma_{v}$ .

Proof.

( $\Rightarrow$ ): Define $\Sigma_{v}=\Delta^{-1}[\{v\}]$ where $\Delta$ is the decision function of a uniform consensus algorithm. This is a partition of $\Sigma$ by Termination, and Validity implies property (2). It thus only remains to show openness of the $\Sigma_{v}$ , which follows from the continuity of $\Delta:\Sigma\to\mathcal{V}$ , since every singleton $\{v\}$ is open in the discrete topology.

( $\Leftarrow$ ): We define a uniform consensus algorithm by defining the decision functions $\Delta_{p}:\mathcal{C}\to\mathcal{V}\cup\{\perp\}$ as

[TABLE]

where we use the notation $\delta=(D^{t})_{t\geq 0}$ . The functions $\Delta_{p}$ are well defined since the sets $\Sigma_{v}$ are pairwise disjoint.

We first show Termination of the resulting algorithm. Let $\gamma\in\Sigma$ , let $v\in\mathcal{V}$ such that $\gamma\in\Sigma_{v}$ , and let $p\in Ob(\gamma)$ . Since $\Sigma_{v}$ is open with respect to the uniform topology, there exists some $\varepsilon>0$ such that $\left\{\delta\in\Sigma\mid d_{\mathrm{u}}(\gamma,\delta)<\varepsilon\right\}\subseteq\Sigma_{v}$ . By definition of $d_{\mathrm{u}}$ , we have $d_{\mathrm{u}}(\gamma,\delta)\leq d_{p}(\gamma,\delta)$ and hence $\left\{\delta\in\Sigma\mid d_{p}(\gamma,\delta)<\varepsilon\right\}\subseteq\left\{\delta\in\Sigma\mid d_{\mathrm{u}}(\gamma,\delta)<\varepsilon\right\}\subseteq\Sigma_{v}$ .

Writing $\gamma=(C^{t})_{t\geq 0}$ , let $T$ be the smallest integer such that $2^{-\chi_{p}(C^{t})}\leq\varepsilon$ for all $t\geq T$ . Such a $T$ exists since $\chi_{p}(C^{t})\to\infty$ as $t\to\infty$ . Then, for every $t\geq T$ , we have $\left\{\delta\in\Sigma\mid\exists s\colon C^{t}\sim_{p}D^{s}\right\}\subseteq\left\{\delta\in\Sigma\mid d_{p}(\gamma,\delta)<2^{-\chi_{p}(C^{t})}\right\}\subseteq\Sigma_{v}$ . In particular, $\Delta_{p}(C^{t})=v$ for all $t\geq T$ , i.e., process $p$ decides value $v$ in execution $\gamma$ .

We next show Uniform Agreement. For the sake of a contradiction, assume that process $q$ decides value $w\neq v$ in configuration $C$ in execution $\gamma\in\Sigma_{v}$ . But then, by definition of the function $\Delta_{q}$ , we have $\gamma\in\left\{\delta\in\Sigma\mid\exists t\colon C\sim_{q}D^{t}\right\}\subseteq\Sigma_{w}$ . But this is impossible since $\Sigma_{v}\cap\Sigma_{w}=\emptyset$ .

Validity immediately follows from property (2). ∎
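The decision rule used in the ( $\Leftarrow$ ) direction can be sketched on a finite toy model. The sketch assumes, in line with the sets appearing in the Termination and Uniform Agreement arguments, that $p$ decides $v$ in configuration $C$ once every admissible execution containing a configuration $p$ cannot distinguish from $C$ lies in $\Sigma_{v}$ , and remains undecided otherwise. The toy system (two fault-free processes that exchange their initial values in round $1$ ), the partition `label`, and all names are hypothetical.

```python
def make_execution(x0, x1, horizon=3):
    """Full-information toy run: each process knows only its own input at
    time 0 and both inputs from time 1 on."""
    first = ((x0,), (x1,))
    later = ((x0, x1), (x1, x0))
    return [first] + [later] * (horizon - 1)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
executions = [make_execution(x0, x1) for x0, x1 in inputs]
# Hypothetical partition: Sigma_1 contains only the 1-valent run.
label = [0, 0, 0, 1]


def decide(p, C, executions, label):
    """Return v if all executions compatible with p's view in C lie in
    Sigma_v, and None (undecided) otherwise."""
    compatible = {label[i] for i, delta in enumerate(executions)
                  if any(D[p] == C[p] for D in delta)}
    return next(iter(compatible)) if len(compatible) == 1 else None


print(decide(0, executions[1][0], executions, label))  # input 0 at time 0
print(decide(0, executions[3][0], executions, label))  # input 1 at time 0
print(decide(0, executions[3][1], executions, label))  # input 1 at time 1
```

A process with input $0$ can decide immediately, because every execution compatible with its view lies in $\Sigma_{0}$ , whereas a process with input $1$ has to wait until it has learned the other input.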

Theorem 5.2 Characterization of non-uniform consensus.

Non-uniform consensus is solvable if and only if there exists a partition of the set $\Sigma$ of admissible executions into sets $\Sigma_{v}$ , $v\in\mathcal{V}$ , such that the following holds:

(1) Every $\Sigma_{v}$ is an open set in $\Sigma$ with respect to the non-uniform topology induced by $d_{\mathrm{nu}}$ .

(2) If execution $\gamma\in\Sigma$ is $v$ -valent, then $\gamma\in\Sigma_{v}$ .

Proof.

The proof is similar to that of Theorem 5.1, except that the definition of $\Delta_{p}$ is

[TABLE]

i.e., we just have to add the constraint that $p\in Ob(\delta)$ to the executions considered in the proof. ∎

If $\Sigma$ has only finitely many connected components, these characterizations give rise to the following meta-procedure for determining whether consensus is solvable and constructing an algorithm if it is. It requires knowledge of the connected components of the space $\Sigma$ of admissible executions with respect to the appropriate topology:

(1) Initially, start with an empty set $\Sigma_{v}$ for every value $v\in\mathcal{V}_{O}$ .

(2) Add to $\Sigma_{v}$ the connected components of $\Sigma$ that contain an execution with a $v$ -valent initial configuration.

(3) Add all remaining connected components of $\Sigma$ to an arbitrarily chosen set $\Sigma_{v}$ .

(4) If the sets $\Sigma_{v}$ are pairwise disjoint, then consensus is solvable. In this case, the sets $\Sigma_{v}$ determine a consensus algorithm via the universal algorithm given in the proofs of Theorem 5.1 and Theorem 5.2. If the $\Sigma_{v}$ are not pairwise disjoint, then consensus is not solvable.
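The four steps of the meta-procedure can be sketched as follows, assuming that the finitely many connected components are already given as finite sets of execution identifiers and that the valent executions are listed in a dictionary. The function name `assign_components`, the input encoding, and the choice of the first value in step (3) are assumptions made for illustration.

```python
def assign_components(components, valences, values):
    """components: list of frozensets of execution ids (connected components).
    valences: dict mapping each v-valent execution id to its value v.
    values: list of possible decision values.
    Returns (solvable, sigma) with sigma[v] the set Sigma_v."""
    sigma = {v: set() for v in values}                  # step (1)
    for comp in components:
        hit = {valences[e] for e in comp if e in valences}
        for v in hit:                                   # step (2)
            sigma[v] |= comp
        if not hit:                                     # step (3)
            sigma[values[0]] |= comp                    # arbitrary choice
    solvable = all(sigma[v].isdisjoint(sigma[w])        # step (4)
                   for v in values for w in values if v != w)
    return solvable, sigma


# Hypothetical instance: executions a and c are 0-valent and 1-valent.
valences = {"a": 0, "c": 1}
good = [frozenset({"a", "b"}), frozenset({"c"}), frozenset({"d"})]
bad = [frozenset({"a", "c"}), frozenset({"b"})]

print(assign_components(good, valences, [0, 1]))
print(assign_components(bad, valences, [0, 1])[0])
```

In the second instance, a single connected component contains both a $0$ -valent and a $1$ -valent execution, so it is added to both $\Sigma_{0}$ and $\Sigma_{1}$ and the disjointness test in step (4) fails.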

6. Process-Time Graphs

Up to now, we have formalized our topological results in terms of admissible executions of the generic system model introduced in Section 3. In this section, we will show that they also hold for topological spaces consisting of other objects, namely, process-time graphs. In a nutshell, a process-time graph describes the process scheduling and all communication occurring in a run, along with the set of initial values. Compared to executions, process-time graphs have a much more succinct description, and will cause the resulting space to be compact. This, in turn, will allow us to prove additional topological results like LABEL:thm:setdistance.

Nevertheless, since we consider deterministic algorithms only, a process-time graph corresponds to a unique execution (and vice versa). This equivalence, which actually results from a transition function that is continuous in all our topologies (see LABEL:lem:tau:is:cont), will eventually allow us to use our topological reasoning in either space alike.

In order to define process-time graphs as generically as possible, we will resort to an intermediate operational system model that is essentially equivalent to the very flexible general system model from Moses and Rajsbaum (MR02). Crucially, it will also instantiate the weak clock functions $\chi_{p}(C^{t})$ stipulated in our generic model in Section 3, which must satisfy $\chi_{p}(C^{t})\leq t$ in every admissible execution $(C^{t})_{t\geq 0}\in\Sigma$ . Since $t$ represents some global notion of time here (called global real time in the sequel), ensuring this property is sometimes not trivial. More concretely, whereas $t$ is inherently known at every process in the case of lock-step synchronous systems like dynamic networks under message adversaries (WSS19:DC), for example, this is not the case for purely asynchronous systems (FLP85).

6.1. Basic operational system model

Following Moses and Rajsbaum (MR02), we consider message passing or shared memory distributed systems made up of a set $\Pi$ of $n\geq 2$ processes. We stipulate a global discrete clock with values taken from $\mathbb{N}_{0}=\mathbb{N}\cup\{0\}$ , which represents global real time in multiples of some arbitrary unit time. Depending on the particular distributed computing model, this global clock may or may not be accessible to the processes.

Processes are modeled as communicating state machines that encode a deterministic distributed algorithm (protocol) $\mathcal{P}$ . At every real time $t\in\mathbb{N}_{0}$ , process $p$ is in some local state $L_{p}^{t}\in\mathcal{L}_{p}\cup\{\bot_{p}\}$ , where $\bot_{p}\not\in\mathcal{L}_{p}$ is a special state representing that process $p$ has failed. (Footnote 5: This failed state $\bot_{p}$ is the only essential difference to the model of Moses and Rajsbaum (MR02), where faults are implicitly caused by a deviation from the protocol. This assumption makes sense for constructing “permutation layers”, for example, where it is not the environment that crashes a process at will, but rather the layer construction, which implies that some process takes only finitely many steps. Such a process just remains in the local state reached after its last computing step. In our setting, however, the fault state of all processes is solely controlled by the omniscient environment. Hence, we can safely use a failed state $\bot_{p}$ to gain simplicity without losing expressive power.)

Local state transitions of $p$ are caused by local actions taken from the set $\text{{ACT}}_{p}$ , which may be internal bookkeeping operations and/or the initiation of shared memory operations resp. of sending messages; their exact semantics may vary from model to model. Note that a single action may consist of finitely many non-zero time operations, which are initiated simultaneously but may complete at different times. The deterministic protocol $\mathcal{P}_{p}:\mathcal{L}_{p}\to\text{{ACT}}_{p}$ , representing $p$ ’s part in $\mathcal{P}$ , is a function that specifies the local action $p$ is ready to perform when in state $L_{p}\in\mathcal{L}_{p}$ . We do not restrict the actions $p$ can perform when in state $\bot_{p}$ .

In addition, there is a non-deterministic state machine called the environment $\epsilon$ , which represents the adversary that is responsible for actions outside the sphere of control of the processes’ protocols. It controls things like the completion of shared memory operations initiated earlier resp. the delivery of previously sent messages, the occurrence of process and communication failures, and (optionally) the occurrence of external environment events that can be used for modeling oracle inputs like failure detectors (CT96). Let $\text{{act}}_{\epsilon}$ be the set of all possible combinations of such environment actions (also called events for conciseness later on). We assume that the environment keeps track of pending shared-memory operations resp. sent messages in its environment state $L_{\epsilon}\in\mathcal{L}_{\epsilon}$ . The environment is also in charge of process scheduling, i.e., determines when a process performs a state transition, which will be referred to as taking a step. Formally, we assume that the set $\text{{ACT}}_{\epsilon}$ of all possible environment actions consists of all pairs $(\text{{Sched}},e)$ , made up of the set of processes $\text{{Sched}}\subseteq\Pi$ that take a step and some $e\in\text{{act}}_{\epsilon}$ (which may both be empty as well). The non-deterministic environment protocol $\mathcal{P}_{\epsilon}\subseteq\mathcal{G}\times(\text{{ACT}}_{\epsilon}\times\mathcal{L}_{\epsilon})$ is an arbitrary relation that, given the current global state $G\in\mathcal{G}$ (defined below, which also contains the current environment state $L_{\epsilon}\in\mathcal{L}_{\epsilon}$ ), chooses the next environment action $E=(\text{{Sched}},e)\in\text{{ACT}}_{\epsilon}$ and the successor environment state $L_{\epsilon}^{\prime}\in\mathcal{L}_{\epsilon}$ . Note carefully that we assume that only $E$ is actually chosen non-deterministically by $\mathcal{P}_{\epsilon}$ , whereas $L_{\epsilon}^{\prime}$ is determined by a transition function $\tau_{\epsilon}:\mathcal{G}\times\text{{ACT}}_{\epsilon}\to\mathcal{L}_{\epsilon}$ according to $L_{\epsilon}^{\prime}=\tau_{\epsilon}(G,E)$ .

Finally, a global state of our system (simply called state) is an element of $\mathcal{G}=\mathcal{L}_{\epsilon}\times\mathcal{L}_{1}\times\cdots\times\mathcal{L}_{n}$ . Given a global state $G\in\mathcal{G}$ , $G_{i}$ denotes the local state of process $i$ in $G$ , and $G_{\epsilon}$ denotes the state of the environment in $G$ . Recall that it is $G_{\epsilon}$ that keeps track of in-transit (i.e., just sent) messages, pending shared-memory operations etc. (Footnote 6: A different, but equivalent, conceptual model would be to assume that the state of a processor consists of a visible state and, in the case of message passing, message buffers that hold in-transit messages.) We also write $G=(G_{\epsilon},C)$ , where the vector of the local states $C=(C_{1},\dots,C_{n})=(G_{1},\dots,G_{n})$ of all the processes is called configuration. Given $C$ , the component $C_{i}$ denotes the local state of process $i$ in $C$ , and the set of all possible configurations is denoted as $\mathcal{C}$ . Note carefully that there may be global states $G\neq G^{\prime}$ where the corresponding configurations satisfy $C=C^{\prime}$ , e.g., in the case of different in-transit messages.

A joint action is a pair $(E,A)$ , where $E=(\text{{Sched}},e)\in\text{{ACT}}_{\epsilon}$ , and $A$ is a vector with index set Sched such that $A_{p}\in\text{{ACT}}_{p}$ for $p\in\text{{Sched}}$ . When the joint action $(E,A)$ is applied to global state $G$ where process $p$ is in local state $G_{p}$ , then $A_{p}=\mathcal{P}_{p}(G_{p})$ is the action prescribed by $p$ ’s protocol. Note that some environment actions, like message receptions at process $p$ , require $p\in\text{{Sched}}$ , i.e., “wake up” the process. For example, a joint action $(E,A)$ that causes $p$ to send a message $m$ to $q$ and process $r$ to receive a message $m^{\prime}$ sent to it by process $s$ earlier, typically works as follows: (i) $p$ is caused to take a step, where its protocol $\mathcal{P}_{p}$ initiates the sending of $m$ ; (ii) the environment adds $m$ to the send buffer of the communication channel from $p$ to $q$ (maintained in the environment state $L_{\epsilon}$ ); (iii) the environment moves $m^{\prime}$ from the send buffer of the communication channel from $s$ to $r$ (maintained in the environment state $L_{\epsilon}$ ) to the receive buffer (maintained in the local state of $r$ ), and (iv) causes $r$ to take a step. It follows that the local state $L_{r}$ of process $r$ reflects the content of message $m^{\prime}$ immediately after the step scheduled along with the message reception.

With ACT denoting the set of all possible joint actions, the transition function $\tau:\mathcal{G}\times\text{{ACT}}\to\mathcal{G}$ describes the evolution of the global state $G$ after application of the joint action $(E,A)$ , which results in the successor state $G^{\prime}=\tau(G,(E,A))$ . A run of $\mathcal{P}$ is an infinite sequence of global states $G^{0},G^{1},G^{2},\dots$ generated by an infinite sequence of joint actions. In order to guarantee a stable global state at integer times, we conceptually assume that the joint actions occur at times $0.5,1.5,2.5,\dots$ , i.e., that $G^{t+1}=\tau(G^{t},(E^{t.5},A^{t.5}))$ . $G^{0}$ is the initial state of the run, taken from the set of possible initial states $\mathcal{G}^{0}$ . Finally, $\Psi$ denotes the subset of all admissible runs of our system. $\Psi$ is typically used for enforcing liveness conditions like “every message sent to a correct process is eventually delivered” or “every correct process takes infinitely many steps”.

Unlike Moses and Rajsbaum (MR02), we handle process failures explicitly in the state of the processes, i.e., via the transition function: If some joint action $(E^{t.5},A^{t.5})$ contains $E^{t.5}=(\text{{Sched}},e)$ , where $e$ requests some process $p$ to fail, this will force $G_{p}^{t+1}=\bot_{p}$ in the successor state $G^{t+1}=\tau(G^{t},(E^{t.5},A^{t.5}))$ , irrespective of any other operations in $e$ (like the delivery of a message) that would otherwise affect $p$ . All process failures are persistent, that is, we require that all subsequent environment actions $E^{t^{\prime}.5}$ for $t^{\prime}\geq t$ also request $p$ to fail. As a convention, we consider every $E^{t^{\prime}.5}$ where $p$ fails as $p$ taking a step as well. Depending on the type of process failure, failing may cause $p$ to stop its protocol-compliant internal computations, to drop all incoming messages, and/or to stop sending further messages. In the case of crash failures, for example, the process may send a subset of the outgoing messages demanded by $\mathcal{P}_{p}$ in the very first failing step and does not perform any protocol-compliant actions in future steps. A send omission-faulty process does the same, except that it may send protocol-compliant messages to some processes also in future steps. A receive omission-faulty process may omit to process some of its received messages in every step where it fails, but sends protocol-compliant messages to every receiver. A general omission-faulty process combines the possible behaviors of send and receive omissions. Note that message loss can also be modeled in a different way in our setting: Rather than attributing an omission failure to the sender or receiver process, it can also be considered a communication failure caused by the environment. The involved sender process $p$ resp. receiver process $q$ continue to act according to their protocols in this case, i.e., would not enter the fault state $\bot_{p}$ resp. $\bot_{q}$ here.

Since we only consider deterministic protocols, a run $G^{0},G^{1},G^{2},\dots$ is uniquely determined by the initial configuration $C^{0}$ and the sequence of tuples $(L_{\epsilon}^{0},E^{0.5}),(L_{\epsilon}^{1},E^{1.5}),\dots$ consisting of tuples $(L_{\epsilon}^{t},E^{t.5})$ of environment state and environment actions for $t\geq 0$ . Let $\mathcal{G}^{\omega}$ resp. $\mathcal{C}^{\omega}$ be the set of all infinite runs resp. executions (configuration sequences), with $\Psi\subseteq\mathcal{G}^{\omega}$ resp. $\Sigma\subseteq\mathcal{C}^{\omega}$ denoting the set of admissible runs resp. admissible executions (resulting from admissible environment action sequences $E^{0.5},E^{1.5},\dots$ ).

Our assumptions on the environment protocol, namely, $L_{\epsilon}^{t+1}=\tau_{\epsilon}(G^{t},E^{t.5})$ , actually imply that a run $G^{0},G^{1},G^{2},\dots$ , and thus also the corresponding execution $C^{0},C^{1},C^{2},\dots$ , is already uniquely determined by the initial state $G^{0}=(L_{\epsilon}^{0},C^{0})$ and the sequence of chosen environment actions $E^{0.5},E^{1.5},\dots$ . Since $L_{\epsilon}^{0}$ is fixed and the environment actions abstract away almost all of the internal workings of the protocols and their complex internal states, it should be possible to uniquely describe the evolution of a run/execution just by means of the sequence $E^{0.5},E^{1.5},\dots$ . In the following, we will show that this is indeed the case.
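The claim that the initial state together with the sequence of environment actions fixes the run can be illustrated by a toy replay. The following sketch is not the operational model itself: the counter protocol, the encoding of an environment action as a pair `(sched, fail)`, the environment state as a plain action counter, and all function names are assumptions chosen to keep the example small.

```python
FAILED = "failed"


def protocol(local_state):
    """Hypothetical deterministic protocol P_p: always ask to increment."""
    return "inc"


def tau(G, E):
    """One transition of the global state G = (env_state, configuration)
    under the environment action E = (sched, fail)."""
    env, config = G
    sched, fail = E
    new_config = []
    for p, state in enumerate(config):
        if p in fail or state == FAILED:
            new_config.append(FAILED)            # failures are persistent
        elif p in sched and protocol(state) == "inc":
            new_config.append(state + 1)         # p takes a step
        else:
            new_config.append(state)             # p is not scheduled
    return (env + 1, tuple(new_config))          # env state: actions so far


def run(G0, env_actions):
    """Global states G^0, G^1, ... generated by the given action sequence."""
    states = [G0]
    for E in env_actions:
        states.append(tau(states[-1], E))
    return states


def execution(states):
    """Project a run to its configuration sequence."""
    return [config for _, config in states]


G0 = (0, (0, 0))
actions = [({0}, set()), ({0, 1}, set()), ({1}, {0}), ({1}, {0})]
print(execution(run(G0, actions)))
```

Because `protocol` and `tau` are functions, replaying the same action sequence from the same initial state always reproduces the same run and hence the same execution; all non-determinism is confined to the choice of `actions`.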

6.2. Implementing global time satisfying the weak clock property

Our topological framework crucially relies on the ability to distinguish/not distinguish two local states $\alpha_{p}^{t}$ and $\beta_{p}^{t}$ in two executions $\alpha$ and $\beta$ at global real time $t$ . Clearly, this is easy for an omniscient observer who knows the corresponding global states and can thus verify that $\alpha_{p}^{t}$ and $\beta_{p}^{t}$ arise from the same global time $t$ . Processes cannot do that in asynchronous systems, however, since $t$ is not available to the processes and hence cannot be included in $\alpha_{p}^{t}$ and $\beta_{p}^{t}$ . Consequently, two different sequences of environment actions (called events in the sequel for conciseness) $E_{\alpha}^{0.5},E_{\alpha}^{1.5},\dots,E_{\alpha}^{(t-1).5}$ and $E_{\beta}^{0.5},E_{\beta}^{1.5},\dots,E_{\beta}^{(t^{\prime}-1).5}$ , applied to the same initial state, may produce the same state $\alpha_{p}^{t}=\beta_{p}^{t^{\prime}}$ . This happens when they are causal shuffles of each other, i.e., reorderings of the steps of the processes that are in accordance with the happens-before relation (Lam78). Hence, the (in)distinguishability of configurations does not necessarily match the (in)distinguishability of the corresponding event sequences.

Whereas our generic system model does not actually require processes to have a common notion of time, it does require that the weak clock functions $\chi_{p}$ do not progress faster than global real time. We will accomplish this in our operational system model by defining some alternative notion of global time that is accessible to the processes. Doing this will also rule out the problem spotted above, i.e., ensure that runs (event sequences) uniquely determine executions (configuration sequences).

There are many conceivable ways for defining global time, including the following possibilities:

(i) In the case of lock-step synchronous distributed systems, like dynamic networks under message adversaries (NSW19:PODC; WSM19:OPODIS; WSN21:FCT), nothing needs to be done here since all processes inherently know global real time $t$ .

(ii) In the case of asynchronous systems with a majority of correct processes, the arguably most popular approach for message-passing systems (see e.g. (MR01; ADGFT04; HMSZ08:TDSC)) is the simulation of asynchronous communication-closed rounds: Processes organize rounds $r=1,2,\dots$ by locally waiting until $n-f$ messages sent in the current round $r$ have been received. These $n-f$ messages are then processed, which defines both the local state at the beginning of the next round $r+1$ and the message sent to everybody in this next round. Late messages are discarded, and early messages are buffered locally (in the state of the environment) until the appropriate round is reached. Just using the round numbers as global time, i.e., choosing $t=r$ , is all that is needed for defining global time in such a model.

(iii) In models without communication-closed rounds (FLP85; RS10:TCS), a suitable notion of global time can be derived from other definitions of consistent cuts (Mat89). (Footnote 7: We note that both synchronous and asynchronous communication-closed rounds, as well as the executions $\mathcal{C}^{\omega}$ defined in our generic system model in Section 3, are of course also sequences of consistent cuts.) We will show how this can be done in our operational system model based on Mattern’s vector clocks. Our construction will exploit the fact that a local state transition of a process happens only when it takes a step in our model: In between the $\ell$ th and $(\ell+1)$ th step of any fixed process $p$ , which happen at time $(t_{p}(\ell)-1).5$ and $(t_{p}(\ell+1)-1).5$ , respectively, only environment actions (external environment events, message deliveries, shared memory completions), if any, can happen, which change the state of the environment but not the local state of $p$ .

We will start out from the sequence of arbitrary cuts (Mat89) $IC^{0},IC^{1},IC^{2},\dots$ (indexed by an integer index $k\geq 0$ ) occurring in a given run $G^{0},G^{1},G^{2},\dots$ (which itself is indexed by the global real time $t$ ), where the frontier $IF^{k}$ of $IC^{k}$ is formed by the local states of the processes after they have taken their $k$ th step, i.e., $IF^{0}=IC^{0}=C^{0}$ and $IF^{k}=(G_{1}^{t_{1}(k)},\dots,G_{n}^{t_{n}(k)})$ for $k\geq 1$ , with $(t_{p}(k)-1).5$ being the time when process $p$ takes its $k$ th step. Note that the latter is applied to $p$ ’s state $IF_{p}^{k-1}$ in the frontier $IF^{k-1}$ of $IC^{k-1}$ and processes all the external environment events and all the messages received/shared memory operations completed since then. Recall the convention that every environment action where process $q$ fails is also considered as $q$ taking a step.

Clearly, except in lock-step synchronous systems, $t_{p}(k)\neq t_{q}(k)$ , so $IC^{0},IC^{1},IC^{2},\dots$ can be viewed as the result of applying a trivial “synchronic layering” in terms of Moses and Rajsbaum (MR02). Unfortunately, though, any $IC^{k}$ may be an inconsistent cut, as messages sent by a fast process $p$ in its $(k+1)$ th step may have been received by a slow process $q$ by its $k$ th step. $IC^{k}$ would violate causality in this case, i.e., would not be left-closed w.r.t. Lamport’s happens-before relation (Lam78).

Recall that we restricted our attention to consensus algorithms using full-information protocols, where every message sent contains the entire state transition history of the sender. As a consequence, we do not significantly lose applicability of our results by further restricting the protocol and the supported distributed computing models as follows:

(i) In a single state transition of $\mathcal{P}_{p}$ , process $p$ can

• actually receive all messages delivered to it since its last step,

• initiate the sending of at most one message to every process, resp.,

• initiate at most one single-writer multiple-reader shared memory operation in the shared memory owned by some other process (but no restriction on operations in its own shared memory portion).

(ii) In addition to (optional) external environment events, the environment protocol only provides

• $\text{{fail}}(q)\in\text{{act}}_{\epsilon}$ , which tells process $q$ to fail,

• $\text{{delv}}(q,p,t_{k})\in\text{{act}}_{\epsilon}$ , which identifies the message $m$ to be delivered to process $q$ (for reception in its next step) by the pair $(p,t_{k})$ , where $p$ is the sending process and $t_{k}.5$ is the time when the sending of $m$ has been initiated, resp.,

• $\text{{done}}(q,p,t_{\ell},t_{k})\in\text{{act}}_{\epsilon}$ , which identifies the completed shared memory operation (to be processed in its next step), in the shared memory owned by $p$ , as the one initiated by process $q\neq p$ in its step at time $t_{\ell}.5$ ; in a read-type operation, it will return to $q$ the shared memory content based on $p$ ’s state at time $t_{k}$ , with $t_{\ell}\leq t_{k}$ .

In such a system, given any cut $IC^{k}$ , it is possible to determine the unique largest consistent cut $CC^{k}\subseteq IC^{k}$ (Mat89). By construction, $CC^{0}=IC^{0}$ , and the frontier $CF^{k}$ of $CC^{k}$ , $k\geq 1$ , consists of the local states of all processes $q\in\Pi$ reached by having taken some $\ell(q)$ th step, $0\leq\ell(q)\leq k$ , with at least one process $p$ having taken its $k$ th step, i.e., $\ell(p)=k$ and thus $CF^{k}_{p}=IF^{k}_{p}$ , and $CF^{k}_{q}=IF_{q}^{\ell(q)}$ with $0\leq\ell(q)\leq k$ for all processes $q$ . Note carefully that $\ell(q)<k$ happens when, in $IC^{k}$ , process $q$ receives some message/data initiated at some step $>k$ at or before its own $k$ th step but after its $\ell(q)$ th step.

Whereas the environment protocol could of course determine all the consistent cuts $CC^{0},CC^{1},CC^{2},\dots$ based on the corresponding sequence of global configurations, this is typically not the case for the processes (except in the special case of a synchronous system). However, in distributed systems adhering to the above constraints, processes can obtain this knowledge (that is to say, their local share of a consistent cut) via vector clocks (Mat89). More specifically, it is possible to implement a vector clock $k_{p}=(k_{p}^{1},\dots,k_{p}^{n})$ at process $p$ , where $k_{p}^{p}$ counts the number of steps taken by $p$ itself so far, and $k_{p}^{q}$ , $q\neq p$ , gives the number of steps that $p$ knows that $q$ has taken so far. Vector clocks are maintained as follows: Initially, $k_{p}=(0,\dots,0)$ , and every message sent resp. every shared memory operation data written by $p$ gets $k_{p}$ as piggybacked information (after advancing $k_{p}^{p}$ ). At every local state transition in $p$ ’s protocol $\mathcal{P}_{p}$ , $k_{p}^{p}$ is advanced by 1. Moreover, when a previously received message/previously read data value (containing the originating process $q$ ’s vector clock value $\hat{k}_{q}$ ) is to be processed in the step, $k_{p}$ is adjusted to the maximum of its previous value and $\hat{k}_{q}$ component-wise, i.e., $k_{p}^{q}=\max\{k_{p}^{q},\hat{k}_{q}^{q}\}$ for $q\neq p$ . Obviously, all this can be implemented transparently atop any protocol $\mathcal{P}$ running in the system.
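
The vector clock maintenance rules just described can be illustrated by the following minimal Python sketch. It is not part of the paper's formal model: the class name `VectorClock`, the method `step`, and the 0-based process indices are assumptions made for illustration only, and the merge is taken over all components other than $p$ 's own, following the "component-wise" wording.

```python
from typing import List, Sequence, Tuple

Clock = Tuple[int, ...]


class VectorClock:
    """Vector clock k_p of process p among n processes (0-based indices)."""

    def __init__(self, p: int, n: int) -> None:
        self.p = p
        self.k: List[int] = [0] * n  # initially k_p = (0, ..., 0)

    def step(self, received: Sequence[Clock] = ()) -> Clock:
        """Take one local step, processing the piggybacked clocks in `received`.

        Returns the updated clock, which is what p piggybacks on anything
        it sends or writes in this step (i.e., after advancing k_p^p).
        """
        self.k[self.p] += 1  # every local state transition advances k_p^p by 1
        for k_hat in received:
            for r, value in enumerate(k_hat):
                if r != self.p:  # p's own component is only advanced locally
                    self.k[r] = max(self.k[r], value)
        return tuple(self.k)


# Hypothetical two-process scenario: process 0 is fast, process 1 is slow.
fast, slow = VectorClock(0, 2), VectorClock(1, 2)
m1 = fast.step()         # clock piggybacked in the first step of process 0
m2 = fast.step()         # clock piggybacked in the second step of process 0
seen = slow.step([m2])   # the first step of process 1 already processes m2
```

In this scenario, the clock of the slow process after its first step has a component exceeding 1, which is exactly the situation in which the cut formed by the first steps of all processes is inconsistent.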

Now, given the sequence of global states $AC^{0},AC^{1},AC^{2},\dots$ of the processes running the so augmented protocol in some run $G^{0},G^{1},G^{2},\dots$ , there is a well-known algorithm for computing the maximal consistent cut $ACC^{k}$ for the non-consistent cut $AIC^{k}$ formed by the frontier $AIF^{k}$ of the local states of the processes after every process has taken its $k$ th step: Starting from $\ell:=k$ , process $p$ searches for the sought $\ell(p)$ by checking the vector clock value $k_{p}(\ell)$ of the state after its own $\ell$ th step. It stops searching and sets $\ell(p):=\ell$ iff $k_{p}(\ell)$ is less than or equal to $(k,\dots,k)$ component-wise. The state $AIF_{p}^{\ell(p)}$ is then process $p$ ’s contribution in the frontier $ACF^{k}$ of the consistent cut $ACC^{k}$ . Clearly, from $ACC^{0},ACC^{1},ACC^{2},\dots$ , the sought sequence of the consistent cuts $CC^{0},CC^{1},CC^{2},\dots$ can be obtained trivially by discarding all vector clock information. Therefore, even the processes can compute their share, i.e., their local state, in $CC^{k}$ for every $k$ .
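
The downward search for $\ell(p)$ described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `cut_index` and `consistent_frontier`, the representation of a process history as a list of vector clocks indexed by step number, and the handling of processes that have recorded fewer than $k$ steps are all hypothetical choices, not taken from the paper.

```python
from typing import Dict, Sequence, Tuple

Clock = Tuple[int, ...]


def cut_index(history: Sequence[Clock], k: int) -> int:
    """Return l(p): the largest l <= k whose clock is <= (k, ..., k) component-wise.

    history[l] is the vector clock of the process after its l-th step,
    with history[0] = (0, ..., 0) for the initial state.
    """
    # Assumption: if fewer than k steps are recorded, start at the last one.
    start = min(k, len(history) - 1)
    for ell in range(start, -1, -1):
        if all(component <= k for component in history[ell]):
            return ell
    raise ValueError("history[0] must be the all-zero initial clock")


def consistent_frontier(histories: Dict[int, Sequence[Clock]], k: int) -> Dict[int, int]:
    """Map every process p to its step index l(p) in the k-th consistent cut."""
    return {p: cut_index(h, k) for p, h in histories.items()}


# Hypothetical histories: process 1 processes, in its first step, a message
# that process 0 sent in its second step.
histories = {
    0: [(0, 0), (1, 0), (2, 0)],
    1: [(0, 0), (2, 1)],
}
frontier_1 = consistent_frontier(histories, 1)
frontier_2 = consistent_frontier(histories, 2)
```

Here, process 1 falls back to its initial state in the cut with index 1, since its first step depends on a step of process 0 with index greater than 1, whereas both processes contribute their latest recorded step to the cut with index 2.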

By construction, the sequence of consistent cuts $CC^{0},CC^{1},CC^{2},\dots$ , and hence the sequence of its frontiers $CF^{0},CF^{1},CF^{2},\dots$ , completely describe the evolution of the local states of the processes in a run $G^{0},G^{1},G^{2},\dots$ . In our operational model, we will hence just use the indices $k$ of $CC^{k}$ as global time for specifying executions: Starting from the initial state $CC^{0}$ , we consider $CC^{k}$ as the result of applying round $k\geq 1$ to $CC^{k-1}$ (as we did in the case of lock-step rounds).

6.3. Defining process-time graphs

No matter how consistent cuts, i.e., global time, are implemented, from now on, we just overload the notation used so far and denote by $C^{k}$ the frontier $CF^{k}$ in the consistent cut at global time $k$ . So given an infinite execution $\alpha$ , we again denote by $\alpha^{t}$ the $t$ th configuration (= the consistent cut with index $t$ ) in $\alpha$ .

Clearly, by construction, every $C^{k}$ is uniquely determined by $C^{0}$ and all the events that cause the steps leading to $C^{k}$ . In particular, we can define a vector of events $E^{k}$ , where $E_{p}^{k}$ is a set containing all the events that must be applied to $C_{p}^{k-1}$ in order to arrive at $C_{p}^{k}$ . Note carefully that a process $p$ that does not make a step, i.e., is not scheduled in $E^{k}$ and thus has the same non- $\bot_{p}$ state in $C^{k-1}$ and $C^{k}$ , does not have any event $\text{{delv}}(p,*,*)\in E_{p}^{k}$ (resp. $\text{{done}}(p,*,*,*)\in E_{p}^{k}$ ) by construction, i.e., $E_{p}^{k}=\emptyset$ . Otherwise, $E_{p}^{k}$ contains a “make a step” event, all (optional) external environment events, and $\text{{delv}}(p,*,*)$ for all messages that have been sent to $p$ in steps within $C^{k-1}$ and are delivered to $p$ after its previous step but before or at its $k$ th step (resp. $\text{{done}}(p,*,*,*)$ for all completed shared memory operations initiated by $p$ in steps within $CC^{k-1}$ and completed after $p$ ’s previous step but before or at its $k$ th step). Note that $E_{p}^{1}$ cannot contain any $\text{{delv}}(p,*,*)$ , as no messages have been sent before (resp. no $\text{{done}}(p,*,*,*)$ , as no shared memory operations have been initiated before).

As a consequence of our construction, the mismatch problem spotted at the beginning of Section 6.2 no longer exists, and we can reason about executions and the corresponding event sequences alike.

Rather than considering $C^{0}$ in conjunction with $E^{1},\dots,E^{k}$ , however, we will consider the corresponding process-time graph $k$ -prefix $PTG^{k}$ (BM14:JACM) instead, which we will now define. Since we are only interested in consensus algorithms here, we assume that every process has a dedicated initial state for every possible initial value $v$ , taken from a finite input domain $\mathcal{V}$ . For every assignment of initial values $x\in\mathcal{V}^{n}$ to the $n$ processes in the initial configuration $C^{0}$ , we inductively construct the following sequence of process-time graph prefixes $PTG^{t}$ :

Definition 6.1 (Process-time graph prefixes).

For every $k\geq 0$ , the process-time graph $k$ -prefix $PTG^{k}$ of a given run is defined as follows:

• The process-time graph $0$ -prefix $PTG^{0}$ contains the nodes $(p,0,x_{p})$ for all processes $p\in\Pi$ , with input value $x_{p}\in\mathcal{V}$ , and no edges.

• The process-time graph $1$ -prefix $PTG^{1}$ contains the nodes $(p,0,x_{p})$ and $(p,1,f)$ for all processes $p\in\Pi$ , where $f=\bot$ if $\text{{fail}}(p)\in E^{1}$ (which models the case of an initially dead process (FLP85)), and $f=*$ otherwise, where $*$ is some encoding (e.g., some failure detector output) of the external environment events $\in E^{1}$ . It contains a (local) edge from $(p,0,x_{p})$ to $(p,1,f)$ and no other edges.

• The process-time graph $k$ -prefix $PTG^{k}$ , $k\geq 2$ , contains $PTG^{k-1}$ and the nodes $(p,k,f)$ for all processes $p\in\Pi\setminus\{q\mid E_{q}^{k}=\emptyset\}$ , where $f=\bot$ if $\text{{fail}}(p)\in E^{k}$ , and $f=*$ otherwise. It contains a (local) edge from $(p,\ell,f_{\ell})$ to $(p,k,f)$ (if the latter node is present at all, i.e., when $E_{p}^{k}\neq\emptyset$ ), where $\ell$ is maximal among all nodes $(p,*,*)$ in $PTG^{k-1}$ . For message passing systems, it also contains an edge from $(p,s,f_{s})$ , $1\leq s<k$ , to $(q,k,f)$ iff $\text{{delv}}(q,p,s)\in E^{k}$ . For shared memory systems, it contains an edge from $(p,\ell,f_{\ell})$ , $1\leq\ell<k$ , to $(q,k,f)$ iff $\text{{done}}(q,p,s,\ell)\in E^{k}$ ; this reflects the fact that the returned data originate from $p$ ’s step $\ell$ and not from the step $s$ where $q$ has initiated the shared memory operation.

The round- $\ell$ process-time graph $PT^{\ell}$ , for $0\leq\ell\leq k$ , which represents the contribution of round $\ell$ to $PTG^{k}$ , is defined as (i) $PT^{0}=PTG^{0}$ and (ii), for $\ell\geq 1$ , the set of vertices $PT^{\ell}=PTG^{\ell}\setminus PTG^{\ell-1}$ along with all their incoming edges (which all originate in $PTG^{\ell-1}$ ).
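
To make Definition 6.1 more concrete, the following Python sketch builds a process-time graph prefix for the message-passing case only. It is a simplified illustration under several assumptions: `RoundEvents` and `build_ptg` are hypothetical names, input values are encoded as strings, every non-failure label is collapsed to `"*"`, shared memory edges are omitted, and the sketch does not check that a failed process takes no further steps.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

Node = Tuple[int, int, str]   # (process, time, label)
Edge = Tuple[Node, Node]
BOT, STAR = "⊥", "*"          # failure marker / encoding of environment events


@dataclass(frozen=True)
class RoundEvents:
    """Hypothetical container for the events E^k of one round."""
    steps: FrozenSet[int] = frozenset()                  # processes scheduled in round k
    fail: FrozenSet[int] = frozenset()                   # processes q with fail(q) in E^k
    delv: FrozenSet[Tuple[int, int, int]] = frozenset()  # triples (q, p, s) for delv(q, p, s)


def build_ptg(inputs: List[str], rounds: List[RoundEvents]) -> Tuple[Set[Node], Set[Edge]]:
    """Return the nodes and edges of the prefix PTG^k for k = len(rounds)."""
    n = len(inputs)
    last: Dict[int, Node] = {p: (p, 0, x) for p, x in enumerate(inputs)}
    at: Dict[Tuple[int, int], Node] = {(p, 0): node for p, node in last.items()}
    nodes: Set[Node] = set(last.values())
    edges: Set[Edge] = set()
    for k, ev in enumerate(rounds, start=1):
        # Round 1 has a node for every process; failing also counts as a step.
        active = set(range(n)) if k == 1 else set(ev.steps | ev.fail)
        new: Dict[int, Node] = {}
        for p in active:
            new[p] = (p, k, BOT if p in ev.fail else STAR)
            edges.add((last[p], new[p]))          # local edge from p's latest node
        for q, p, s in ev.delv:
            if q in new and 1 <= s < k and (p, s) in at:
                edges.add((at[(p, s)], new[q]))   # message edge for delv(q, p, s)
        for p, node in new.items():
            nodes.add(node)
            last[p] = node
            at[(p, k)] = node
    return nodes, edges


# Two processes with inputs 0 and 1; in round 2 only process 1 takes a step
# and receives the message that process 0 sent in its step 1.
nodes, edges = build_ptg(
    ["0", "1"],
    [
        RoundEvents(steps=frozenset({0, 1})),
        RoundEvents(steps=frozenset({1}), delv=frozenset({(1, 0, 1)})),
    ],
)
```

The nodes added in round 2, together with their incoming edges, form the round-2 process-time graph $PT^{2}$ in the sense of the definition above.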

Figure 3 shows an example of a process-time graph prefix occurring in a run with lock-step synchronous or asynchronous rounds. The nodes are horizontally aligned according to global time, progressing along the vertical axis.

Figure 4 shows an example of a process-time graph prefix occurring in a run with processes that do not execute in lock-step rounds and may crash. Nodes are again horizontally aligned according to global time, progressing along the vertical axis. The frontier $C^{k}$ of the $k$ th consistent cut, reached at the end of round $k$ , is made up of $C_{p}^{k}=\{(p,\ell_{p}(k),*)\in PTG^{k}\mid 0\leq\ell_{p}(k)\leq k\text{ is maximal}\}$ . That is, starting from the (possibly inconsistent) cut made up of the nodes $(p,k,*)$ of all processes, one has to go down for process $p$ until the first node is reached where no edge originating in a node with time $>k$ has been received.

Let $\mathcal{PT}^{t}$ be the set of all possible process-time graph $t$ -prefixes, and $\mathcal{PT}^{\omega}$ be the set of all possible infinite process-time graphs, for all possible runs of our system. Note carefully that $\mathcal{PT}^{t}$ , as well as every set $\mathcal{P}^{\ell}$ of round- $\ell$ process-time graphs for finite $\ell$ , is necessarily finite (provided the encoding ( $*$ ) for external environment events has a finite domain, which we assume). Clearly, every element of $\mathcal{PT}^{t}$ resp. $\mathcal{PT}^{\omega}$ can be expressed as a finite resp. infinite sequence $(P^{0},\dots,P^{t})\in\mathcal{P}^{0}\times\mathcal{P}^{1}\times\dots\times\mathcal{P}^{t}=\mathcal{PT}^{t}$ resp. $(P^{0},P^{1},\dots)\in\mathcal{P}^{0}\times\mathcal{P}^{1}\times\dots=\mathcal{PT}^{\omega}$ of round- $\ell$ process-time graphs. (Footnote 8: Note that we slightly abuse the notation $\mathcal{PT}^{\omega}$ here, which normally represents $\mathcal{PT}\times\mathcal{PT}\times\dots$ .)

We will denote by $PS\subseteq\mathcal{PT}^{\omega}$ the set of all admissible process-time graphs in the given model, and by $\Sigma\subseteq\mathcal{C}^{\omega}$ the corresponding set of admissible executions. Note carefully that process-time graphs are independent of the (decision function of the) consensus algorithm, albeit they do depend on the input values.

Due to the equivalence of process-time graphs and executions, our whole topological machinery developed in Section 4–Section 5 for $\Sigma\subseteq\mathcal{C}^{\omega}$ can also be applied to $PS\subseteq\mathcal{PT}^{\omega}$ . Since, in sharp contrast to the set of configurations $\mathcal{C}$ , the set of process-time graphs $\mathcal{PT}^{t}$ is finite for any time $t$ , Tychonoff’s theorem implies compactness of the $p$ -view topology on $\mathcal{PT}^{\omega}$ . (Footnote 9: Tychonoff’s theorem states that any product of compact spaces is compact with respect to the product topology.)

Whereas this is not necessarily the case for $\mathcal{C}^{\omega}$ , we can prove compactness of the image of $\mathcal{PT}^{\omega}$ under an appropriately defined operational transition function: Given the original transition function $\tau_{\epsilon}:\mathcal{G}\times\text{{ACT}}_{\epsilon}\to\mathcal{L}_{\epsilon}$ , it is possible to define a PTG transition function $\hat{\tau}:\mathcal{PT}^{\omega}\to\mathcal{C}^{\omega}$ that just provides the (unique) execution for a given process-time graph. The following Lemma 6.2 shows that $\hat{\tau}$ is continuous in any of our topologies.

Lemma 6.2 Continuity of $\hat{\tau}$ .

For every $p\in\Pi$ , the PTG transition function $\hat{\tau}:\mathcal{PT}^{\omega}\to\mathcal{C}^{\omega}$ is continuous when both $\mathcal{PT}^{\omega}$ and $\mathcal{C}^{\omega}$ are endowed with any of $d_{p}$ , $p\in\Pi$ , $d_{\mathrm{u}}$ , $d_{\mathrm{nu}}$ .

Since the image of a compact space under a continuous function is compact, it hence follows that the set $\hat{\tau}[\mathcal{PT}^{\omega}]\subseteq\mathcal{C}^{\omega}$ of admissible executions is a compact subspace of $\mathcal{C}^{\omega}$ . The common structure of $\mathcal{PT}^{\omega}$ and its image under the PTG transition function $\hat{\tau}$ , implied by the continuity of $\hat{\tau}$ , hence allows us to reason in either of these spaces. In particular, with Definition 6.3, the analogs of Theorem 5.1 and Theorem 5.2 read as follows:

Definition 6.3 ( $v$ -valent process-time graph).

We call a process-time graph $z_{v}$ , for $v\in\mathcal{V}$ , $v$ -valent, if it starts from an initial configuration where all processes $p\in\Pi$ have the same input value $x_{p}=v$ .

Theorem 6.4 Characterization of uniform consensus.

Uniform consensus is solvable if and only if there exists a partition of the set $PS$ of admissible process-time graphs into sets $PS_{v}$ , $v\in\mathcal{V}$ , such that the following holds:

(1) Every $PS_{v}$ is an open set in $PS$ with respect to the uniform topology induced by $d_{\mathrm{u}}$ .

(2) If process-time graph $a\in PS$ is $v$ -valent, then $a\in PS_{v}$ .

Theorem 6.5 Characterization of non-uniform consensus.

Non-uniform consensus is solvable if and only if there exists a partition of the set $PS$ of admissible process-time graphs into sets $PS_{v}$ , $v\in\mathcal{V}$ , such that the following holds:

(1) Every $PS_{v}$ is an open set in $PS$ with respect to the non-uniform topology induced by $d_{\mathrm{nu}}$ .

(2) If process-time graph $a\in PS$ is $v$ -valent, then $a\in PS_{v}$ .

7. Consensus Characterization in Terms of Broadcastability

We will now develop another characterization of consensus solvability, which rests on the broadcastability of the connected component $PS_{z_{v}}\subseteq PS$ that contains the $v$ -valent process-time graph $z_{v}\in PS$ .

Definition 7.1 (Diameter of a set).

For $A\subseteq\mathcal{PT}^{\omega}$ , depending on the distance function $d$ that induces the appropriate topology, define $A$ ’s diameter as $d(A)=\sup\{d(a,b)\mid a,b\in A\}$ .

Definition 7.2 (Broadcastability).

We call a subset $A\subseteq PS$ of admissible process-time graphs broadcastable by the broadcaster $p\in\Pi$ , if for every $a\in A$ there is some round $0<T(a)<\infty$ , by which every process $q\in Ob(a^{T(a)})$ that is still obedient in round $T(a)$ knows $p$ ’s input value $x_{p}$ in $a$ , denoted $x_{p}(a)$ , i.e., has $(p,0,x_{p}(a))$ in its view $V_{q}(a^{T(a)})$ .
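
For a single finite process-time graph prefix, the condition in Definition 7.2 amounts to a reachability check, which the following Python sketch illustrates. It rests on assumptions not fixed by this section: the view $V_{q}$ is taken to be the set of nodes with a directed path to $q$ 's node (its causal past), the set of obedient processes is supplied by the caller as a `frontier` mapping, and the names `view` and `heard_by_all` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[int, int, str]   # (process, time, label)
Edge = Tuple[Node, Node]


def view(edges: Iterable[Edge], target: Node) -> Set[Node]:
    """All nodes with a directed path to `target`, including `target` itself."""
    preds: Dict[Node, Set[Node]] = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)
    seen, stack = {target}, [target]
    while stack:
        for u in preds[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def heard_by_all(edges: Iterable[Edge], frontier: Dict[int, Node], source: Node) -> bool:
    """True iff `source` = (p, 0, x_p) lies in the view of every node in `frontier`.

    `frontier` maps each obedient process q to its node at the round under test.
    """
    edge_list = list(edges)
    return all(source in view(edge_list, node) for node in frontier.values())


# Hypothetical prefix: process 0 (input "0") is the broadcaster; its round-1
# message reaches process 1 in round 2.
edges = {
    ((0, 0, "0"), (0, 1, "*")),
    ((1, 0, "1"), (1, 1, "*")),
    ((1, 1, "*"), (1, 2, "*")),
    ((0, 1, "*"), (1, 2, "*")),
}
source = (0, 0, "0")
after_round_1 = heard_by_all(edges, {0: (0, 1, "*"), 1: (1, 1, "*")}, source)
after_round_2 = heard_by_all(edges, {0: (0, 1, "*"), 1: (1, 2, "*")}, source)
```

In this example, the broadcaster's input node is not yet in the view of process 1 after round 1, but it is after round 2, so $T(a)=2$ would witness the condition for this particular graph.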

We will now prove the essential fact that connected broadcastable sets have a diameter strictly smaller than $1$ :

Theorem 7.3 Diameter of broadcastable connected sets.

If a connected set $A\subseteq PS$ of admissible process-time graphs is broadcastable by some process $p$ , then $d_{\mathrm{u}}(A)\leq d_{p}(A)\leq 1/2$ , as well as $d_{\mathrm{nu}}(A)\leq d_{p}(A)\leq 1/2$ , i.e., $p$ ’s input value satisfies $x_{p}(a)=x_{p}(b)$ for all $a,b\in A$ .

Corollary 7.4 follows immediately from Theorem 7.3:

Corollary 7.4 Diameter of broadcastable $PS_{z_{v}}$ .

If $PS_{z_{v}}$ for a $v$ -valent $z_{v}\in PS$ is broadcastable for $p$ , then $d_{\mathrm{u}}(PS_{z_{v}})\leq d_{p}(PS_{z_{v}})\leq 1/2$ , as well as $d_{\mathrm{nu}}(PS_{z_{v}})\leq d_{p}(PS_{z_{v}})\leq 1/2$ , since $p$ ’s input value $x_{p}(a)=v$ is the same for all $a\in PS_{z_{v}}$ .

We can now prove the following necessary and sufficient condition for solving consensus based on broadcastability:

Theorem 7.5 Consensus characterization via broadcastability.

A model allows solving uniform resp. non-uniform consensus if and only if it guarantees that the connected components of the set $PS$ of admissible process-time graphs in the uniform topology resp. the non-uniform topology are broadcastable for some process.

8. Limit-based Consensus Characterization

It is possible to shed some additional light on the consensus characterization by exploiting the fact that every $d_{p}$ , $p\in\Pi$ , is a pseudo-metric (unlike $d_{\mathrm{u}}$ and $d_{\mathrm{nu}}$ ): Since most of the convenient properties of metric spaces, including sequential compactness, also hold in pseudo-metric spaces, we can further explore the border of the decision sets $PS_{v}$ . It will turn out in Corollary 8.6 that consensus is impossible if and only if certain limit points in the appropriate topologies are admissible. Note that we will prove our results only for $d_{p}$ and uniform consensus; literally the same proofs apply for non-uniform consensus as well.

For a given consensus algorithm, we consider the set of all admissible process-time graphs $PS$ resp. the corresponding set of admissible executions $\Gamma$ , endowed with the subspace topology generated by $\mathcal{PT}^{\omega}\cap PS$ resp. $\mathcal{C}^{\omega}\cap\Gamma$ in the $p$ -view topology. (Footnote 10: Whenever we state a topological property w.r.t. the subspace topology, we will refer to $\Gamma$ (resp. $PS$ ), otherwise to $\mathcal{C}^{\omega}$ (resp. $\mathcal{PT}^{\omega}$ ).) Recall that $PS$ and $\Gamma$ are not closed in general, hence not compact, even though $\mathcal{PT}^{\omega}$ and $\hat{\tau}(\mathcal{PT}^{\omega})$ are compact (recall Lemma 6.2).

Definition 8.1 (Distance of sets).

For $A,B\subseteq\mathcal{PT}^{\omega}$ with distance function $d$ , let $d(A,B)=\inf\{d(a,b)\mid a\in A,\ b\in B\}$ .
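
The set distance just defined, and the diameter of Definition 7.1, can be illustrated on finite sets of finite prefixes by the Python sketch below. It makes several assumptions for illustration: each process-time graph is represented only by the sequence of views of a fixed process $p$ , the distance is taken as $2^{-t}$ for the first time $t$ at which these views differ (and 0 if they never differ on the given prefix), and infimum and supremum degenerate to minimum and maximum because the sets are finite. The names `d_p`, `set_distance`, and `diameter` are hypothetical.

```python
from itertools import product
from typing import Hashable, Iterable, Sequence

Views = Sequence[Hashable]  # views[t] = encoding of p's local view at time t


def d_p(a: Views, b: Views) -> float:
    """2^-t for the first time t at which p's views differ, 0.0 if they never do."""
    for t, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return 2.0 ** -t
    return 0.0


def set_distance(A: Iterable[Views], B: Iterable[Views]) -> float:
    """Smallest pairwise distance between an element of A and an element of B."""
    return min(d_p(a, b) for a, b in product(A, B))


def diameter(A: Iterable[Views]) -> float:
    """Largest pairwise distance within A."""
    items = list(A)
    return max(d_p(a, b) for a, b in product(items, items))


# Hypothetical view sequences of p: all start from the same time-0 view "x0".
A = [("x0", "a", "a1"), ("x0", "a", "a2")]
B = [("x0", "b", "b1")]
diam_A = diameter(A)          # the two elements of A first differ at time 2
dist_AB = set_distance(A, B)  # elements of A and B first differ at time 1
```

Under this encoding, a diameter of at most $1/2$ means that all elements of the set agree on $p$ 's view at time 0, i.e., on $p$ 's input value.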

We prove the following result, which also holds when $A$ , $B$ are not closed/compact. Corollary 8.3 shows that it also holds in the uniform and non-uniform topology.

Theorem 8.2 General set distance condition.

Let $p\in\Pi$ and $A,B$ be arbitrary subsets of $\mathcal{PT}^{\omega}$ . Then, $d_{p}(A,B)=0$ if and only if there are infinite sequences $(a_{k})\in A^{\omega}$ and $(b_{k})\in B^{\omega}$ of process-time graphs, as well as $\hat{a},\hat{b}\in\mathcal{PT}^{\omega}$ with $a_{k}\to\hat{a}$ and $b_{k}\to\hat{b}$ (with respect to the appropriate $p$ -view topology) with $d_{p}(\hat{a},\hat{b})=0$ .

Eq. 9 and Eq. 11 allow us to extend this result from $p$ -view-topologies to the uniform and non-uniform topologies:

Corollary 8.3.

Let $A,B$ be arbitrary subsets of $\mathcal{PT}^{\omega}$ . Then $d_{\mathrm{u}}(A,B)=0$ resp. $d_{\mathrm{nu}}(A,B)=0$ if and only if there are infinite sequences $(a_{k})\in A^{\omega}$ and $(b_{k})\in B^{\omega}$ of process-time graphs as well as $\hat{a},\hat{b}\in\mathcal{PT}^{\omega}$ with $a_{k}\to\hat{a}$ and $b_{k}\to\hat{b}$ (with respect to the appropriate minimum topology) and $d_{\mathrm{u}}(\hat{a},\hat{b})=0$ resp. $d_{\mathrm{nu}}(\hat{a},\hat{b})=0$ .

Proof.

The proof of Theorem 8.2 can be carried over literally, by using the fact that every convergent infinite sequence $(a^{t})$ w.r.t. $d_{\mathrm{u}}$ has a convergent infinite subsequence w.r.t. some (obedient) $d_{p}$ by the pigeonhole principle since $n$ is finite. ∎

The above Theorem 8.2 allows us to distinguish 3 main cases that cause $d_{p}(A,B)=0$ : (i) If $\hat{a}\in A\cap B\neq\emptyset$ , one can choose the sequences defined by $a_{k}=b_{k}=\hat{a}=\hat{b}$ , $k\geq 1$ . (ii) If $A\cap B=\emptyset$ and $\hat{a}=\hat{b}$ , there is a “fair” process-time graph as the common limit. (iii) If $A\cap B=\emptyset$ and $\hat{a}\neq\hat{b}$ , there is a pair of “unfair” process-time graphs acting as limits, which have distance 0 (and are hence also common limits w.r.t. the pseudo-metric $d_{p}$ ). We note, however, that due to the non-uniqueness of the limits in our pseudo-metric, (iii) actually comprises two instances of (ii). We kept the distinction for compatibility with the existing results (FG11).

Definition 8.4 (Fair and unfair process-time graphs).

Consider two process-time graphs $r,r^{\prime}\in\mathcal{PT}^{\omega}$ of some consensus algorithm with decision sets $PS_{v}$ , $v\in\mathcal{V}$ , in any appropriate topology:

• $r$ is called fair, if for some $v,w\neq v\in\mathcal{V}$ there are convergent sequences $(a_{k})\in PS_{v}$ and $(b_{k})\in PS_{w}$ with $a_{k}\to r$ and $b_{k}\to r$ .

• $r$ , $r^{\prime}$ are called a pair of unfair process-time graphs, if for some $v,w\neq v\in\mathcal{V}$ there are convergent sequences $(a_{k})\in PS_{v}$ with $a_{k}\to r$ and $(b_{k})\in PS_{w}$ with $b_{k}\to r^{\prime}$ and $r$ and $r^{\prime}$ have distance 0.

An illustration is provided by Figure 6. Note carefully that, in the uniform case, a fair/unfair process-time graph $r$ where some process $p$ becomes disobedient in round $t$ implies that the same happens in all $a\in B_{2^{-t}}(r)\cap PS_{v}$ and $b\in B_{2^{-t}}(r)\cap PS_{w}$ . On the other hand, if $p$ does not fail in $r$ , it may still be the case that $p$ fails in every $a_{k}$ in the sequence converging to $r$ , at some time $t_{k}$ with $\lim_{k\to\infty}t_{k}=\infty$ . In the non-uniform case, neither of these possibilities exists: $p$ cannot fail in the limit $r$ , and any $a_{k}$ where $p$ fails is also excluded as its distance to any other sequence is 1.

The above findings go nicely with the alternative characterization of consensus solvability given in Corollary 8.6, which results from applying the following Lemma 8.5 from the textbook of Munkres (Munkres) to Theorem 5.1.

Lemma 8.5 Separation lemma (Munkres, Lemma 23.12).

If $Y$ is a subspace of $X$ , a separation of $Y$ is a pair of disjoint nonempty sets $A$ and $B$ whose union is $Y$ , neither of which contains a limit point of the other. The space $Y$ is connected if and only if there exists no separation of $Y$ .

Proof.

The closure of a set $A$ in $Y$ is $(\overline{A}\cap Y)$ , where $\overline{A}$ denotes the closure in $X$ . To show that $Y$ not being connected implies the existence of a separation, assume that $A,B$ are disjoint nonempty sets that are both closed and open in $Y=A\cup B$ , so $A=(\overline{A}\cap Y)$ . Consequently, $\overline{A}\cap B=\overline{A}\cap(Y-A)=\overline{A}\cap Y-\overline{A}\cap A=\overline{A}\cap Y-A=\emptyset$ . Since $\overline{A}$ is the union of $A$ and its limit points, none of the latter is in $B$ . An analogous argument shows that none of the limit points of $B$ can be in $A$ .

Conversely, if $Y=A\cup B$ for disjoint non-empty sets $A$ , $B$ which do not contain limit points of each other, then $\overline{A}\cap B=\emptyset$ and $A\cap\overline{B}=\emptyset$ . From the equivalence above, we get $\overline{A}\cap Y=A$ and $\overline{B}\cap Y=B$ , so both $A$ and $B$ are closed in $Y$ and, as each other’s complement, also open in $Y$ . ∎

Corollary 8.6 Separation-based characterization.

Uniform resp. non-uniform consensus is solvable in a model generating the set of admissible process-time graph sequences $PS$ if and only if there exists a partition of $PS$ into sets $PS_{v},v\in\mathcal{V}$ such that the following holds:

(1) No $PS_{v}$ contains a limit point of any other $PS_{w}$ w.r.t. the uniform resp. non-uniform topology in $\mathcal{PT}^{\omega}$ .

(2) Every $v$ -valent admissible sequence $z_{v}$ satisfies $z_{v}\in PS_{v}$ .

We hence immediately obtain:

Corollary 8.7 Fair/unfair consensus impossibility.

The set of admissible process-time graphs $PS$ of a consensus algorithm $\mathcal{A}$ with decision sets $PS_{v}$ , $v\in\mathcal{V}$ , does not contain any fair process-time graph sequence $r$ or any pair $r,r^{\prime}$ of unfair process-time graph sequences.

Whereas we did not manage to characterize the set of limit sequences that have to be excluded in order to ensure consensus solvability, we can prove that, for any decision set $PS_{v}$ , it must be a compact set:

Lemma 8.8 Compactness of excluded sequences.

Let $PS_{v}$ , $v\in\mathcal{V}$ , be any decision set of a correct consensus algorithm, $\overline{PS}_{v}$ be its closure in $\mathcal{PT}^{\omega}$ and $\mbox{Int}(PS_{v})$ its interior. Then, $\hat{PS}_{v}=\overline{PS}_{v}-\mbox{Int}(PS_{v})$ , which is the set of limit points to be excluded, is compact.

Proof.

The closure $\overline{PS}_{v}$ is closed by definition. Since the complement of the interior $\mbox{Int}(PS_{v})^{C}$ is also closed by definition, it follows that $\hat{PS}_{v}=\overline{PS}_{v}-\mbox{Int}(PS_{v})=\overline{PS}_{v}\cap\mbox{Int}(PS_{v})^{C}$ is also closed. As a closed subset of the compact set $\overline{PS}_{v}$ , $\hat{PS}_{v}$ is hence compact. ∎

9. Applications

In this section, we will apply our topological characterizations of consensus solvability to several different examples. Apart from providing a topological explanation of bivalence proofs (Section 9.1) and folklore results for synchronous consensus under general omission faults (Section 9.2), we will provide a complete characterization of consensus solvability for dynamic networks with both closed (Section 9.3) and non-closed (Section 9.4) message adversaries. Finally, we will provide a consensus algorithm for asynchronous systems with weak timely links (Section 9.5), which does not rely on an implementation of the $\Omega$ failure detector.

9.1. Bivalence-based impossibilities

Our topological results shed some new light on the now standard technique of bivalence-based impossibility proofs introduced in the celebrated FLP paper (FLP85), which have been generalized (MR02) and used in many different contexts: Our results reveal that the forever bivalent executions constructed inductively in bivalence proofs (SW89; SWK09; BRSSW18:TCS; WSS19:DC) are just the limits of two infinite sequences of executions, $\alpha_{0},\alpha_{1},\dots$ all contained in, say, the decision set $\Sigma_{0}$ and $\beta_{0},\beta_{1},\dots$ all contained in $\Sigma_{1}$ , with $\alpha_{k}\to\hat{\alpha}$ and $\beta_{k}\to\hat{\beta}$ in some $p$ -view topology and $d_{p}(\hat{\alpha},\hat{\beta})=0$ .

More specifically, what is common to these proofs is that one shows that, for any consensus algorithm, there is an admissible forever bivalent run. This is usually done inductively, by showing that there is a bivalent initial configuration and that, given a bivalent configuration $C^{t-1}$ at the end of round $t-1$ , there is a 1-round extension leading to a bivalent configuration $C^{t}$ at the end of round $t$ . By definition, bivalence of $C^{t}$ means that there are two admissible executions $\alpha_{t}$ with decision value 0 and $\beta_{t}$ with decision value 1 starting out from $C^{t}$ , i.e., having a common prefix that leads to $C^{t}$ . Consequently, their distance, in any $p$ -view topology, satisfies $d_{p}(\alpha_{t},\beta_{t})<2^{-t}$ . Note that this is also true for the more general concept of a bipotent configuration $C^{t}$ , as introduced by Moses and Rajsbaum (MR02).

By construction, the $(t-1)$-prefixes of $\alpha_{t}$ and $\alpha_{t-1}$ are the same for all $t$, which implies that the $\alpha_{t}$ converge to a limit $\hat{\alpha}$ (and analogously the $\beta_{t}$ to $\hat{\beta}$); see Figure 6 for an illustration. Therefore, these executions match Definition 8.4, and Corollary 8.7 implies that the stipulated consensus algorithm cannot be correct. Concrete examples are the lossy link impossibility (SW89), i.e., the impossibility of consensus under an oblivious message adversary for $n=2$ that may choose any graph out of the set $\{\leftarrow,\leftrightarrow,\rightarrow\}$, and the impossibility of solving consensus with vertex-stable source components with insufficient stability interval (BRSSW18:TCS; WSS19:DC). In the case of the oblivious lossy link message adversary using the reduced set $\{\leftarrow,\rightarrow\}$ considered by Coulouma, Godard, and Peters (CGP15), consensus is solvable and there is no forever bivalent run. Indeed, there exists a consensus algorithm where all configurations reached after the first round are already univalent.

9.2. Consensus in synchronous systems with general omission process faults

As a more elaborate example of systems where the solvability of non-uniform and uniform consensus may be different, we take synchronous systems with up to $f$ general omission process failures (PT86). For $n\geq f+1$ , non-uniform consensus can be solved in $f+1$ rounds, whereas solving uniform consensus requires $n\geq 2f+1$ . Note that these systems also cover the examples in Section 4.

The impossibility proof for $n\leq 2f$ uses a standard partitioning argument, splitting $\Pi$ into a set $P_{0}$ of processes with $|P_{0}|=f$ and a set $P_{1}$ with $|P_{1}|=n-f\leq f$. One considers an admissible process-time graph $\alpha_{0}$ where all processes $p\in\Pi$ start with $x_{p}=0$, the ones in $P_{0}$ are correct, and the ones in $P_{1}$ are initially dead; the decision value of the processes in $P_{0}$ must be 0 by validity. Similarly, $\alpha_{1}$ starts from $x_{p}=1$, all processes in $P_{1}$ are correct and the ones in $P_{0}$ are initially dead; the decision value is hence 1. For another process-time graph $\alpha$, where the processes in $P_{1}$ are correct and the ones in $P_{0}$ are general omission faulty, in the sense that every $p\in P_{0}$ neither sends any message to nor receives any message from $P_{1}$, one observes $\alpha\sim_{p_{0}}\alpha_{0}$ for every $p_{0}\in P_{0}$, as well as $\alpha\sim_{p_{1}}\alpha_{1}$ for every $p_{1}\in P_{1}$. Hence, $p_{0}$ and $p_{1}$ decide on different values in $\alpha$.

Topologically, this is equivalent to $d_{\mathrm{u}}(\alpha,\alpha_{0})=0$ as well as $d_{\mathrm{u}}(\alpha,\alpha_{1})=0$ , which implies $\alpha\in PS_{0}$ as well as $\alpha\in PS_{1}$ . Consequently, $PS_{0}$ and $PS_{1}$ cannot be disjoint, as needed for uniform consensus solvability. Clearly, for $n\geq 2f+1$ , this argument is no longer applicable. And indeed, algorithms like the one proposed by Parvedy and Raynal (PR03:IPDPS) can be used for solving uniform consensus.

If one revisits the topological equivalent of the above partitioning argument for $n\leq 2f$ in the non-uniform case, it turns out that still $d_{\mathrm{nu}}(\alpha,\alpha_{0})=0$ , but $d_{\mathrm{nu}}(\alpha,\alpha_{1})=1$ as all processes in $P_{1}$ are faulty. Consequently, $\alpha\not\in PS_{1}$ . So $PS_{0}$ and $PS_{1}$ could partition the space of admissible executions. And indeed, non-uniform consensus can be solved in $f+1$ rounds here. In order to demonstrate this by means of our Theorem 5.2, we will sketch how the required decision sets $PS_{v}$ can be constructed. We will do so by means of a simple labeling algorithm, which assigns a decision value $v\in\mathcal{V}$ to every admissible process-time graph $\sigma$ . Recall that synchronous systems are particularly easy to model in our setting, since we can use the number of rounds as our global time $t$ .

Clearly, every process that omits to send its state in some round to a (still) correct process is revealed to every other (still) correct process at the next round at the latest. This implies that every correct process $p$ seen by some correct process $q$ by the end of the $(f+1)$-round prefix $\sigma|_{f+1}$ in the admissible process-time graph $\sigma$ has also been seen by every other correct process during $\sigma|_{f+1}$, since one would otherwise need a chain of $f+1$ different faulty processes for propagating $p$’s state to $q$. Thus, $p$ must have managed to broadcast $(p,0,x_{p}(\sigma))$ to all correct processes during $\sigma|_{f+1}$.

Consequently, if $\sigma|_{f+1}\sim\rho|_{f+1}$, then they must have the same set of broadcasters. Our labeling algorithm hence just assigns to $\sigma$ the initial value $x_{p}$ of the, say, lexically smallest broadcaster $p$ in $\sigma|_{f+1}$. The resulting decision sets are trivially open since, for every $\sigma\in PS_{v}$, we have $B_{2^{-(f+1)}}(\sigma)\subseteq PS_{v}$ as well. The generic non-uniform consensus algorithm from Theorem 5.2 can hence be used to solve consensus.
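The labeling rule just described can be sketched as follows. This is only an illustration under stated assumptions: a process-time graph prefix is encoded as a list of per-round sets of delivered edges `(sender, receiver)`, the set `correct` of correct processes is given to the labeling function (which, unlike the processes, may inspect the whole graph), and the names `reached_by` and `label` are hypothetical.

```python
def reached_by(p, rounds):
    """Processes that know p's initial state after the given rounds.

    rounds[t] is the set of edges (sender, receiver) delivered in round t + 1.
    Every process forwards everything it knows (full-history protocol).
    """
    known = {p}
    for edges in rounds:
        # The comprehension is evaluated on the old set before it is merged,
        # so information travels at most one hop per round.
        known |= {v for (u, v) in edges if u in known}
    return known


def label(rounds, initial, correct, f):
    """Initial value of the smallest broadcaster in the (f+1)-round prefix."""
    prefix = rounds[: f + 1]
    broadcasters = [p for p in sorted(initial) if correct <= reached_by(p, prefix)]
    return initial[broadcasters[0]] if broadcasters else None


# Toy instance with n = 3, f = 1: process 1 is omission faulty.
initial = {1: 0, 2: 1, 3: 1}
correct = {2, 3}
# Process 1 reaches only process 2 in round 1 and is silent in round 2.
partial = [{(1, 2), (2, 1), (2, 3), (3, 1), (3, 2)},
           {(2, 1), (2, 3), (3, 1), (3, 2)}]
# Process 1 never sends anything.
silent = [{(2, 3), (3, 2)}, {(2, 3), (3, 2)}]
```

In the first instance, the state of process 1 is relayed by process 2 and reaches every correct process within $f+1=2$ rounds, so process 1 is the smallest broadcaster; in the second, the label falls back to process 2.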

9.3. Dynamic networks with limit-closed message adversaries

In this section, we consider dynamic networks under message adversaries (like oblivious ones (SW89; CGP15)) that are limit-closed (WSM19:OPODIS), in the sense that every convergent sequence of process-time graphs $a_{0},a_{1},\dots$ with $a_{i}\in PS$ for every $i$ has a limit $\hat{a}\in PS$ . Note that processes do not fail in such systems, i.e., are all obedient, such that we will only consider the $p$ -view and the uniform topology here. An illustration is shown in Fig. 5, where the blue dots represent the $a_{i}$ ’s and $\times$ the limit point $\hat{a}$ at the boundary.

In this case, the set of admissible process-time graph sequences $PS$ is closed and hence a compact subspace both in any $p$-view topology and in the minimum topology. Moreover, we obtain:

Corollary 9.1 (Decision sets for compact MAs are compact).

For every correct consensus algorithm for a compact message adversary and every $v\in\mathcal{V}$ , $PS_{v}$ is closed in $PS$ and compact, and $d_{p}(PS_{v},PS_{w})>0$ for any $v,w\neq v\in\mathcal{V}$ and $p\in\Pi$ , and hence also $d_{\mathrm{u}}(PS_{v},PS_{w})>0$ .

Moreover, there are only finitely many different connected components $PS_{x}$ , $x\in PS$ , which are all compact, and for every $x,y$ with $PS_{x}\neq PS_{y}$ , it holds that $d_{p}(PS_{x},PS_{y})>0$ and hence also $d_{\mathrm{u}}(PS_{x},PS_{y})>0$ .

Proof.

Since all decision sets $PS_{v}$ are closed in $PS$ by Theorem 5.1 and $PS$ is compact for a compact message adversary, it follows that every $PS_{v}$ is also compact. From Theorem LABEL:thm:setdistance it hence follows that $d_{p}(PS_{v},PS_{w})>0$ . As this holds for every $p\in\Pi$ , we also have $d_{\mathrm{u}}(PS_{v},PS_{w})>0$ .

Since every connected component $PS_{x}$ of $PS$ that contains $x$ is closed in $PS$ , as the closure of a connected subspace is also connected (Munkres, Lemma 23.4) and a connected component is maximal, the same arguments as above also apply to $PS_{x}$ . To show that there are only finitely many different $PS_{z_{v}}$ for $v$ -valent sequences $z_{v}\in PS$ , observe that $PS_{v}=\bigcup_{z_{v}\in PS}PS_{z_{v}}$ is an open covering of $PS_{v}$ . Since the latter is compact, there is a finite sub-covering $PS_{v}=PS_{z_{v}^{1}}\cup\dots\cup PS_{z_{v}^{m}}$ , and all other $PS_{z_{v}}$ for a $v$ -valent $z_{v}$ must be equal to one of those, as connected components are either disjoint or identical. ∎

We now make the abstract characterization of Theorem 5.1 and our meta-procedure more operational, by introducing the $\varepsilon$ -approximation of the connected component $PS_{z}$ that contains a process-time graph $z\in PS$ , typically for some $\varepsilon=2^{-t}$ , $t\geq 0$ . It is constructed iteratively, using finitely many iterations (since the number of different possible $t$ -prefixes satisfies $|\mathcal{PT}^{t}|<\infty$ ) of the following algorithm:

Definition 9.2 ( $\varepsilon$ -approximations).

Let $z\in PS$ be an admissible process-time graph. In the minimum topology, we iteratively define $PS_{z}^{\varepsilon}$ , for $\varepsilon>0$ , as follows: $PS_{z}^{\varepsilon}[0]=\{z\}$ ; for $\ell>0$ , $PS_{z}^{\varepsilon}[\ell]=\bigcup_{a\in PS_{z}^{\varepsilon}[\ell-1]}(B_{\varepsilon}(a)\cap PS)$ ; and $PS_{z}^{\varepsilon}=PS_{z}^{\varepsilon}[m]$ where $m<\infty$ is such that $PS_{z}^{\varepsilon}[m]=PS_{z}^{\varepsilon}[m+1]$ . For $v\in\mathcal{V}$ , the $\varepsilon$ -approximation $PS_{v}^{\varepsilon}$ is defined as $PS_{v}^{\varepsilon}=\bigcup_{z_{v}\in PS}PS_{z_{v}}^{\varepsilon}$ , where every $z_{v}$ denotes a $v$ -valent process-time graph.
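The iterative construction in Definition 9.2 is a fixed-point computation, which the following sketch illustrates on a finite stand-in for $PS$ (for instance, a finite set of $t$-prefixes). The function name `eps_approximation`, the parameter `dist`, and the toy metric on integers are assumptions made for illustration; they are not the minimum topology itself.

```python
def eps_approximation(z, PS, dist, eps):
    """Iterate PS_z[l] = union of (open eps-balls around PS_z[l-1]) within PS.

    PS must be finite, dist is the distance function, and eps > 0.
    The loop stops at the first l with PS_z[l] == PS_z[l+1].
    """
    current = {z}
    while True:
        # dist(a, a) == 0 < eps, so every element of current stays included.
        grown = {b for b in PS if any(dist(a, b) < eps for a in current)}
        if grown == current:
            return current
        current = grown


# Toy stand-in: points on a line with the absolute difference as distance.
PS = {0, 1, 2, 5}


def dist(a, b):
    return abs(a - b)
```

With `eps = 1.5`, the construction started at `0` grows to `{0, 1}` and then `{0, 1, 2}` and stops there, since the point `5` is at distance 3 from the nearest element. With `eps = 3.5`, all four points are merged into a single approximation, which mirrors property (ii) of Lemma 9.3.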

Lemma 9.3 (Properties of $\varepsilon$-approximations).

For every $\varepsilon>0$ , every $v,w\in\mathcal{V}$ , every $z\in PS$ , $v$ -valent $z_{v}$ , and every $w$ -valent $z_{w}$ , the $\varepsilon$ -approximations have the following properties:

(i) For a closed message adversary, there are only finitely many different $PS_{z}^{\varepsilon}$, $z\in PS$.

(ii) For every $0<\varepsilon^{\prime}\leq\varepsilon$, it holds that $PS_{z_{v}}^{\varepsilon^{\prime}}\subseteq PS_{z_{v}}^{\varepsilon}$.

(iii) $PS_{z_{v}}^{\varepsilon}\cap PS_{z_{w}}^{\varepsilon}\neq\emptyset$ implies $PS_{z_{v}}^{\varepsilon}=PS_{z_{w}}^{\varepsilon}$.

(iv) $PS_{z}\subseteq PS_{z}^{\varepsilon}$.

Proof.

Properties (ii)–(iv) hold for arbitrary message adversaries: To prove (ii), it suffices to note that $B_{\varepsilon^{\prime}}(a)\subseteq B_{\varepsilon}(a)$. As for (iii), if $a\in PS_{x}^{\varepsilon}\cap PS_{y}^{\varepsilon}\neq\emptyset$, the iterative construction of $PS_{x}^{\varepsilon}$ would reach $a$, which would cause it to also include the whole $PS_{y}^{\varepsilon}$, as the latter also reaches $a$. If (iv) did not hold, $PS_{z}$ could be separated into disjoint open sets, which contradicts connectivity. Finally, (i) holds for closed message adversaries, since Corollary 9.1 implies that there are only finitely many different connected components $PS_{z}$, which carries over to $PS_{z}^{\varepsilon}$ by (iii) and (iv). ∎

We now show that $PS_{x}^{\varepsilon}$ and $PS_{y}^{\varepsilon}$ for sequences $x$ and $y$ with $PS_{x}\neq PS_{y}$ have a distance $>0$ , provided $\varepsilon$ is sufficiently small:

Lemma 9.4 (Separation of $\varepsilon$-approximations for compact MAs).

For a compact message adversary that allows to solve consensus, let $x\in PS$ and $y\in PS$ be such that $PS_{x}\neq PS_{y}$ . Then there is some $\varepsilon>0$ such that, for any $0<\varepsilon^{\prime}\leq\varepsilon$ , it holds that $d_{\mathrm{u}}(PS_{x}^{\varepsilon^{\prime}},PS_{y}^{\varepsilon^{\prime}})>0$ .

Proof.

According to Corollary 9.1, the components $PS_{x}$ and $PS_{y}$ are compact. Theorem LABEL:thm:setdistance reveals that we have $d_{\mathrm{u}}(PS_{x},PS_{y})=d>0$ . By Lemma 9.3.(iv), for every $\varepsilon>0$ , $PS_{x}\subseteq PS_{x}^{\varepsilon}$ and $PS_{y}\subseteq PS_{y}^{\varepsilon}$ . Therefore, setting $\varepsilon<d/2$ secures $d_{\mathrm{u}}(PS_{x}^{\varepsilon},PS_{y}^{\varepsilon})>0$ . ∎

We immediately get the following corollary, which allows us to reformulate Theorem LABEL:thm:charbroadcastability as given in Theorem 9.6.

Corollary 9.5 (Matching $\varepsilon$-approximation).

For a compact message adversary, if $\varepsilon>0$ is chosen in accordance with Lemma 9.4, then $PS_{z}^{\varepsilon}=PS_{z}$ for every $z\in PS$ .

Theorem 9.6 (Consensus characterization for compact MAs).

A compact message adversary allows to solve consensus if and only if there is some $\varepsilon>0$ such that every $v$ -valent $PS_{z_{v}}^{\varepsilon}$ , $v\in\mathcal{V}$ , is broadcastable for some process.

Proof.

Our theorem follows from Theorem LABEL:thm:charbroadcastability in conjunction with Corollary 9.5. ∎

It follows that if consensus is solvable, then, for every $0<\varepsilon^{\prime}\leq\varepsilon$ , the universal algorithm from Theorem 5.1 with $PS_{v}=PS_{v}^{\varepsilon^{\prime}}\cup PS\setminus\bigcup_{w\neq v\in\mathcal{V}}PS_{w}^{\varepsilon^{\prime}}$ for some arbitrary value $v\in\mathcal{V}$ , and $PS_{w}=PS_{w}^{\varepsilon^{\prime}}$ for the remaining $w\in\mathcal{V}$ , can be used. And indeed, the consensus algorithm given by Winkler, Schmid, and Moses (WSM19:OPODIS, Alg. 1) can be viewed as an instantiation of it.

Moreover, Corollary 9.5 implies that checking the broadcastability of $PS_{z_{v}}^{\varepsilon}$ can be done by checking the broadcastability of finite prefixes. More specifically, like the decision function $\Delta$ of consensus, the function $T(a)$ that gives the round by which every process in $a\in PS$ has $(p,0,x_{p}(a))$ of the broadcaster $p$ in its view is locally constant for a sufficiently small neighborhood, namely, $B_{2^{-T(a)}}(a)$ , and is hence continuous in any of our topologies. Since $PS_{z_{v}}=PS_{z_{v}}^{\varepsilon}$ is compact, $T(a)$ is in fact uniformly continuous and hence attains its maximum $\hat{T}$ in $PS_{z_{v}}^{\varepsilon}$ . It hence suffices to check broadcastability in the $t$ -prefixes of $PS_{z_{v}}^{\varepsilon}$ for $t=\max\{\lfloor\log_{2}(1/\varepsilon)\rfloor,\hat{T}\}$ in Theorem 9.6. This has been translated into the following non-topological formulation (WSM19:OPODIS) (where $\text{Ker}(x)$ is the set of processes that reached all processes in the process-time graph $x$ and $[\sigma|_{r}]$ is the set of $r$ -prefixes of the process-time graphs in $PS_{\sigma}$ in the uniform topology):

Theorem 9.7 (WSM19:OPODIS, Thm. 1).

Consensus is solvable under a closed message adversary MA if and only if for each $\sigma\in\text{{MA}}$ there is a round $r$ such that $\bigcap_{x\in[\sigma|_{r}]}\text{Ker}(x)\neq\emptyset$ .
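The kernel condition of Theorem 9.7 can be evaluated mechanically once a class $[\sigma|_{r}]$ of $r$-prefixes is given. The sketch below assumes that each prefix is encoded as a list of per-round edge sets `(sender, receiver)` and that the class of prefixes is supplied by the caller; computing that class from $PS_{\sigma}$ is not attempted here. The names `reached`, `kernel`, and `common_kernel` are hypothetical.

```python
def reached(p, prefix):
    """Processes that have p's initial state in their view after the prefix."""
    known = {p}
    for edges in prefix:
        # One hop per round: only senders that already know p's state relay it.
        known |= {v for (u, v) in edges if u in known}
    return known


def kernel(prefix, processes):
    """Ker(x): the processes that reached all processes in the prefix x."""
    everyone = set(processes)
    return {p for p in everyone if reached(p, prefix) >= everyone}


def common_kernel(prefixes, processes):
    """Intersection of Ker(x) over all prefixes x of the given class."""
    result = set(processes)
    for prefix in prefixes:
        result &= kernel(prefix, processes)
    return result


# The three communication graphs for n = 2 used in the lossy-link examples.
RIGHT = {(1, 2)}          # process 1 reaches process 2
LEFT = {(2, 1)}           # process 2 reaches process 1
BOTH = {(1, 2), (2, 1)}   # bidirectional link
```

For example, a class consisting of the 1-prefixes `[RIGHT]` and `[BOTH]` has the common kernel `{1}`, so process 1 is a broadcaster for that class, whereas a class containing both `[RIGHT]` and `[LEFT]` has an empty common kernel at round 1.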

9.4. Dynamic networks with non-limit closed message adversaries

In this section, we finally consider message adversaries that are not limit-closed (FG11; Pfl18:master; WSS19:DC). Unfortunately, we cannot use the $\varepsilon$-approximations according to Definition 9.2 here: Even if $\varepsilon$ is made arbitrarily small, Lemma 9.4 does not hold. An illustration is shown in Fig. 6. It is apparent that adding a ball $B_{\varepsilon}(a)$ in the iterative construction of $PS_{z}^{\varepsilon}$, where $d_{\mathrm{u}}(a,r)<\varepsilon$ for some forbidden limit sequence $r$, inevitably lets the construction grow into some $PS_{z^{\prime}}^{\varepsilon}$ where $z^{\prime}$ has a different valence than $z$. Whereas this could be avoided by adapting $\varepsilon$ when coming close to $r$, the resulting approximation does not provide any advantage over directly using our characterization Theorem 5.1.

These topological results are of course in accordance with the results on non-limit closed message adversaries we are aware of. For example, the binary consensus algorithm for $n=2$ by Fevat and Godard (FG11) assumes that the algorithm knows a fair or a pair of unfair sequences a priori, which effectively partition the sequence space into two connected components. (Note that there are uncountably many choices for separating $PS_{0}$ and $PS_{1}$ here, however.)

The $(D+1)$ -VSRC message adversary $\lozenge\mbox{\footnotesize{{STABLE}}}_{n}(D+1)$ (WSS19:DC) generates process-time graphs that consist of single-rooted communication graphs in every round, with the additional guarantee that, eventually, a $D+1$ -vertex-stable root component ( $D+1$ -VSRC) occurs. Herein, a root component is a strongly connected component without in-edges from outside the component, and an $x$ -VSRC is a root component made up of the same set of processes in $x$ consecutive rounds. $D\leq n-1$ is the dynamic diameter of a VSRC, which ensures that all root members reach all processes.

It has been proved (WSS19:DC) that consensus is impossible with $\lozenge\mbox{\footnotesize{{STABLE}}}_{n}(x)$ for $x\leq D$, whereas an algorithm exists for $\lozenge\mbox{\footnotesize{{STABLE}}}_{n}(D+1)$. Obviously, $\lozenge\mbox{\footnotesize{{STABLE}}}_{n}(D+1)$ effectively excludes all sequences without any $D+1$-VSRC. And indeed, the choice $x=D+1$ renders the connected components of $PS$ broadcastable by definition, which is in accordance with Theorem LABEL:thm:charbroadcastability.

We also introduced and proved correct an explicit labeling algorithm for $\lozenge\mbox{\footnotesize{{STABLE}}}_{n}(n)$, which effectively operationalizes the universal consensus algorithm of Theorem 6.4 (WSN21:FCT): By assigning an (invariant) label $\Delta(\sigma|_{r})$ to the $r$-prefixes of $\sigma\in PS$, it effectively assigns a corresponding unique decision value $v\in\mathcal{V}$ to $\sigma$, which in turn specifies the decision set $PS_{v}$ containing $\sigma$. It is instructive to see how the requirement of every $PS_{v}$ being open (and closed) in Theorem 6.4 translates into a corresponding assumption on this labeling function:

Assumption 1 (WSN21:FCT, Assumpt. 1).

$\forall\sigma\in\text{{MA}}\enspace\exists r\in\mathbb{N}\enspace\forall\sigma^{\prime}\in\text{{MA}}\colon\sigma^{\prime}|_{r}\sim\sigma|_{r}\Rightarrow\Delta(\sigma^{\prime}|_{r})=\Delta(\sigma|_{r})\neq\emptyset\enspace.$

For $\lozenge\mbox{\footnotesize{{STABLE}}}_{n}(n)$ , it has been proved (WSN21:FCT, Thm. 12) that the given labeling algorithm satisfies this assumption for $r=r_{stab}+4n$ , where $r_{stab}$ is the round where the (first) $D+1$ -VSRC in $\sigma$ starts. Consensus is hence solvable by a suitable instantiation of the universal consensus algorithm of Theorem 6.4.

9.5. Consensus in systems with an eventually timely $f$ -source

It is well known (DDS87) that consensus cannot be solved in distributed systems of $n\geq 2f+1$ (partially) synchronous processes, up to $f$ of which may crash, that are connected by reliable asynchronous communication links. For solving consensus, the system model has been strengthened by a weak timely link (WTL) assumption (ADGFT04; HMSZ08:TDSC): there has to be at least one correct process $p$ that eventually sends timely to a sufficiently large subset of the processes.

In previous work (ADGFT04), at least one eventually timely $f$ -source $p$ was assumed: After some unknown initial period where all end-to-end message delays are arbitrary, every broadcast of $p$ is received by a fixed subset $P\subseteq\Pi$ with $p\in P,|P|\geq f+1$ within some possibly unknown maximum end-to-end delay $\Theta$ . The authors showed how to build the $\Omega$ failure detector in such a system, which, in conjunction with any $\Omega$ -based consensus algorithm like the one by Mostéfaoui and Raynal (MR01), allows to solve uniform consensus.

Their $\Omega$ implementation lets every process broadcast a heartbeat message every $\eta$ steps, which forms partially synchronized rounds, and maintains an accusation counter for every process $q$ that counts the number of rounds the heartbeats of which were not received timely by more than $f$ processes. This is done by letting every process who does not receive $q$ ’s broadcast within $\Theta$ send an accusation message for $q$ , and incrementing the accusation counter for $q$ if more than $f$ such accusation messages from different receivers came in. It is not difficult to see that the accusation counter of a process that crashes grows unboundedly, whereas the accusation counter of every timely $f$ -source eventually stops being incremented. Since the accusation counters of all processes are exchanged and agreed-upon as well, choosing the process with the smallest accusation counter (with ties broken by process ids) is a legitimate choice for the output of $\Omega$ .

This WTL model was further relaxed (HMSZ08:TDSC), which allows the set $P(k)$ of witnessing receivers of every eventually moving timely $f$ -source to depend on the sending round $k$ . The price to be paid for this relaxation is the need to incorporate the sender’s round number in the heartbeat and accusation messages.

In this subsection, we will use our Theorem 5.1 to prove topologically that consensus can indeed be solved in the WTL model: We will give and prove correct an explicit labeling algorithm Algorithm 1, which assigns a decision value $v\in\mathcal{V}$ to every process-time graph $\sigma$ that specifies the decision set $PS_{v}$ containing $\sigma$ . Applying our universal algorithm to these decision sets hence allows to solve consensus in this model. Obviously, unlike the existing algorithms, our algorithm does not rely on an implementation of $\Omega$ .

We assume a slightly simplified WTL model with synchronous processes and asynchronous links that are reliable and FIFO, with known $\Theta$ for timely links. (It would not be difficult to extend our considerations to partially synchronous processes with unknown $\Theta$, since global synchrony is not needed for our algorithm: A process $q$ only needs to be able to time out the periodic heartbeat messages of process $p$, in a way that eventually ensures a timeout larger than $\Theta$. This pairwise timeout is easy to implement in the case of partially synchronous processes, by incrementing the timeout with every accusation of $p$. We do not incorporate this feature to keep our presentation as simple as possible.) Whereas we will use the times $t=0,1,2,\dots$ at which our synchronous processes take their steps as global time, we note that we do not have communication-closed rounds here, i.e., we have to deal with general process-time graphs according to Definition 6.1. In an admissible execution $\sigma$, we denote by $F(\sigma)$ the set of up to $f$ processes that crash in $\sigma$, and by $C(\sigma)=Ob(\sigma)=\Pi\setminus F(\sigma)$ the set of correct processes. For an eventual timely $f$-source $p$, we denote by $r_{\text{stab},p}$ the stabilization round, by which it has already started to send timely: a message sent in round $t\geq r_{\text{stab},p}$ is received by every $q\in P(t)$ no later than in round $t+\Theta-1$, and hence is present in $q$’s state at time $t+\Theta-1$. Note carefully that this is always satisfied when $q$ has crashed by that round. We again assume that the processes execute a full-history protocol, i.e., send their whole state in every round. To keep the relation to the existing algorithms, we consider the state message sent by $p$ in round $t$ to be its $\text{{heartbeat}}(t)$.
Moreover, if the state of process $q$ at time $t+\Theta-1$ does not contain the reception of $\text{{heartbeat}}(t)$ from process $p$, we will say that $q$ broadcasts an accusation message $\text{{accusation}}(p,t)$ for round $t$ of $p$ in round $t+\Theta$ (which is of course just part of $q$’s state sent in this round). If $q$ crashes before round $t+\Theta$, it will never broadcast $\text{{accusation}}(p,t)$. If $q$ crashes exactly in round $t+\Theta$, we can nevertheless assume that it either manages to send $\text{{accusation}}(p,t)$ to all correct processes in the system or to none: In our full-information protocol, every process that receives $\text{{accusation}}(p,t)$ will forward this message to all other processes when it broadcasts its own state later on.

Definition 9.8 (WTL elementary state predicates and variables).

For process $s$ at time $r\geq 1$ , i.e., the end of round $r$ , we define the following predicates and state variables:

• $\text{{accuse}}_{s}^{r}(p)=\text{{true}}$ if and only if $s$ did not receive $\text{{heartbeat}}(r-\Theta)$ from $p$ by time $r$ and thus sent $\text{{accusation}}(p,t)$ for $t=r-\Theta$.

• $\text{{nottimelyrec}}_{s}^{r}(q,p,t)=\text{{true}}$ if and only if $s$ recorded the reception of $\text{{accusation}}(p,t)$ from $q$ by time $r$.

• $\text{{nottimely}}_{s}^{r}(p,t)=\text{{true}}$ if and only if $\text{{nottimelyrec}}_{s}^{r}(q,p,t)=\text{{true}}$ for at least $n-f$ different $q\in\Pi$.

• $\text{{accusationcounter}}_{s}^{r}(p)=(|\{k\leq r:\text{{nottimely}}_{s}^{r}(p,k)=\text{{true}}\}|,p)$.

• $\text{{heardof}}_{s}^{r}(p)=|\{k\leq r:\text{$s$ received $\text{{heartbeat}}(k)$ from $p$ (directly or indirectly) by time $r$}\}|$.

Note that a process $q$ that crashes before time $t+\Theta$ causes $\text{{nottimelyrec}}_{s}^{r}(q,p,t)=\text{{false}}$ for all $r$ , and that $p$ is appended in $\text{{accusationcounter}}_{s}^{r}(p)$ for tie-breaking purposes only. For every eventually timely $f$ -source $p$ , the implicit forwarding of accusation messages ensures that $\text{{accusationcounter}}_{s}^{r}(p)$ will eventually be the same at every correct process $s$ in the limit $r\to\infty$ .
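The counters of Definition 9.8 can be sketched as follows. This is a hedged illustration only: the state of process $s$ at time $r$ is reduced to a set `records` of triples `(q, p, t)`, one for every recorded reception of $\text{{accusation}}(p,t)$ from $q$, and the function names are hypothetical. The pair returned by `accusation_counter` is compared lexicographically, which implements the tie-breaking by process id.

```python
def nottimely(records, p, t, n, f):
    """True iff accusations for round t of p were recorded from >= n - f processes."""
    accusers = {q for (q, pp, tt) in records if pp == p and tt == t}
    return len(accusers) >= n - f


def accusation_counter(records, p, r, n, f):
    """Pair (number of rounds k <= r with nottimely(p, k), p)."""
    count = sum(1 for k in range(r + 1) if nottimely(records, p, k, n, f))
    return (count, p)


def smallest_counter(records, processes, r, n, f):
    """Process with the smallest accusation counter in the recorded state."""
    return min(accusation_counter(records, p, r, n, f) for p in processes)[1]


# Toy state for n = 3, f = 1: "c" was accused by two processes for round 1,
# and by a single process for round 2; "b" was accused once for round 1.
records = {("a", "c", 1), ("b", "c", 1), ("a", "c", 2), ("a", "b", 1)}
```

In this toy state, only round 1 of process `"c"` reaches the threshold of $n-f=2$ accusers, so `"c"` has counter 1, whereas `"a"` and `"b"` have counter 0 and the tie is broken in favor of `"a"`.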

We now define some predicates that require knowledge of the execution $\sigma$ . Whereas they cannot be computed locally by the processes in the execution, they can be used in the labeling algorithm.

Definition 9.9 (WTL extended state predicates and variables).

Given an execution $\sigma$ , let the dominant eventual timely $f$ -source $p_{\sigma}$ be the one that leads to the unique smallest value of $\text{{accusationcounter}}_{s}^{\infty}(p_{\sigma})$ , which is the same at every process $s\in\Pi\setminus{F(\sigma)}$ . With $r_{\text{stab},\sigma}=r_{\text{stab},p_{\sigma}}$ denoting the stabilization time of the dominant eventual timely $f$ -source in $\sigma$ and $F(\sigma|_{r})\subseteq F(\sigma)$ the set of processes that crashed by time $r$ , we also define

• $\text{{minheardof}}_{s}(\sigma,r)=\min_{p\in\Pi\setminus{F(\sigma|_{r})}}\text{{heardof}}_{s}^{r}(p)$,

• $\text{{oldenough}}(\sigma,r)=\text{{true}}$ if and only if $\forall s\in\Pi\setminus{F(\sigma|_{r})}$, both (i) $\text{{minheardof}}_{s}(\sigma,r)\geq r_{\text{stab},\sigma}+\Theta$ and (ii) $\forall p\in\Pi\setminus\{p_{\sigma}\}:\;\text{{accusationcounter}}_{s}^{r}(p_{\sigma})<\text{{accusationcounter}}_{s}^{r}(p)$.

• $\text{{mature}}(\sigma,r)=\text{{true}}$ if and only if $\exists r_{0}<r$ such that both (i) $\text{{oldenough}}(\sigma,r_{0})=\text{{true}}$ and (ii) $\forall s\in\Pi\setminus{F(\sigma|_{r})}:\;\text{{minheardof}}_{s}(\sigma,r)\geq r_{0}$.

Note that it may occur that another eventual timely $f$ -source $p^{\prime}\neq p_{\sigma}$ in $\sigma$ has a smaller stabilization time $r_{\text{stab},p^{\prime}}<r_{\text{stab},p_{\sigma}}$ than the dominant one, which happens if $p^{\prime}$ causes more accusations than $p_{\sigma}$ before stabilization in total.

The following properties are almost immediate from the definitions:

Lemma 9.10 (Properties of oldenough and mature).

The following properties hold for oldenough:

(i) If $\text{{oldenough}}(\sigma,r)=\text{{true}}$, then $\text{{accusationcounter}}_{s}^{t}(p_{\sigma})=\text{{accusationcounter}}_{s}^{r}(p_{\sigma})$ for every $s$ that did not crash by time $t\geq r$.

(ii) $\text{{oldenough}}(\sigma,r)$ is stable, i.e., $\text{{oldenough}}(\sigma,r)=\text{{true}}\Rightarrow\text{{oldenough}}(\sigma,t)=\text{{true}}$ for $t\geq r$.

(iii) (i) and (ii) also hold for $\text{{mature}}(\sigma,r)$, and $\text{{mature}}(\sigma,r)=\text{{true}}\Rightarrow\text{{oldenough}}(\sigma,r)=\text{{true}}$.

Proof.

Since $\text{{oldenough}}(\sigma,r)=\text{{true}}$ entails $\text{{minheardof}}_{s}(\sigma,r)\geq r_{\text{stab},\sigma}+\Theta$ according to Definition 9.9, every process $s\in\Pi\setminus{F(\sigma|_{r})}$ has received the accusation messages for all rounds up to $r_{\text{stab},\sigma}$, and (i) follows. This also implies (ii), since the accusation counter of every process $p\neq p_{\sigma}$ can only increase after time $r$. That these properties carry over to mature is obvious from the definition. ∎

The following lemma proves that two executions $\sigma$ and $\rho$ with $\sigma|_{r}\sim_{s}\rho|_{r}$ cannot both satisfy $\text{{oldenough}}(\sigma,r)$ resp. $\text{{oldenough}}(\rho,r)$ , and, hence, $\text{{mature}}(\sigma,r)$ resp. $\text{{mature}}(\rho,r)$ , except when the dominant eventual timely $f$ -source is the same in $\sigma$ and $\rho$ :

Lemma 9.11.

Consider two executions $\sigma$ and $\rho$ with $\sigma|_{r}\sim_{s}\rho|_{r}$ for some process $s$ that is not faulty by round $r$ in both $\sigma$ and $\rho$ . Then,

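$\text{{oldenough}}(\sigma,r)=\text{{oldenough}}(\rho,r)=\text{{true}}\;\Rightarrow\;p_{\sigma}=p_{\rho}$, and likewise $\text{{mature}}(\sigma,r)=\text{{mature}}(\rho,r)=\text{{true}}\;\Rightarrow\;p_{\sigma}=p_{\rho}$.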

Proof.

Since $\text{{oldenough}}(\sigma,r)=\text{{true}}$, it follows from Definition 9.9 that $\forall p\in\Pi\setminus\{p_{\sigma}\}:\text{{accusationcounter}}_{s}^{r}(p_{\sigma})<\text{{accusationcounter}}_{s}^{r}(p)$. Analogously, $\text{{oldenough}}(\rho,r)=\text{{true}}$ implies $\forall p\in\Pi\setminus\{p_{\rho}\}:\text{{accusationcounter}}_{s}^{r}(p_{\rho})<\text{{accusationcounter}}_{s}^{r}(p)$. Since $\sigma|_{r}\sim_{s}\rho|_{r}$, this is only possible if $p_{\sigma}=p_{\rho}$. ∎

Finally, we need the following technical lemmas:

Lemma 9.12 (Indistinguishability precondition).

Suppose $\tau|_{r^{\prime}}\sim_{s^{\prime}}\sigma|_{r^{\prime}}$ is such that $s^{\prime}$ received a message from $s\neq s^{\prime}$ containing its state in the sending round $r_{0}^{\prime}\leq r^{\prime}-1$ by round $r^{\prime}$ in $\sigma|_{r^{\prime}}$ and hence also in $\tau|_{r^{\prime}}$ . Analogously, suppose $\sigma|_{r}\sim_{s}\rho|_{r}$ is such that $s$ received a message from $s^{\prime}$ containing its state in the sending round $r_{0}\leq r-1$ by round $r$ in $\sigma|_{r}$ and hence also in $\rho|_{r}$ . Then,

(i) $\tau|_{r_{0}^{\prime}}\sim_{s}\sigma|_{r_{0}^{\prime}}$,

(ii) $\tau|_{\min\{r_{0}^{\prime},r\}}\sim_{s}\rho|_{\min\{r_{0}^{\prime},r\}}$,

(iii) $\sigma|_{r_{0}}\sim_{s^{\prime}}\rho|_{r_{0}}$,

(iv) $\tau|_{\min\{r_{0},r^{\prime}\}}\sim_{s^{\prime}}\rho|_{\min\{r_{0},r^{\prime}\}}$.

Proof.

If (i) did not hold, then, since $s$ sends a message containing its state in round $r_{0}^{\prime}$ to $s^{\prime}$ both in $\tau|_{r^{\prime}}$ and in $\sigma|_{r^{\prime}}$, these two states would be distinguishable for $s$, which contradicts our assumption. The analogous argument proves (iii). Statement (ii) follows from combining (i) with $\sigma|_{r}\sim_{s}\rho|_{r}$, and (iv) follows from combining (iii) with $\tau|_{r^{\prime}}\sim_{s^{\prime}}\sigma|_{r^{\prime}}$. ∎

Lemma 9.13 (Heardof inheritance).

Suppose $\sigma|_{r}\sim_{s}\rho|_{r}$ and $\text{{minheardof}}_{s}(\rho,r)\geq r_{0}$ for some $1\leq r_{0}<r$ , as it arises in $\text{{mature}}(\rho,r)=\text{{true}}$ , for example. Then, $\forall p\in\Pi\setminus{F(\rho|_{r})}$ , it also holds in $\sigma|_{r}$ that $\text{{heardof}}_{s}^{r}(p)\geq r_{0}$ , but not necessarily $\text{{heardof}}_{s}^{r}(p^{\prime})\geq r_{0}$ for $p^{\prime}\in(\Pi\setminus{F(\sigma|_{r})})\cap F(\rho|_{r})$ . Consequently, it may happen that $\text{{minheardof}}_{s}(\sigma,r)<r_{0}$ .

Proof.

Since the state of $s$ is the same in $\sigma|_{r}$ and $\rho|_{r}$ , but the sets $F(\rho|_{r})$ and $F(\sigma|_{r})$ may be different, the lemma follows trivially. ∎

With the abbreviation $C(\sigma|_{r})=\Pi\setminus{F(\sigma|_{r})}$ for all non-faulty processes in $\sigma|_{r}$ , and $\sigma|_{r}\sim_{Q}\rho|_{r}$ for $\forall q\in Q:\;\sigma|_{r}\sim_{q}\rho|_{r}$ , we define the short-hand notation $\sigma|_{r}\sim_{\geq n-f}\rho|_{r}$ to express indistinguishability for a majority of (correct) processes, defined by $\exists Q\subseteq C(\sigma|_{r})\cap C(\rho|_{r}),|Q|\geq n-f$ such that $\forall q\in Q:\;\sigma|_{r}\sim_{q}\rho|_{r}$ .
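The relation $\sigma|_{r}\sim_{\geq n-f}\rho|_{r}$ can be checked directly from its definition: a suitable set $Q$ exists if and only if the largest set of processes that are non-faulty in both prefixes and have identical views has at least $n-f$ elements. The sketch below assumes that the views of all processes at time $r$ are given as dictionaries and that the crashed sets $F(\sigma|_{r})$ and $F(\rho|_{r})$ are known; the function names are hypothetical.

```python
def indistinguishable_set(view_sigma, view_rho, crashed_sigma, crashed_rho):
    """Largest Q within C(sigma|r) and C(rho|r) whose members have equal views."""
    alive_in_both = set(view_sigma) - set(crashed_sigma) - set(crashed_rho)
    return {q for q in alive_in_both if view_sigma[q] == view_rho[q]}


def majority_indistinguishable(view_sigma, view_rho, crashed_sigma, crashed_rho, n, f):
    """True iff the two prefixes are indistinguishable for >= n - f processes."""
    Q = indistinguishable_set(view_sigma, view_rho, crashed_sigma, crashed_rho)
    return len(Q) >= n - f


# Toy instance with n = 4 and f = 1: only process 4 sees a difference.
view_sigma = {1: "a", 2: "b", 3: "c", 4: "d"}
view_rho = {1: "a", 2: "b", 3: "c", 4: "x"}
```

Without crashes, processes 1, 2, and 3 form a set $Q$ of size $n-f=3$, so the relation holds; if process 3 has crashed in one of the two prefixes, only two candidates remain and the relation fails.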

The following lemma guarantees that prefixes that are indistinguishable only for strictly less than $n-f$ processes are eventually distinguishable for all processes:

Lemma 9.14 (Vanishing minority indistinguishability).

Given $\rho|_{r_{0}}$ , there is a round $r$ , $r_{0}\leq r<\infty$ , such that for every $\sigma|_{r_{0}}$ with $\rho|_{r_{0}}\not\sim_{\geq n-f}\sigma|_{r_{0}}$ , it holds that $\rho|_{r}\not\sim\sigma|_{r}$ .

Proof.

Due to our reliable link assumption, for every process $s$ that does not fail in $\rho$, there is a round $r>r_{0}$ where $\text{{minheardof}}_{s}(\rho,r)\geq r_{0}$. Now assume that there is some $\sigma|_{r_{0}}$ with $\rho|_{r_{0}}\sim_{Q}\sigma|_{r_{0}}$ for a maximal set $Q$ with $1\leq|Q|<n-f$, but $\rho|_{r}\sim_{s}\sigma|_{r}$ for some process $s$. Since $s$ receives round-$r_{0}$ messages from $|C(\rho|_{r})|\geq n-f$ processes in $\rho|_{r}$, and $\rho|_{r}\sim_{s}\sigma|_{r}$, process $s$ must receive exactly the same messages also in $\sigma|_{r}$. As at most $|Q|<n-f$ of those messages may be sent by processes that cannot distinguish $\rho|_{r_{0}}$ and $\sigma|_{r_{0}}$, at least one such message must originate in a process $q^{\prime}$ with $\rho|_{r_{0}}\not\sim_{q^{\prime}}\sigma|_{r_{0}}$. In this case, however, Lemma 9.12.(iii) prohibits $\rho|_{r}\sim_{s}\sigma|_{r}$, which provides the required contradiction. ∎

The following lemma finally shows that majority indistinguishability in conjunction with mature prefixes entails strong indistinguishability properties in earlier rounds:

Lemma 9.15 Majority indistinguishability precondition.

Suppose $\tau|_{r}\sim_{\geq n-f}\sigma|_{r}\sim_{\geq n-f}\rho|_{r}$ and $\text{{mature}}(\rho,r)=\text{{true}}$ . Then, for the round $r_{0}$ imposed by the latter, it holds that $\tau|_{r_{0}}\sim_{C(\rho|_{r})}\sigma|_{r_{0}}\sim_{C(\rho|_{r})}\rho|_{r_{0}}$ , and hence also $\tau|_{r_{0}}\sim_{C(\rho|_{r})}\rho|_{r_{0}}$ .

Proof.

Let $S$ resp. $Q$ be the set of at least $n-f$ processes causing $\sigma|_{r}\sim_{\geq n-f}\rho|_{r}$ resp. $\sigma|_{r}\sim_{\geq n-f}\tau|_{r}$ . Since $Q\cap S\neq\emptyset$ by the pigeonhole principle, let $s\in Q\cap S$ . Clearly, $\tau|_{r}\sim_{s}\sigma|_{r}\sim_{s}\rho|_{r}$ , and hence also $\tau|_{r}\sim_{s}\rho|_{r}$ . Since $\text{{mature}}(\rho,r)=\text{{true}}$ , Lemma 9.12.(i) in conjunction with Lemma 9.13 implies $\rho|_{r_{0}}\sim_{C(\rho|_{r})}\sigma|_{r_{0}}$ , as well as $\rho|_{r_{0}}\sim_{C(\rho|_{r})}\tau|_{r_{0}}$ , and hence also $\sigma|_{r_{0}}\sim_{C(\rho|_{r})}\tau|_{r_{0}}$ as asserted. ∎
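The pigeonhole step can be made explicit. The following bound is a worked sketch under the assumption $n>2f$, which is not restated in this passage but is suggested by calling $n-f$ processes a majority. Since $Q,S\subseteq\Pi$ and $|\Pi|=n$, we have $|Q\cup S|\leq n$, and hence

$$|Q\cap S| = |Q|+|S|-|Q\cup S| \geq 2(n-f)-n = n-2f > 0.$$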

With these preparations, we can define an explicit labeling algorithm (Algorithm 1) for the WTL model, i.e., an algorithm that computes a label $\Delta(\sigma|_{r})$ for every $r$-prefix $\sigma|_{r}$ of an admissible execution $\sigma$ in our WTL model. A label can either be $\emptyset$ (still undefined) or else denote a single process $p$ (which will turn out to be a broadcaster), and will be consistent in $\sigma$ in the sense that $\Delta(\sigma|_{r})=p\Rightarrow\Delta(\sigma|_{r+k})=p$ for every $k\geq 0$. Hence, we can also uniquely assign a label $\Delta(\sigma)$ to an infinite execution. For defining our decision sets, we will assign $\sigma$ to $\Sigma_{x_{p}}$, where $x_{p}$ is the initial value of $p=\Delta(\sigma)$ in $\sigma$.

Informally, our labeling algorithm works as follows: If there is some unlabeled mature prefix $\rho|_{r}$ , it is labeled either (i) with the label of some already labeled but not yet mature $\sigma|_{r}$ if the latter got its label early enough, namely, by the round $r_{0}$ where $\text{{oldenough}}(\rho,r_{0})=\text{{true}}$ , or else (ii) with its dominant $p_{\rho}$ .
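The informal rule above can be sketched as follows. This is only an illustration under stated assumptions, not a reproduction of Algorithm 1: the function names, the encoding of labels as `Optional[int]` (with `None` standing for $\emptyset$), and the pre-computed list `early_labels` are all hypothetical. In particular, the sketch assumes that the caller has already collected the labels of those prefixes that qualify under case (i), i.e., that were labeled by the round $r_{0}$ where $\text{{oldenough}}(\rho,r_{0})=\text{{true}}$.

```python
from typing import Iterable, Optional


def label_mature_prefix(dominant: int, early_labels: Iterable[Optional[int]]) -> int:
    """Pick the label for an unlabeled mature prefix.

    dominant     -- the dominant process of the prefix (case (ii)).
    early_labels -- labels of the qualifying, already labeled prefixes
                    (case (i)); None encodes the empty label.
    """
    found = {lab for lab in early_labels if lab is not None}
    if len(found) > 1:
        # The text argues that eligible prefixes carry the same label;
        # the sketch merely reports a violation of that expectation.
        raise ValueError("inconsistent early labels")
    if found:
        return next(iter(found))  # case (i): adopt the early label
    return dominant               # case (ii): fall back to the dominant


def extend_label(previous: Optional[int], proposed: Optional[int]) -> Optional[int]:
    """Labels are sticky: once non-empty, a label is never changed."""
    return previous if previous is not None else proposed
```

The second helper mirrors the consistency property $\Delta(\sigma|_{r})=p\Rightarrow\Delta(\sigma|_{r+k})=p$: a later round may only fill in a label that is still empty.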

The following Theorem 9.16 shows that Algorithm 1 computes labels, which result in a partitioning that is compatible with the needs of Theorem 5.1. Consensus in the WTL model can hence be solved by means of our universal algorithm applied to the resulting decision sets.

Theorem 9.16 Decision set partition for WTL algorithm.

The set $PS(p)=\{\sigma\mid\Delta(\sigma)=p\}$ is open in the uniform topology, and so is the decision set $PS_{v}=\{\sigma\mid(\Delta(\sigma)=p)\wedge(x_{p}=v)\}$ .

Proof.

We show that, if $\sigma$ is assigned to the partition set $PS(p)$, then $B_{2^{-(i+D(\sigma))}}(\sigma)\subseteq PS(p)$, which implies openness of $PS(p)$. Here, $i$ is the smallest round where $\text{{mature}}(\sigma,i)=\text{{true}}$, and $D(\sigma)$ is the maximum number of rounds required for a minority indistinguishability in $\sigma|_{i}$ to go away ($D(\sigma)=r-r_{0}$ in the notation of Lemma 9.14). Note that the corresponding property obviously also holds for the decision set $PS_{v}=\{\sigma\mid(\Delta(\sigma)=p)\wedge(x_{p}=v)\}$.

First of all, in Algorithm 1, $\Delta(\sigma|_{i})$ is initialized to $\emptyset$ in line 1 and is assigned a label $\neq\emptyset$ at the latest when $\text{{mature}}(\sigma,i)=\text{{true}}$. Once assigned, this value is never modified again, as each assignment, except the one in line 1, may only be performed if the label is still $\emptyset$.

For an unlabeled prefix $\sigma|_{i}$ that is indistinguishable from a mature labeled prefix $\rho|_{i}$, there are two possibilities: either the indistinguishability is a majority one, in which case $\sigma|_{i}$ gets its label from $\rho|_{i}$ in line 1, or else the minority indistinguishability will go away within $D(\sigma)$ rounds. It thus suffices to show that if a label $\Delta(\rho|_{r})\leftarrow\{p\}$ is assigned to a round $r$ prefix $\rho|_{r}$, then every majority-indistinguishable prefix $\sigma|_{r}\sim_{\geq n-f}\rho|_{r}$ has either $\Delta(\rho|_{r})=\Delta(\sigma|_{r})$ or $\Delta(\sigma|_{r})=\emptyset$.

We prove this by induction on $r=0,1,\ldots$ . The base for $r=0$ follows directly from line 1. For the step from $r-1$ to $r$ , assume by hypothesis that, for all round $r-1$ prefixes that already had $\{p\}$ assigned, all their majority-indistinguishable prefixes have label $\{p\}$ or $\emptyset$ . For the purpose of deriving a contradiction, suppose that a label $\Delta(\rho|_{r})\neq\emptyset$ is assigned to a round $r$ -prefix $\rho|_{r}$ in iteration $r$ and there exists some $\sigma|_{r}$ with $\sigma|_{r}\sim_{\geq n-f}\rho|_{r}$ and $\emptyset\neq\Delta(\sigma|_{r})\neq\Delta(\rho|_{r})$ . Let $S$ be the set of involved processes, i.e., $\sigma|_{r}\sim_{s}\rho|_{r}$ for $s\in S$ with $|S|\geq n-f$ .

We need to distinguish all the different ways of assigning labels to $\rho|_{r}$ .

Suppose that both $\sigma|_{r}$ and $\rho|_{r}$ get their labels in round $r$, but not in line 1. Since both $\text{{mature}}(\sigma,r)=\text{{true}}$ and $\text{{mature}}(\rho,r)=\text{{true}}$, Lemma 9.10.(iii) in conjunction with Lemma 9.11 reveals that $p_{\sigma}=p_{\rho}$ since $\sigma|_{r}\sim_{\geq n-f}\rho|_{r}$. In all cases except for the one where both $\rho|_{r}$ and $\sigma|_{r}$ get their labels in line 1, we immediately get a contradiction since $\Delta(\rho|_{r})=\Delta(\sigma|_{r})$ in any case. Finally, if $\rho|_{r}$ and $\sigma|_{r}$ get their labels in line 1, there is some $\tau|_{r}\sim_{\geq n-f}\rho|_{r}$ with $\text{{mature}}(\tau,r)=\text{{false}}$ but $\Delta(\tau|_{r_{0}})\neq\emptyset$, where $r_{0}$ is such that $\text{{oldenough}}(\rho,r_{0})=\text{{true}}$, and some $\omega_{r}\sim_{\geq n-f}\sigma|_{r}$ with the analogous properties in round $r_{0}^{\prime}$. Let $Q^{\prime}$ resp. $Q^{\prime\prime}$ be the sets of at least $n-f$ processes involved in $\tau|_{r}\sim_{\geq n-f}\rho|_{r}$ resp. $\omega_{r}\sim_{\geq n-f}\sigma|_{r}$. Since $\text{{mature}}(\rho,r)=\text{{true}}$, Lemma 9.15 implies $\rho|_{r_{0}}\sim_{C(\rho|_{r})}\sigma|_{r_{0}}\sim_{C(\rho|_{r})}\tau|_{r_{0}}$ and also $\rho|_{r_{0}^{\prime}}\sim_{C(\rho|_{r})}\sigma|_{r_{0}^{\prime}}\sim_{C(\rho|_{r})}\omega_{r_{0}}$, which establishes $\omega_{r_{0}}\sim_{C(\rho|_{r})}\tau|_{r_{0}}$. Since, by the induction hypothesis, $\Delta(\omega_{r_{0}})=\Delta(\tau|_{r_{0}})$, we again end up with $\Delta(\rho|_{r})=\Delta(\sigma|_{r})$, which provides the required contradiction.

However, we also need to make sure that inconsistent labels cannot be assigned in line 1 and any of the other lines, possibly in different rounds. For a contradiction, we assume a “generic” setting that can be fit to all cases: We assume that $\sigma|_{r^{\prime}}$ got its label $\Delta(\sigma|_{r^{\prime}})=\Delta(\tau|_{r^{\prime}})\neq\emptyset$ assigned in iteration $r^{\prime}\leq r$ in line 1 or line 1, since there was some already labeled $\tau|_{r^{\prime}}\sim_{\geq n-f}\sigma|_{r^{\prime}}$ with $\text{{mature}}(\tau,r^{\prime})=\text{{true}}$ but $\text{{mature}}(\sigma,r^{\prime})=\text{{false}}$ . Moreover, we assume that $\rho|_{r}$ gets assigned its label $\Delta(\sigma|_{r})\neq\Delta(\rho|_{r})=\Delta(\omega_{r})\neq\emptyset$ in iteration $r\geq r^{\prime}>r_{\text{stab},\tau}+\Theta$ also in line 1 or in line 1, since there is some already labeled $\omega_{r}\sim_{\geq n-f}\rho|_{r}$ with $\text{{mature}}(\omega,r)=\text{{true}}$ but $\text{{mature}}(\rho,r)=\text{{false}}$ . Note carefully that we can rule out the possibility that there are two different, say, $\sigma|_{r^{\prime}}$ and $\sigma^{\prime}|_{r^{\prime}}$ , with inconsistent labels, which both match the condition of line 1 or line 1: This is prohibited by the induction hypothesis, except in the case of $r^{\prime}=r$ , where the above generic scenario applies.

To also cover the cases where $\rho|_{r}$ gets its label assigned in the other lines, we can set $\rho|_{r}=\omega_{r}$ in our considerations below. Note that the induction hypothesis again rules out the possibility that there are two different, say, $\sigma|_{r_{0}}$ and $\sigma^{\prime}|_{r_{0}}$, with inconsistent labels, which both match the condition of line 1 here, since $r_{0}<r$.

Let $Q^{\prime}\subseteq C(\tau|_{r^{\prime}})$ be the set of at least $n-f$ processes causing $\tau|_{r^{\prime}}\sim_{\geq n-f}\sigma|_{r^{\prime}}$ , and $Q^{\prime\prime}\subseteq C(\omega_{r})$ be the set of at least $n-f$ non-faulty processes causing $\omega_{r}\sim_{\geq n-f}\rho|_{r}$ . Since $\text{{mature}}(\tau,r^{\prime})=\text{{true}}$ and $\text{{mature}}(\omega,r)=\text{{true}}$ , Lemma 9.15 implies

[TABLE]

We first consider the case $r_{0}^{\prime}\leq r_{0}\leq r^{\prime}$ : Since $Q^{\prime}\subseteq C(\tau|_{r^{\prime}})$ , $\tau|_{r^{\prime}}\sim_{Q^{\prime}}\sigma|_{r^{\prime}}$ also implies $\tau|_{r_{0}}\sim_{Q^{\prime}}\sigma|_{r_{0}}$ . As $\text{{oldenough}}(\tau,r_{0}^{\prime})=\text{{true}}$ , Lemma 9.10.(ii) also ensures $\text{{oldenough}}(\tau,r_{0})=\text{{true}}$ . Moreover, since obviously $Q^{\prime}\cap C(\omega_{r})\neq\emptyset$ as well, we finally observe that actually $\tau|_{r_{0}}\sim_{Q^{\prime}\cap C(\omega_{r})}\omega_{r_{0}}$ . By Lemma 9.11, we hence find that $p_{\omega}=p_{\tau}$ . Now there are two possibilities: If actually $\tau|_{r_{0}}\sim_{\geq n-f}\omega_{r_{0}}$ holds, line 1 implies that $\Delta(\omega_{r})=\Delta(\tau|_{r_{0}})$ . Otherwise, every process will eventually be able to distinguish $\tau|_{r}$ and $\omega_{r}$ and, hence, $\rho|_{r}$ and $\sigma|_{r}$ by Lemma 9.14. Both are contradictions to one of our assumptions $\Delta(\omega_{r})\neq\Delta(\tau|_{r_{0}})$ and $\rho|_{r}\sim_{\geq n-f}\sigma|_{r}$ .

To handle the case $r_{0}^{\prime}>r_{0}$ , we note that we can repeat exactly the same arguments as above if we exchange the roles of $\omega_{r}$ and $\tau|_{r^{\prime}}$ and $\sigma|_{r}$ and $\rho|_{r}$ . In the only possible case of $r_{0}\leq r_{0}^{\prime}\leq r$ , since $Q^{\prime\prime}\subseteq C(\omega_{r})$ , $\omega_{r}\sim_{Q^{\prime\prime}}\rho|_{r}$ also implies $\omega_{r_{0}^{\prime}}\sim_{Q^{\prime\prime}}\rho|_{r_{0}^{\prime}}$ . As $\text{{oldenough}}(\omega,r_{0})=\text{{true}}$ , Lemma 9.10.(ii) also ensures $\text{{oldenough}}(\omega,r_{0}^{\prime})=\text{{true}}$ . Moreover, since obviously $Q^{\prime\prime}\cap C(\tau|_{r^{\prime}})\neq\emptyset$ as well, we finally observe that actually $\omega_{r_{0}^{\prime}}\sim_{Q^{\prime\prime}\cap C(\tau|_{r^{\prime}})}\tau|_{r_{0}^{\prime}}$ . By Lemma 9.11, we hence find again that $p_{\omega}=p_{\tau}$ . The same arguments as used in the previous paragraph establish the required contradictions.

In the remaining case $r_{0}^{\prime}\leq r_{0}$ but $r_{0}>r^{\prime}$, we have the situation where $\sigma|_{r^{\prime}}$ has already been assigned its label before round $r_{0}$, where $\text{{oldenough}}(\rho,r_{0})=\text{{true}}$. In general, every process may be able to distinguish $\rho$ and $\sigma$ (not to speak of $\tau$ and $\omega$) after $r_{0}$, and usually $p_{\tau}\neq p_{\omega}$, so nothing would prevent $\Delta(\sigma|_{r})\neq\Delta(\rho|_{r})$ if the labeling algorithm had not taken special care, namely, in line 1: Rather than just assigning $\Delta(\rho|_{r})=\{p_{\omega}\}$, it uses the label of $\sigma|_{r_{0}}$ and therefore trivially avoids inconsistent labels. Note carefully that doing this is well-defined: If there were two different eligible $\sigma|_{r_{0}}$ and $\sigma^{\prime}|_{r_{0}}$ available in line 1, (15) reveals that $\sigma|_{r_{0}}\sim_{\geq n-f}\sigma^{\prime}|_{r_{0}}$, such that their labels must be the same by the induction hypothesis.

This completes the proof of our theorem. ∎

The following Lemma 9.17 also reveals that a non-empty label $p$ assigned to some prefix $\sigma|_{r}$ is a broadcaster:

Lemma 9.17.

If $\Delta(\sigma|_{r})=\{p\}$ is computed by Algorithm 1, then $(p,0,x_{p}(\sigma))$ is contained in the view $V_{q}(\sigma|_{r})$ of every process $q\in\Pi\setminus{F(\sigma|_{r})}$ that has not crashed in $\sigma|_{r}$ .

Proof.

We distinguish the two essential cases where $\rho|_{r}\in\Sigma_{p}$ can get its label $\{p\}$ : If $\Delta(\rho|_{r})$ was assigned via line 1, the dominant $p_{\rho}$ must indeed have reached all correct processes in the system according to Definition 9.9 of $\text{{oldenough}}(\rho,r_{0})$ , which is incorporated in $\text{{mature}}(\rho,r)$ . In all other cases, $\Delta(\rho|_{r})$ was assigned since there is some $\sigma|_{r^{\prime}}\sim_{s^{\prime}}\rho|_{r^{\prime}}$ , $r^{\prime}\leq r$ , with at least $\text{{oldenough}}(\sigma,r^{\prime})=\text{{true}}$ . By the same argument as before, the dominant $p_{\sigma}$ must have reached every correct process in $\sigma|_{r^{\prime}}$ already. As $\text{{minheardof}}_{s^{\prime}}(\sigma,r^{\prime})\geq r_{\text{stab},\sigma}+\Theta$ according to the definition of $\text{{oldenough}}(\sigma,r^{\prime})$ implies also $\text{{minheardof}}_{s^{\prime}}(\rho,r^{\prime})\geq r_{\text{stab},\sigma}+\Theta$ since $\sigma|_{r^{\prime}}\sim_{s^{\prime}}\rho|_{r^{\prime}}$ , it follows that $p_{\sigma}$ has also reached all correct processes in $\rho|_{r^{\prime}}$ already. ∎

10. Conclusions

We provided a complete characterization of both uniform and non-uniform deterministic consensus solvability in distributed systems with benign process and communication failures using point-set topology. Consensus can only be solved when the space of admissible executions/process-time graphs can be partitioned into disjoint decision sets that are both closed and open in our topologies. We also showed that this requires exclusion of certain (fair and unfair) limit sequences, which limit broadcastability and happen to coincide with the forever bivalent executions constructed in bivalence and bipotence proofs. The utility and wide applicability of our characterization was demonstrated by applying it to several different distributed computing models.

Part of our future work will be devoted to a generalization of our topological framework to other decision problems. Another very interesting area of future research is to study the homology of non-compact message adversaries, i.e., the topological structure of the space of admissible executions using combinatorial topology.

Bibliography

Afek and Gafni (2013) Yehuda Afek and Eli Gafni. 2013. Asynchrony from Synchrony. In Distributed Computing and Networking. Lecture Notes in Computer Science, Vol. 7730. Springer Berlin Heidelberg, 225–239. https://doi.org/10.1007/978-3-642-35668-1_16
Aguilera et al. (2004) Marcos Kawazoe Aguilera, Carole Delporte-Gallet, Hugues Fauconnier, and Sam Toueg. 2004. Communication-efficient leader election and consensus with limited link synchrony. In Proceedings of the 23th ACM Symposium on Principles of Distributed Computing (PODC’04). ACM Press, St. John’s, Newfoundland, Canada, 328–337. https://doi.org/10.1145/1011767.1011816
Alpern and Schneider (1985) Bowen Alpern and Fred B. Schneider. 1985. Defining Liveness. Inform. Process. Lett. 21, 4 (1985), 181–185.
Attiya et al. (2020) Hagit Attiya, Armando Castañeda, and Sergio Rajsbaum. 2020. Locally Solvable Tasks and the Limitations of Valency Arguments. In 24th International Conference on Principles of Distributed Systems, OPODIS 2020, December 14-16, 2020, Strasbourg, France (Virtual Conference) (LIPIcs), Quentin Bramas, Rotem Oshman, and Paolo Romano (Eds.), Vol. 184. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 18:1–18:16. https://doi.org/10.4230/LIPIcs.OPODIS.2020.18
Ben-Zvi and Moses (2014) Ido Ben-Zvi and Yoram Moses. 2014. Beyond Lamport’s Happened-before: On Time Bounds and the Ordering of Events in Distributed Systems. J. ACM 61, 2, Article 13 (April 2014), 26 pages. https://doi.org/10.1145/2542181
Biely and Robinson (2019) Martin Biely and Peter Robinson. 2019. On the Hardness of the Strongly Dependent Decision Problem. In Proceedings of the 20th International Conference on Distributed Computing and Networking (ICDCN ’19). ACM, New York, NY, USA, 120–123. https://doi.org/10.1145/3288599.3288614
Biely et al. (2018) Martin Biely, Peter Robinson, Ulrich Schmid, Manfred Schwarz, and Kyrill Winkler. 2018. Gracefully degrading consensus and k-set agreement in directed dynamic networks. Theoretical Computer Science 726 (2018), 41–77. https://doi.org/10.1016/j.tcs.2018.02.019
