Topological Characterization of Consensus in Distributed Systems
Thomas Nowak, Ulrich Schmid, Kyrill Winkler

TL;DR
This paper characterizes the solvability of consensus in distributed systems with faults using novel topological methods, extending classical topology to analyze execution spaces and explain algorithmic possibilities.
Contribution
It introduces fault-aware topologies on execution spaces, providing a unified topological framework for understanding consensus solvability and existing impossibility results.
Findings
Consensus solvability corresponds to disconnected sets in the new topologies.
The approach explains existing algorithms and impossibility results topologically.
Develops a new equivalence between strong and weak validity conditions.
Abstract
We provide a complete characterization of both uniform and non-uniform deterministic consensus solvability in distributed systems with benign process and communication faults using point-set topology. More specifically, we non-trivially extend the approach introduced by Alpern and Schneider in 1985, by introducing novel fault-aware topologies on the space of infinite executions: the process-view topology, induced by a distance function that relies on the local view of a given process in an execution, and the minimum topology, which is induced by a distance function that focuses on the local view of the process that is the last to distinguish two executions. Consensus is solvable in a given model if and only if the sets of admissible executions leading to different decision values is disconnected in these topologies. By applying our approach to a wide range of different applications, we…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDistributed systems and fault tolerance · Distributed Control Multi-Agent Systems · Cooperative Communication and Network Coding
Topological Characterization of Consensus in Distributed Systems
Dedicated to the 2018 Dijkstra Prize winners Bowen Alpern and Fred B. Schneider
Thomas Nowak
Université Paris-Saclay, CNRSOrsayFrance
,
Ulrich Schmid
TU WienViennaAustria
and
Kyrill Winkler
ITK EngineeringViennaAustria
Abstract.
We provide a complete characterization of both uniform and non-uniform deterministic consensus solvability in distributed systems with benign process and communication faults using point-set topology. More specifically, we non-trivially extend the approach introduced by Alpern and Schneider in 1985, by introducing novel fault-aware pseudo-(semi-)metric topologies on the space of infinite executions: the process-view topology, induced by a distance function that relies on the local view of a given process in an execution, and the minimum topology, which is induced by a distance function that focuses on the local view of the process that is the last to distinguish two executions. Consensus is solvable in a given model if and only if the sets of admissible executions leading to different decision values is disconnected in these topologies. We also provide two alternative characterizations, based on the broadcastability of connected components and on the exclusion of certain “fair” and “unfair” limit sequences (which coincide with forever bivalent runs). By applying our approach to a wide range of different applications, we provide a topological explanation of a number of existing algorithms and impossibility results and develop several new ones.
Topological characterization; point-set topology; consensus; distributed systems; benign faults
††copyright: acmlicensed††doi: 0000001.0000001††ccs: Theory of computation Distributed algorithms
1. Introduction
We provide a complete characterization111This paper is an substantial generalization and extension of our PODC’19 paper (NSW19:PODC), and also covers its offsprings (WSM19:OPODIS) and (WSN21:FCT) w.r.t. applications. of the solvability of deterministic non-uniform and uniform consensus in any distributed system with benign process and/or communication failures, using point-set topology as introduced in the Dijkstra Prize-winning paper by Alpern and Schneider (AS84). Our results hence precisely delimit the consensus solvability/impossibility border in very different distributed systems such as dynamic networks (KO11:SIGACT) controlled by a message adversary (AG13), synchronous distributed systems with processes that may crash or commit send and/or receive omission failures (PT86), or purely asynchronous systems with crash failures (FLP85), for example. Whereas we will primarily focus on message-passing architectures in our examples, our topological approach also covers shared-memory systems.
Deterministic consensus, where every process starts with some input value and has to irrevocably compute a common output value is arguably the most well-studied problem in distributed computing. Both impossibility results and consensus algorithm are known for virtually all distributed computing that have been proposed so far. However, they have been obtained primarily on a case-by-case basis, using classic combinatorial analysis techniques (FR03). Whereas there are also some reasonably model-independent characterizations (MR02; LM95:DC), we are not aware of any approach that allows to precisely characterize the consensus solvability/impossibility border for arbitrary distributed systems with benign process and communication failures in general.
In this paper, we provide such a characterization based on point-set topology as introduced by Alpern and Schneider (AS84). Regarding topological methods in distributed computing, one has to distinguish point-set topology, which considers the space of infinite executions of a distributed algorithm, from combinatorial topology, which studies the topology of reachable states of prefixes of admissible executions using simplicial complexes. Fig. 1 illustrates the objects studied in combinatorial topology vs. point-set topology. As of today, combinatorial topology has been developed into a quite widely applicable tool for the analysis of distributed systems (HKR13). A celebrated result in this area is the Asynchronous Computability Theorem (HS99:ACT; GRS22:ITCS), for example, which characterizes solvable tasks in wait-free asynchronous shared memory systems with crashes.
By contrast, point-set topology has only rarely been used in distributed computing. The primary objects are the infinite executions of a distributed algorithm (AS84). By defining a suitable metric between two infinite executions and , each considered as the corresponding infinite sequence of global states of the algorithm in the respective execution, they can be viewed as elements of a topological space. For example, according to the common-prefix metric , the executions and are close if the common prefix where no process can distinguish them is long. A celebrated general result (AS84) is that closed and dense sets in the resulting space precisely characterize safety and liveness properties, respectively.
Prior to our paper (NSW19:PODC), however, point-set topology has only occasionally been used for establishing impossibility results. We are only aware of some early work by one of the authors of this paper on a generic topological impossibility proof for consensus in compact models (Now10:master), and a topological study of the strongly dependent decision problem (BR19:ICDCN). Lubitch and Moran (LM95:DC) introduced a construction for schedulers, which leads to limit-closed submodels222Informally, a model is limit-closed if the limit of a sequence of growing prefixes of admissible executions is admissible. Note that the wait-free asynchronous model is limit-closed. of classic non-closed distributed computing models (like asynchronous systems consisting of processes, up to which may crash). In a similar spirit, Kuznetsov, Rieutard and He showed (KRH18:PODC), in the setting of combinatorial topology, how to reason about non-closed models by considering equivalent affine tasks that are closed. A similar purpose is served by defining layerings, as introduced by Moses and Rajsbaum (MR02). Whereas such constructions of closed submodels greatly simplify impossibility proofs, they do not lead to a precise characterization of consensus solvability in non-closed models, however.
Contributions. Building on our PODC’19 paper (NSW19:PODC) devoted to consensus in dynamic networks under message adversaries (AG13), the present paper provides a complete topological characterization of both the non-uniform and uniform deterministic consensus solvability/impossibility border for general distributed systems with benign333Actually, our framework immediately generalizes to Byzantine faults as well. Since consensus with Byzantine faults needs a different validity condition, though, we restrict our attention to benign faults for consistency of the presentation. process and/or communication faults. To achieve this, we had to add several new topological ideas to the setting of Alpern and Schneider (AS84), as detailed below, which not only allowed us to deal with both closed and non-closed models, but also provided us with a topological explanation of bivalence (FLP85) and bipotence (MR02) impossibility proofs. In more detail:
(i) We introduce a simple generic system model for full-information protocols that covers all distributed system models with benign faults we are aware of. We define new topologies on the execution space of general distributed algorithms in this model, which allow us to reason about sequences of local views of (correct) processes, rather than about global configuration sequences. The -view topology is defined by a pseudo-metric based on the common prefix of ’s local views in the executions and . The uniform and non-uniform minimum topology are induced by the last (correct) process to notice a difference between two executions, which leads to pseudo-semi-metrics.
(ii) We show that consensus can be modeled as a continuous decision function in our topologies, which maps an admissible execution to its unique decision value. This allows us to prove that consensus is solvable if and only if all the decision sets, i.e., the pre-images resp. PS_{v}=\tau^{-1}\bigl{[}\Delta^{-1}[\{v\}]\bigr{]} for every decision value , are disconnected from each other. We also provide a universal uniform and non-uniform consensus algorithm, which rely on this separation.
(iii) We introduce process-time graphs (BM14:JACM) as a succinct alternative to configuration sequences in executions, and show that they are equivalent w.r.t. our topological reasoning. This is accomplished by implementing our generic system model in an “operational” system model, based on the widely applicable system model by Moses and Rajsbaum (MR02).
(iv) We provide an alternative characterization of uniform and non-uniform consensus solvability based on the broadcastability of the connected components of the decision sets in the process-time graph topologies. Moreover, utilizing some properties of the pseudo-metric , we provide a characterization of consensus solvability based on the limits of two infinite sequences of admissible process-time graphs, taken from different decision sets. Consensus is impossible if there is just one pair of such limits with distance 0, which actually coincide with the forever bivalent/bipotent executions constructed in previous proofs (FLP85; MR02).
(v) We demonstrate the utility of our approach by applying our topological findings to several different distributed computing models. Apart from providing a topological explanation of well-known classic results like bivalence proofs, we give the first comprehensive characterization of consensus solvability for both compact and non-compact message adversaries. Moreover, our results also lead to new consensus algorithms for some models.
Paper organization. In Section 3, we define the elements of the spaces that will be endowed with our new topologies in Section 4. Section 5 introduces the consensus problem in topological terms and provides our abstract characterization result for uniform consensus (Theorem 5.1) and non-uniform consensus (Theorem 5.2), which also provide universal algorithms. Process-time graphs and our operational system model are introduced in Section 6. Alternative characterizations based on broadcastability and limit exclusion are provided in Section 7 and Section 8, respectively. Section 9 is devoted to applications. Some conclusions in Section 10 round off our paper.
2. Related Work
Besides the few point-set topology papers (AS84; Now10:master; BR19:ICDCN) and the closed model constructions (LM95:DC; MR02; KRH18:PODC) already mentioned in Section 1, there is an abundant literature on consensus algorithms and impossibility proofs.
Regarding combinatorial topology, it is worth mentioning that our study of the indistinguishability relation of prefixes of runs is closely connected to connectivity properties of the -round protocol complex. However, in non-limit-closed models, we need to go beyond a uniformly bounded prefix length. This is in sharp contrast to the models considered in combinatorial topology (CFPR19:SSS; ACR20:OPODIS), which are all limit-closed (typically, wait-free asynchronous).
A celebrated paper on the impossibility of consensus in asynchronous systems with crash failures is by Fischer, Lynch, and Paterson (FLP85), who also introduced the bivalence proof technique. Unreliable failure detectors for circumventing this impossibility exist (CT96). Consensus in synchronous systems with Byzantine-faulty processes has been introduced by Lamport, Shostak, and Pease (LSP82). The seminal works by Dolev, Dwork, and Stockmeyer (DDS87) and Dwork, Lynch, and Stochmeyer (DLS88) on partially synchronous systems introduced important abstractions like eventual stabilization and eventually bounded message delays, and provided a characterization of consensus solvability under various combinations of synchrony and failure models. Consensus in systems with weak timely links and crash failures was considered (ADGFT04; HMSZ08:TDSC). Algorithms for consensus in systems with general omission process failures were provided by Perry and Toueg (PT86).
Perhaps one of the earliest characterizations of consensus solvability in synchronous distributed systems prone to communication errors is the seminal work by Santoro and Widmayer (SW89), where it was shown that consensus is impossible if up to messages may be lost in each round. This classic result was refined in (SWK09; CBS09) and, more recently, by Coulouma, Godard, and Peters (CGP15), where a property of an equivalence relation on the sets of communication graphs was found that captures exactly the source of consensus impossibility. The authors also showed how this property can be exploited in order to develop a generic consensus algorithm.
Following Afek and Gafni (AG13), such distributed systems are nowadays known as dynamic networks, where the per-round directed communication graphs are controlled by a message adversary. Whereas Coulouma, Godard, Peters (CGP15) studied oblivious message adversaries, where the communication graphs are picked arbitrarily from a set of candidate graphs, more recent papers (BRSSW18:TCS; WSS19:DC) studied eventually stabilizing message adversaries, which guarantee that some rounds with “good” communication graphs will eventually be generated. Note that oblivious message adversaries are limit-closed, which is not the case for general message adversaries like the eventually stabilizing ones. Raynal and Stainer explored the relation between message adversaries and failure detectors (RS13:PODC).
The first characterization of consensus solvability under general message adversaries was provided by Fevat and Godard (FG11), albeit only for systems that consist of two processes. A bivalence argument was used there to show that certain communication patterns, namely, a fair or a special pair of unfair communication patterns, must be excluded by the MA for consensus to become solvable. However, a complete characterization of consensus solvability for arbitrary system sizes did not exist until now.
3. Generic System Model
We consider distributed message passing or shared memory systems made up of a set of deterministic processes with unique identifiers, taken from for simplicity.
We denote individual processes by letters , , etc.
For our characterization of consensus solvability, we restrict our attention to full-information executions, in which processes continuously relay all the information they gathered to all other processes, and eventually apply some local decision function. The exchanged information includes the process’s initial value, but also, more importantly, a record of all events (message receptions, shared memory readings, object invocations, …) witnessed by the process. As such, our general system model is hence applicable whenever no constraints are placed on the size of the local memory and the size of values to be communicated (e.g., message/shared-register size). In particular, it is applicable to classical synchronous and asynchronous message-passing and shared-memory models, with benign (and even Byzantine) process and communication faults. In Section 6, we will provide a more “operational” system model, which is obtained by instantiating our generic system model in the model of Moses and Rajsbaum (MR02).
Formally, a (full-information)
execution is a sequence of (full-information) configurations. For every process , there is an equivalence relation on the set of configurations—the -indistinguishability relation—indicating whether process can locally distinguish two configurations, i.e., if it has the same view in and . In this case we write . Note that two configurations that are indistinguishable for all processes need not be equal. In fact, configurations usually include some state of the communication media that is not accessible to any process.
In addition to the indistinguishability relations, we assume the existence of a function that specifies the set of obedient processes in a given configuration. Obedient processes must follow the algorithm and satisfy the (consensus) specification; usually, is the set of non-faulty processes. Again, this information is usually not accessible to the processes. We make the restriction that disobedient processes cannot recover and become obedient again, i.e., that if is reachable from . We extend the obedience function to the set of admissible executions in a given model by setting , where . Here, denotes a notion of global time that is not accessible to the processes. Consequently, a process is obedient in an
execution if it is obedient in all of its configurations. We further make the restriction that there is at least one obedient process in every execution, i.e., that for all .
We also assume that every process has the possibility to weakly count the steps it has taken. Formally, we assume the existence of weak clock functions such that for every execution and every configuration , the relation implies . Additionally, we assume that as for every execution and every obedient process . This definition allows for non-lockstep, even asynchronous, executions.
For the discussion of decision problems, we need to introduce the notion of input values. Since we limit ourselves to the consensus problem, we need not distinguish between the sets of input values and output values. We thus just assume the existence of a set of potential input values, and require that the potential output values are also in .
Furthermore, each (initial) configuration is assumed to contain an initial value in for each process. This information is locally accessible to the processes, i.e., each process can access its own initial value (and those it has heard from).
A decision algorithm is a collection of functions such that if and if is reachable from and , where represents the fact that has not decided yet. That is, decisions depend on local information only and are irrevocable. Every process thus has at most one decision value in an execution. We can extend the decision function to executions by setting , where . We say that has decided value in configuration or execution if or , respectively.
We will consider both non-uniform and uniform consensus with weak444We note that our results can be easily adapted to different validity conditions. validity, defined as follows:
Definition 3.1 (Non-uniform and uniform consensus).
A non-uniform consensus algorithm is a decision algorithm that ensures the following properties in all of its admissible executions:
- (T)
Eventually, every obedient process must irrevocably decide. (Termination) 2. (A)
If two obedient processes have decided, then their decision values are equal. (Agreement) 3. (V)
If the initial values of processes are all equal to , then is the only possible decision value. (Validity)
A uniform consensus algorithm must ensure (T), (V), and
- (UA)
If two processes have decided, then their decision values are equal. (Uniform Agreement)
By Termination, Agreement, and the fact that every execution has at least one obedient process, for every consensus algorithm, we can define the consensus decision function by setting where is any process that is obedient in execution , i.e., .
To illustrate the difference between uniform and non-uniform consensus, as well as to motivate the two topologies serving to characterize their solvability, consider the example of two synchronous non-communicating processes. The set of processes is and the set of possible values is . Processes proceed in lock-step synchronous rounds, but cannot communicate. Thus, the only information a process has access to is its own initial value and the current time. The set of executions and the obedience function are defined such that one of the processes eventually becomes disobedient in every execution, but not both processes. In this model, it is trivial to solve non-uniform consensus by immediately deciding on one’s own initial value, but uniform consensus is impossible.
4. Topological Structure of Full-Information Executions
In this section, we will endow the various sets introduced in Section 3
with suitable topologies. We first recall briefly the basic topological notions that are needed for our exposition. For a more thorough introduction, however, the reader is advised to refer to a textbook (Munkres).
A topology on a set is a family of subsets of such that , , and contains all arbitrary unions as well as all finite intersections of its members. We call endowed with , often written as , a topological space and the members of open sets. The complement of an open set is called closed and sets that are both open and closed, such as and itself, are called clopen. A topological space is disconnected, if it contains a nontrivial clopen set, which means that it it can be partitioned into two disjoint open sets. It is connected if it is not disconnected.
A function from space to space is continuous if the pre-image of every open set in is open in . Given a space , is called a subspace of if is equipped with the subspace topology . Given , the closure of is the intersection of all closed sets containing . For a space , if , we call a limit point of if it belongs to the closure of . It can be shown that the closure of is the union of with all limit points of . Space is called compact if every family of open sets that covers contains a finite sub-family that covers .
If is a nonempty set, then we call any function a distance function on . Define by setting if and only if for all there exists some such that .
Many topological spaces are defined by metrics, i.e., symmetric definite distance functions for which the triangle inequality holds. For a distance function to define a (potentially non-metrizable) topology though, no additional assumptions are necessary:
Lemma 4.1.
If is a distance function on , then is a topology on .
Proof.
Firstly, we show that is closed under unions. So let . We will show that . Let . Then, by definition of the set union, there exists some such that . But since , there exists some such that
[TABLE]
which shows that .
Secondly, we show that is closed under finite intersections. Let . We will show that . Let . Then, by definition of the set intersection, for all . Because all are in , there exist such that for all . If we set , then . Since we have whenever , we also have
[TABLE]
for all . But this shows that , which means that .
Since it is easy to check that as well, is indeed a topology. ∎
We will henceforth refer to as the topology induced by .
An execution is a sequence of configurations, i.e., an element of the product space . The product topology on a product space of topological spaces is defined as the coarsest topology such that all projections are continuous. It turns out that the product topology on the space is induced by a distance function whose form is known in a special case that covers our needs:
Lemma 4.2.
Let be a distance function on that only takes the values [math] or . Then the product topology of , where every copy of is endowed with the topology induced by , is induced by the distance function
[TABLE]
where and .
Proof.
We first show that all projections are continuous when endowing with the product topology : Let be open and , i.e., implies . Let and set . Then,
[TABLE]
where the last inclusion follows from the openness of . Since is hence open in , the continuity of follows.
Let now be an arbitrary topology on for which all projections are continuous. We will show that , which reveals that is the coarsest topology with continuous projections, i.e., the product topology of where every copy of is endowed by . This will establish our lemma.
So let and take any . There exists some such that . Choose such that , and set
[TABLE]
Then, is open with respect to as a finite intersection of open sets: After all, every (\pi^{s})^{-1}\big{[}B_{1}(C^{s})\big{]} is open by the continuity of the projection . But since , this shows that contains a -open neighborhood for each of its points, i.e., . ∎
4.1. Uniform topology for executions
In previous work on point-set topology in distributed computing (Now10:master), the set of configurations of some fixed algorithm was endowed with the discrete topology, induced by the discrete metric if and [math] otherwise (for configurations ). Moreover, was endowed with the corresponding product topology, which is induced by the common-prefix metric
[TABLE]
where and , according to Lemma 4.2. Informally, decreases with the length of the common prefix where no process can distinguish and .
By contrast, we define the -view distance function on the set of configurations for every process by
[TABLE]
Extending this distance function from configurations to executions, we define the -view pseudo-metric by
[TABLE]
where and . Note that two executions, where has the same local view in all configurations in and before fails in some round , satisfy .
Figure 2 shows an example of different instances of the (pseudo-)metrics introduced so far.
The following lemma shows that the -view pseudo-metric indeed deserves its name:
Theorem 4.3 Properties of -view pseudo-metric.
thm:Pseudometricproperties The -view pseudo-metric on satisfies
[TABLE]
Despite of the lack of definiteness, most properties of metric spaces, including compactness, hold also in pseudo-metric spaces (Fre14). What is obviously lost is the uniqueness of the limit of a convergent sequence of executions, however: if and , then as well.
The uniform minimum topology (abbreviated uniform topology) on the set of executions is induced by the distance function
[TABLE]
Note that is only a pseudo-semi-metric, i.e., only satisfies symmetry and nonnegativity but not the triangle inequality: There may be sequences with and but for all . Hence, the topology on induced by lacks many of the properties of (pseudo-)metric spaces, but will turn out to be sufficient for the characterization of the possibility/impossibility of uniform consensus (see Theorem 5.1).
The next lemma shows that the decision function of an algorithm that solves uniform consensus is always continuous with respect to both any -view and the uniform topology.
Lemma 4.4.
Let be the consensus decision function of a uniform consensus algorithm. Then, is continuous with respect to both the -view distance function , , and the uniform distance function .
Proof.
We only prove the lemma for the uniform topology, by showing that is locally constant, i.e., for all execution , there is some -neighborhood of such that is constant on . The continuity for the -topologies follows since is coarser than .
Let be a time greater than both the latest decision time of the processes in and the latest time any process becomes disobedient in execution . By the Termination property and the fact that disobedient processes cannot become obedient again, we have . Because is larger than the latest time a process becomes disobedient, we have .
Using the notation and , we choose the following neighborhood of :
[TABLE]
Let . Then for some . Since has decided at time in execution and is obedient until time in execution , process has also decided at time in execution . By Uniform Agreement and Termination, all processes in decide as well. In other words , which concludes the proof. ∎
For an illustration in our non-communicating two-process example, denote by the execution in which process has initial value [math], process has initial value , and process becomes disobedient at time . Similarly, denote by the execution with the same initial values and in which process becomes disobedient at time . Since there is no means of communication between the two processes, by Validity, each obedient process necessarily has to eventually decide on its own initial value, i.e., and . The uniform distance between these executions is equal to . Thus, every -neighborhood of contains execution if . The set of [math]-deciding executions is thus not open in the uniform topology. But this means that the algorithm’s decision function cannot be continuous. Lemma 4.4 hence implies that there is no uniform consensus algorithm in the non-communicating two-process model, which is in accordance with the application example in Section 9.2.
4.2. Non-uniform topology for executions
Whereas the -view pseudo-metric given by Eq. 8 is also adequate for non-uniform consensus, this is not the case for the uniform pseudo-semi-metric as defined in Eq. 9. The appropriate non-uniform minimum topology (abbreviated non-uniform topology) on the set of executions is induced by the distance function
[TABLE]
The non-uniform topology is finer than the uniform topology, since the minimum is taken over the smaller set , which means that . In particular, this implies that every decision function that is continuous with respect to the uniform topology is also continuous with respect to the non-uniform topology. Of course, this also follows from Lemma 4.4 and the fact that every uniform consensus algorithm also solves non-uniform consensus.
The following Lemma 4.5 is the analog of Lemma 4.4:
Lemma 4.5.
Let be the consensus decision function of a non-uniform consensus algorithm. Then is continuous with respect to both the -view distance function , , and the non-uniform distance function .
Proof.
We again prove the lemma only for , by showing that is locally constant, i.e., for all execution , there is some -neighborhood of such that is constant on ; the proof for is similar.
Let be the latest decision time of the processes in in execution . By the Termination property, we have . Using the notation and , we choose the following neighborhood of :
[TABLE]
If , then for some . Denote by the decision time of process in . Since , we also have But this means that process decides value at time in both executions and , hence . ∎
For an illustration in the non-communicating two-process example used in Section 4.1, note that the trivial algorithm that immediately decides on its initial value satisfies and . The algorithm does solve non-uniform consensus, since it is guaranteed that one of the processes eventually becomes disobedient. In contrast to the uniform distance function, the non-uniform distance function satisfies since . This means that the minimum distance between a [math]-deciding and a -deciding execution is at least . It is hence possible to separate the two sets of executions by sets that are open in the non-uniform topology, so consensus is solvable here, according to the considerations in the following section, see also Section 9.2.
5. General Consensus Characterization for Full-Information Executions
In this section, we will provide our topological conditions for uniform and non-uniform consensus solvability.
Call an execution -valent if all initial values in the execution are equal to .
Theorem 5.1 Characterization of uniform consensus.
Uniform consensus is solvable if and only if there exists a partition of the set of admissible executions into sets , , such that the following holds:
- (1)
Every is an open set in with respect to the uniform topology induced by . 2. (2)
If execution is -valent, then .
Proof.
(): Define where is the decision function of a uniform consensus algorithm. This is a partition of by Termination, and Validity implies property (2). It thus only remains to show openness of the , which follows from the continuity of , since every singleton is open in the discrete topology.
(): We define a uniform consensus algorithm by defining the decision functions as
[TABLE]
where we use the notation . The function is well defined since the sets are pairwise disjoint.
We first show Termination of the resulting algorithm. Let , let such that , and let . Since is open with respect to the uniform topology, there exists some such that . By definition of , we have and hence .
Writing , let be the smallest integer such that for all . Such a exists since as . Then, for every , we have . In particular, for all , i.e., process decides value in execution .
We next show Uniform Agreement. For the sake of a contradiction, assume that process decides value in configuration in execution . But then, by definition of the function , we have . But this is impossible since .
Validity immediately follows from property (2). ∎
Theorem 5.2 Characterization of non-uniform consensus.
Non-uniform consensus is solvable if and only if there exists a partition of the set of admissible executions into sets , , such that the following holds:
- (1)
Every is an open set in with respect to the non-uniform topology induced by . 2. (2)
If execution is -valent, then .
Proof.
The proof is similar to that of Theorem 5.1, except that the definition of is
[TABLE]
i.e., we just have to add the constraint that to the executions considered in the proof. ∎
If has only finitely many connected components, these characterizations give rise to the following meta-procedure for determining whether consensus is solvable and constructing an algorithm if it is. It requires knowledge of the connected components of the space of admissible executions with respect to the appropriate topology:
- (1)
Initially, start with an empty set for every value . 2. (2)
Add to the connected components of that contain an execution with a -alent initial configuration. 3. (3)
Add all remaining connected components of to an arbitrarily chosen set . 4. (4)
If the sets are pairwise disjoint, then consensus is solvable. In this case, the sets determine a consensus algorithm via the universal algorithm given in the proofs of Theorem 5.1 and Theorem 5.2. If the are not pairwise disjoint, then consensus is not solvable.
6. Process-Time Graphs
Up to now, we have formalized our topological results in terms of admissible executions of the generic system model introduced in Section 3. In this section, we will show that they also hold for topological spaces consisting of other objects, namely, process-time graphs. In a nutshell, a process-time graph describes the process scheduling and all communication occurring in a run, along with the set of initial values. Compared to executions, process-time graphs have a much more succinct description, and will cause the resulting space to be compact. This, in turn, will allow us to prove additional topological results like LABEL:thm:setdistance.
Nevertheless, since we consider deterministic algorithms only, a process-time graph corresponds to a unique execution (and vice versa). This equivalence, which actually results from a transition function that is continuous in all our topologies (see LABEL:lem:tau:is:cont), will eventually allow us to use our topological reasoning in either space alike.
In order to define process-time graphs as generic as possible, we will resort to an intermediate operational system model that is essentially equivalent to the very flexible general system model from Moses and Rajsbaum (MR02). Crucially, it will also instantiate the weak clock functions stipulated in our generic model in Section 3, which must satisfy in every admissible execution . Since represents some global notion of time here (called global real time in the sequel), ensuring this property is sometimes not trivial. More concretely, whereas is inherently known at every process in the case of lock-step synchronous systems like dynamic networks under message adversaries (WSS19:DC), for example, this is not the case for purely asynchronous systems (FLP85).
6.1. Basic operational system model
Following Moses and Rajsbaum (MR02), we consider message passing or shared memory distributed systems made up of a set of processes. We stipulate a global discrete clock with values taken from , which represents global real time in multiples of some arbitrary unit time. Depending on the particular distributed computing model, this global clock may or may not be accessible to the processes.
Processes are modeled as communicating state machines that encode a deterministic distributed algorithm (protocol) . At every real time time , process is in some local state , where is a special state representing that process has failed.555This failed state is the only essential difference to the model of Moses and Rajsbaum (MR02), where faults are implicitly caused by a deviation from the protocol. This assumption makes sense for constructing “permutation layers”, for example, where it is not the environment that crashes a process at will, but rather the layer construction, which implies that some process takes only finitely many steps. Such a process just remains in the local state reached after its last computing step. In our setting, however, the fault state of all processes is solely controlled by the omniscient environment. Hence, we can safely use a failed state to gain simplicity without losing expressive power.
Local state transitions of are caused by local actions taken from the set , which may be internal bookkeeping operations and/or the initiation of shared memory operations resp. of sending messages; their exact semantics may vary from model to model. Note that a single action may consist of finitely many non-zero time operations, which are initiated simultaneously but may complete at different times. The deterministic protocol , representing ’s part in , is a function that specifies the local action is ready to perform when in state . We do not restrict the actions can perform when in state .
In addition, there is an additional non-deterministic state machine called the environment , which represents the adversary that is responsible for actions outside the sphere of control of the processes’ protocols. It
controls things like the completion of shared memory operations initiated earlier resp. the delivery of previously sent messages, the occurrence of process and communication failures, and (optionally) the occurrence of external environment events that can be used for modeling oracle inputs like failure detectors (CT96). Let be the set of all possible combinations of such environment actions (also called events for conciseness later on). We assume that the environment keeps track of pending shared-memory operations resp. sent messages in its environment state . The environment is also in charge of process scheduling, i.e., determines when a process performs a state transition, which will be referred to as taking a step. Formally, we assume that the set of all possible environment actions consists of all pairs , made up of the set of processes that take a step and some (which may both be empty as well). The non-deterministic environment protocol is an arbitrary relation that, given the current global state (defined below, which also contains the current environment state , chooses the next environment action and the successor environment state . Note carefully that we assume that only is actually chosen non-deterministically by , whereas is determined by a transition function according to .
Finally, a global state of our system (simply called state) is an element of . Given a global state , denotes the local state of process in , and denotes the state of the environment in . Recall that it is that keeps track of in-transit (i.e., just sent)
messages, pending shared-memory operations etc.666A different, but equivalent, conceptual model would be to assume that the state of a processor consist of a visible state and, in the case of message passing, message buffers that hold in-transit messages. We also write , where the vector of the local states of all the processes is called configuration. Given , the component denotes the local state of process in , and the set of all possible configurations is denoted as . Note carefully that there may be global configurations where the corresponding configurations satisfy , e.g., in the case of different in-transit messages.
A joint action is a pair , where , and is a vector with index set Sched such that for . When the joint action is applied to global state where process is in local state , then is the action prescribed by ’s protocol. Note that some environment actions, like message receptions at process require , i.e., “wake-up” the process. For example, a joint action that causes to send a message to and process to receive a message sent to it by process earlier, typically works as follows: (i) is caused to take a step, where its protocol initiates the sending of ; (ii) the environment adds to the send buffer of the communication channel from to (maintained in the environment state ); (iii) the environment moves from the send buffer of the communication channel from to (maintained in the environment state ) to the receive buffer (maintained in the local state of ), and (iv) causes to take a step. It follows that the local state of process reflects the content of message immediately after the step scheduled along with the message reception.
With ACT denoting the set of all possible joint actions, the transition function describes the evolution of the global state after application of the joint action , which results in the successor state . A run of is an infinite sequence of global states generated by an infinite sequence of joint actions. In order to guarantee a stable global state at integer times, we conceptually assume that the joint actions occur at times , i.e., that . is the initial state of the run, taken from the set of possible initial states . Finally, denotes the subset of all admissible runs of our system. is typically used for enforcing liveness conditions like “every message sent to a correct process is eventually delivered” or “every correct process takes infinitely many steps”.
Unlike Moses and Rajsbaum (MR02), we handle process failures explicitly in the state of the processes, i.e., via the transition function: If some joint action contains , where requests some process to fail, this will force in the successor state , irrespective of any other operations in (like the delivery of a message) that would otherwise affect . All process failures are persistent, that is, we require that all subsequent environment actions for also request to fail. As a convention, we consider every where fails as taking a step as well. Depending on the type of process failure, failing may cause to stop its protocol-compliant internal computations, to drop all incoming messages, and/or to stop sending further messages. In the case of crash failures, for example, the process may send a subset of the outgoing messages demanded by in the very first failing step and does not perform any protocol-compliant actions in future steps. A send omission-faulty process does the same, except that it may send protocol-compliant messages to some processes also in future steps. A receive omission-faulty process may omit to process some of its received messages in every step where it fails, but sends protocol-compliant messages to every receiver. A general omission-faulty process combines the possible behaviors of send and receive omissions. Note that message loss can also be modeled in a different way in our setting: Rather than attributing an omission failure to the sender or receiver process, it can also be considered a communication failures caused by the environment. The involved sender process resp. receiver process continue to act according to its protocol in this case, i.e., would not enter the fault state resp. here.
Since we only consider deterministic protocols, a run is uniquely determined by the initial configuration and the sequence of tuples consisting of tuples of environment state and environment actions for . Let resp. be the set of all infinite runs resp. executions (configuration sequences), with resp. denoting the set of admissible runs resp. admissible executions (resulting from admissible environment action sequences ).
Our assumptions on the environment protocol, namely, , actually imply that a run , and thus also the corresponding execution , is already uniquely determined by the initial state and the sequence of chosen environment actions . Since is fixed and the environment actions abstract away almost all of the internal workings of the protocols and their complex internal states, it should be possible to uniquely describe the evolution of a run/execution just by means of the sequence . In the following, we will show that this is indeed the case.
6.2. Implementing global time satisfying the weak clock property
Our topological framework crucially relies on the ability to distinguish/not distinguish two local states and in two executions and at global real time . Clearly, this is easy for an omniscent observer who knows the corresponding global states and can thus verify that and arise from the same global time . Processes cannot do that in asynchronous systems, however, since is not available to the processes and hence cannot be included in and . Consequently, two different sequences of environment actions (called events in the sequel for conciseness) and , applied to the same initial state, may produce the same state . This happens when they are causal shuffles of each other, i.e., reorderings of the steps of the processes that are in accordance with the happens-before relation (Lam78). Hence, the (in)distinguishability of configurations does not necessarily match the (in)distinguishability of the corresponding event sequences.
Whereas our generic system model does not actually require processes to have a common notion of time, it does require that the weak clock functions do not progress faster than global real time. We will accomplish this in our operational system model by defining some alternative notion of global time that is accessible to the processes. Doing this will also rule out the problem spotted above, i.e., ensure that runs (event sequences) uniquely determine executions (configuration sequences).
There are many conceivable ways for defining global time, including the following possibilities:
(i) In the case of lock-step synchronous distributed systems, like dynamic networks under message adversaries (NSW19:PODC; WSM19:OPODIS; WSN21:FCT), nothing needs to be done here since all processes inherently know global real time .
(ii) In the case of asynchronous systems with a majority of correct processes, the arguably most popular approach for message-passing systems (see e.g. (MR01; ADGFT04; HMSZ08:TDSC)) is the simulation of asynchronous communication-closed rounds: Processes organize rounds by locally waiting until messages sent in the current round have been received. These messages are then processed, which defines both the local state at the beginning of the next round and the message sent to everybody in this next round. Late messages are discarded, and early messages are buffered locally (in the state of the environment) until the appropriate round is reached. Just using the round numbers as global time, i.e., choosing , is all that is needed for defining globla time in such a model.
(iii) In models without communication-closed rounds (FLP85; RS10:TCS), a suitable notion of global time can be derived from other777We note that both synchronous and asynchronous communication-closed rounds, as well as the executions defined in our generic system model in Section 3, are of course also sequences of consistent cuts. definitions of consistent cuts (Mat89). We will show how this can be done in our operational system model based on Mattern’s vector clocks. Our construction will exploit the fact that a local state transition of a process happens only when it takes a step in our model: In between the th and th step of any fixed process , which happens at time and , respectively, only environment actions (external environment events, message deliveries, shared memory completions), if any, can happen, which change the state of the environment but not the local state of .
We will start out from the sequence of arbitrary cuts (Mat89) (indexed by an integer index ) occurring in a given run (which itself is indexed by the global real time ), where the frontier of is formed by the local states of the processes after they have taken their th step, i.e., and for , with being the time when process takes its th step. Note that the latter is applied to ’s state in the frontier of and processes all the external environment events and all the messages received/shared memory operations completed since then. Recall the convention that every environment action where process fails is also considered as taking a step.
Clearly, except in lock-step synchronous systems, , so can be viewed as the result of applying a trivial “synchronic layering” in terms of Moses and Rajsbaum (MR02). Unfortunately, though, any may be an inconsistent cut, as messages sent by a fast process in its th step may have been received by a slow process by its th step. would violate causality in this case, i.e., would not be left-closed w.r.t. Lamport’s happens-before relation (Lam78).
Recall that we restricted our attention to consensus algorithms using full-information protocols, where every message sent contains the entire state transition history of the sender. As a consequence, we do not significantly lose applicability of our results by further restricting the protocol and the supported distributed computing models as follows:
- (i)
In a single state transition of , process , can
- •
actually receive all messages delivered to it since its last step,
- •
initiate the sending of at most one message to every process, resp.,
- •
initiate at most one single-writer multiple-reader shared memory operation in the shared memory owned by some other process (but no restriction on operations in its own shared memory portion). 2. (ii)
In addition to (optional) external environment events, the environment protocol only provides
- •
, which tells process to fail,
- •
, which identifies the message to be delivered to process (for reception in its next step) by the pair , where is the sending process and is the time when the sending of has been initiated, resp.,
- •
, which identifies the completed shared memory operation (to be processed in its next step), in the shared memory owned by , as the one initiated by process in its step at time ; in a read-type operation, it will return to the shared memory content based on ’s state at time , with .
In such a system, given any cut , it is possible to determine the unique largest consistent cut (Mat89). By construction, , and the frontier of , , consists of the local states of all processes reached by having taken some th step, , with at least one process having taken its th step, i.e., and thus , and with for all processes . Note carefully that happens when, in , process receives some message/data initiated at some step at or before its own th step but after its th step.
Whereas the environment protocol could of course determine all the consistent cuts based on the corresponding sequence of global configurations, this is typically not the case for the processes (unless in the special case of a synchronous system). However, in distributed systems adhering to the above constraints, processes can obtain this knowledge (that is to say, their local share of a consistent cut) via vector clocks (Mat89). More specifically, it is possible to implement a vector clock at process , where counts the number of steps taken by itself so far, and , , gives the number of steps that knows that has taken so far. Vector clocks are maintained as follows: Initially, , and every message sent resp. every shared memory operation data written by gets as piggybacked information (after advancing ). At every local state transition in ’s protocol , is advanced by 1. Moreover, when a previously received message/previously read data value (containing the originating process ’s vector clock value ) is to be processed in the step, is adjusted to the maximum of its previous value and component-wise, i.e., for . Obviously, all this can be implemented transparently atop of any protocol running in the system.
Now, given the sequence of global states of the processes running the so augmented protocol in some run , there is a well-known algorithm for computing the maximal consistent cut for the non-consistent cut formed by the frontier of the local states of the processes after every process has taken its th step: Starting from , process searches for the sought by checking the vector clock value of the state after its own th step. It stops searching and sets iff is less or equal to component-wise. The state is then process ’s contribution in the frontier of the consistent cut . Clearly, from , the sought sequence of the consistent cuts can be obtained trivially by discarding all vector clock information. Therefore, even the processes can compute their share, i.e., their local state, in for every .
By construction, the sequence of consistent cuts , and hence the sequence of its frontiers , completely describe the evolution of the local states of the processes in a run . In our operational model, we will hence just use the indices of as global time for specifying executions: Starting from the initial state , we consider as the result of applying round to (as we did in the case of lock-step rounds).
6.3. Defining process-time graphs
No matter how consistent cuts, i.e., global time, is implemented, from now on, we just overload the notation used so far and denote by the frontier in the consistent cut at global time . So given an infinite execution , we again denote by the th configuration (= the consistent cut with index ) in .
Clearly, by construction, every is uniquely determined by and all the events that cause the steps leading to . In particular, we can define a vector of events , where is a set containing all the events that must be applied to in order to arrive at . Note carefully that a process that does not make a step, i.e., is not scheduled in and thus has the same non- state in and , does not have any event (resp. ) by construction, i.e., . Otherwise, contains a “make a step” event, all (optional) external environment events, and for all messages that have been sent to in steps within and are delivered to after its previous step but before or at its th step (resp. for all completed shared memory operation initiated by in steps within and completed after ’s previous step but before or at its th step). Note that cannot contain any , as no messages have been sent before (resp. no , as no shared memory operations have been initiated before).
As a consequence of our construction, the mismatch problem spotted at the beginning of Section 6.2 no longer exists, and we can reason about executions and the corresponding event sequences alike.
Rather than considering in conjunction with , however, we will consider the corresponding process-time graph -prefix (BM14:JACM) instead, which we will now define. Since we are only interested in consensus algorithms here, we assume that every process has a dedicated initial state for every possible initial value , taken from a finite input domain . For every assignment of initial values to the processes in the initial configuration , we inductively construct the following sequence of process-time graph prefixes :
Definition 6.1 (Process-time graph prefixes).
For every , the process-time graph -prefix of a given run is defined as follows:
- •
The process-time graph [math]-prefix contains the nodes for all processes , with input value , and no edges.
- •
The process-time graph -prefix contains the nodes and for all processes , where if (which models the case of an initially dead process (FLP85)), and otherwise, where is some encoding (e.g., some failure detector output) of the external environment events . It contains a (local) edge from to and no other edges.
- •
The process-time graph -prefix , , contains and the nodes for all processes , where if , and otherwise. It contains a (local) edge from to (if the latter node is present at all, i.e., when ), where is maximal among all nodes in . For message passing systems, it also contains an edge from , , to iff . For shared memory systems, it contains an edge from , , to iff ; this reflects the fact that the returned data originate from ’s step and not from the step where has initiated the shared memory operation.
The round- process-time graph , for , which represents the contribution of round to , is defined as (i) and the set of vertices along with all their incoming edges (which all originate in ).
Figure 3 shows an example of a process-time graph prefix occuring in a run with lock-step synchronous or asynchronous rounds. The nodes are horizontally aligned according to global time, progressing along the vertical axis.
Figure 4 shows an example of a process-time graph prefix occuring in a run with processes that do not execute in lock-step rounds and may crash. Nodes are again horizontally aligned according to global time, progressing along the vertical axis. The frontier of the th consistent cut, reached at the end of round , is made up of C_{p}^{k}=\{(p,\ell_{p}(k),*)\in PTG^{k}\mid\mbox{0\leq\ell_{p}(k)\leq k is maximal}\}. That is, starting from the (possibly inconsistent) cut made up of the nodes of all processes, one has to go down for process until the first node is reached where no edge originating in a node with time has been received.
Let be the set of all possible process-time graph -prefixes, and be the set of all posible infinite process-time graphs, for all possible runs of our system. Note carefully that , as well every set of round- process-time graphs for finite , is necessarily finite (provided the encoding () for external environment events has a finite domain, which we assume). Clearly, resp. can be expressed as a finite resp. infinite sequence resp. of round- process time graphs.888Note that we slightly abuse the notation here, which normally represents .
We will denote by the set of all admissible process-time graphs in the given model, and by the corresponding set of admissible executions. Note carefully that process-time graphs are independent of the (decision function of the) consensus algorithm, albeit they do depend on the input values.
Due to the equivalence of process-time graphs and executions, our whole topological machinery developed in Section 4–Section 5 for can also be applied to . Since, in sharp contrast to the set of configurations , the set of process-time graphs is finite for any time , Tychonoff’s theorem999Tychonoff’s theorem states that any product of compact spaces is compact (with respect to the product topology). implies compactness of the -view topology on .
Whereas this is not necessarily the case for , we can prove compactness of the image of under an appropriately defined operational transition function: Given the original transition function , it is possible to define a PTG transition function that just provides the (unique) execution for a given process-time graph. The following LABEL:lem:tau:is:cont shows that is continuous in any of our topologies.
Lemma 6.2 Continuity of .
lem:tau:is:cont For every , the PTG transition function is continuous when both and are endowed with any of , , , .
Since the image of a compact space under a continuous function is compact, it hence follows that the set of admissible executions is a compact subspace of . The common structure of and its image under the PTG transition function , implied by the continuity of , hence allows us to reason in either of these spaces. In particular, with Definition 6.3, the analog of Theorem 5.1 and Theorem 5.2 read as follows:
Definition 6.3 (-valent process-time graph).
We call a process-time graph , for , -valent, if it starts from an initial configuration where all processes have the same input value .
Theorem 6.4 Characterization of uniform consensus.
Uniform consensus is solvable if and only if there exists a partition of the set of admissible process-time graphs into sets , , such that the following holds:
- (1)
Every is an open set in with respect to the uniform topology induced by . 2. (2)
If process-time graph is -valent, then .
Theorem 6.5 Characterization of non-uniform consensus.
Non-uniform consensus is solvable if and only if there exists a partition of the set of admissible process-time graphs into sets , , such that the following holds:
- (1)
Every is an open set in with respect to the non-uniform topology induced by . 2. (2)
If process-time graph is -valent, then .
7. Consensus Characterization in Terms of Broadcastability
We will now develop another characterization of consensus solvability, with rests on the broadcastability of the connected component that contains the -valent process-time graph .
Definition 7.1 (Diameter of a set).
For , depending on the distance function that induces the appropriate topology,
define ’s diameter as d(A)=\sup\{d(a,b)\mid\mbox{a,b\in A}\}.
Definition 7.2 (Broadcastability).
We call a subset of admissible process-time graphs broadcastable by the broadcaster , if for every there is some round , by which every process that is still obedient in round knows input value in , denoted , i.e., has in its view .
We will now prove the essential fact that connected broadcastable sets have a diameter strictly smaller than :
Theorem 7.3 Diameter of broadcastable connected sets.
thm:broadcastablediameter If a connected set of admissible process-time graphs is broadcastable by some process , then , as well as , i.e., ’s input value satisfies for all .
Corollary 7.4 follows immediately from LABEL:thm:broadcastablediameter:
Corollary 7.4 Diameter of broadcastable .
If for a -valent is broadcastable for , then , as well as , since ’s input value is the same for all .
We can now prove the following necessary and sufficient condition for solving consensus based on broadcastability:
Theorem 7.5 Consensus characterization via broadcastability.
thm:charbroadcastability A model allows to solve uniform resp. non-uniform consensus if and only if it guarantees that the connected components of the set of admissible processes-time graphs in the uniform topology resp. the non-uniform topology are broadcastable for some process.
8. Limit-based Consensus Characterization
It is possible to shed some additional light on the consensus characterization by exploiting the fact that every , , is a pseudo-metric (unlike and ): Since most of the convenient properties of metric spaces, including sequential compactness, also hold in pseudo-metric spaces, we can further explore the border of the decision sets . It will turn out in Corollary 8.6 that consensus is impossible if and only if certain limit points in the appropriate topologies are admissible. Note that we will prove our results only for and uniform consensus; literally the same proofs apply for non-uniform consensus as well.
For a given consensus algorithm, we consider the set of all admissible process-time graphs resp. the corresponding set of admissible executions endowed with the subspace topology generated by resp. with the subspace topology101010Whenever we state a topological property w.r.t. the subspace topology, we will refer to (resp. ), otherwise to (resp. ). in the -view topology. Recall that and are not closed in general, hence not compact, even though and are compact, recall LABEL:lem:tau:is:cont.
Definition 8.1 (Distance of sets).
For with distance function , let d(A,B)=\inf\{d(a,b)\mid\mbox{a\in Ab\in B}\}.
We prove the following result, which also holds when , are not closed/compact. Corollary 8.3 shows that it also holds in the uniform and non-uniform topology.
Theorem 8.2 General set distance condition.
thm:setdistance Let and be arbitrary subsets of . Then, if and only if there are infinite sequences and of process-time graphs, as well as with and (with respect to the appropriate -view topology) with .
Eq. 9 and Eq. 11 allow us to extend this result from -view-topologies to the uniform and non-uniform topologies:
Corollary 8.3.
Let be arbitrary subsets of . Then resp. if and only if there are infinite sequences and of process-time graphs as well as with and (with respect to the appropriate minimum topology) and resp. .
Proof.
The proof of Theorem LABEL:thm:setdistance can be carried over literally, by using the fact that every convergent infinite sequence w.r.t. has a convergent infinite subsequence w.r.t. some (obedient) by the pigeonhole principle since is finite. ∎
The above LABEL:thm:setdistance allows us to distinguish 3 main cases that cause : (i) If , one can choose the sequences defined by , . (ii) If and , there is a “fair” process-time graph as the common limit. (iii) If and , there is a pair of “unfair” process-time graphs acting as limits, which have distance 0 (and are hence also common limits w.r.t. the pseudo-metric ). We note, however, that due to the non-uniqueness of the limits in our pseudo-metric, (iii) are actually two instances of (ii). We kept the distinction for compatibility with the existing results (FG11).
Definition 8.4 (Fair and unfair process-time graphs).
Consider two process-time graphs of some consensus algorithm with decision sets , , in any appropriate topology:
- •
is called fair, if for some there are convergent sequences and with and .
- •
, are called a pair of unfair process-time graphs, if for some there are convergent sequences with and with and and have distance 0.
An illustration is provided by Figure 6. Note carefully that, in the uniform case, a fair/unfair process-time graph where some process becomes disobedient in round implies that the same happens in all and . On the other hand, if does not fail in , it may still be the case that fails in every in the sequence converging to , at some time with . In the non-uniform case, neither of these possibilities exists: cannot fail in the limit , and any where fails is also excluded as its distance to any other sequence is 1.
The above findings go nicely with the alternative characterization of consensus solvability given in Corollary 8.6, which results from applying the following Lemma 8.5 from the textbook of Minkres (Munkres) to Theorem 5.1.
Lemma 8.5 Separation lemma (Munkres, Lemma 23.12).
If is a subspace of , a separation of is a pair of disjoint nonempty sets and whose union is , neither of which contains a limit point of the other. The space is connected if and only if there exists no separation of .
Proof.
The closure of a set in is , where denotes the closure in . To show that is not connected implies a separation, assume that are closed and open in , so . Consequently, . Since is the union of and its limit points, none of the latter is in . An analogous argument shows that none of the limit points of can be in .
Conversely, if for disjoint non-empty sets , which do not contain limit points of each other, then and . From the equivalence above, we get and , so both and are closed in and, as each others complement, also open in as well. ∎
Corollary 8.6 Separation-based characterization.
Uniform resp. non-uniform consensus is solvable in a model generating the set of admissible process-time graph sequences if and only if there exists a partition of into sets such that the following holds:
- (1)
No contains a limit point of any other w.r.t. the uniform resp. non-uniform topology in . 2. (2)
Every -valent admissible sequence satisfies .
We hence immediately obtain:
Corollary 8.7 Fair/unfair consensus impossibility.
The set of admissible process-time graphs of a consensus algorithm with decision sets , , does not contain any fair process-time graph sequence or any pair of unfair process-time graph sequences.
Whereas we did not manage to characterize the set of limit sequences that had to be excluded in order to ensure consensus solvability, we can prove that, for any decision set , it must be a compact set:
Lemma 8.8 Compactness of excluded sequences.
Let , , be any decision set of a correct consensus algorithm, be its closure in and its interior. Then, , which is the set of to be excluded limit points, is compact.
Proof.
The closure is closed by definition. Since the complement of the interior is also closed by definition, it follows that is also closed. As a closed subset of the compact set , is hence compact. ∎
9. Applications
In this section, we will apply our topological characterizations of consensus solvability to several different examples. Apart from providing a topological explanation of bivalence proofs (Section 9.1) and folklore results for synchronous consensus under general omission faults (Section 9.2), we will provide a complete characterization of consensus solvability for dynamic networks with both closed (Section 9.3) and non-closed (Section 9.4) message adversaries. Finally, we will provide a consensus algorithm for asynchronous systems with weak timely links (Section 9.5), which does not rely on an implementation of the failure detector.
9.1. Bivalence-based impossibilities
Our topological results shed some new light on the now standard technique of bivalence-based impossibility proofs introduced in the celebrated FLP paper (FLP85), which have been generalized (MR02) and used in many different contexts: Our results reveal that the forever bivalent executions constructed inductively in bivalence proofs (SW89; SWK09; BRSSW18:TCS; WSS19:DC) are just the common limit of two infinite sequence of executions all contained in, say, the decision set and all contained in that have a common limit and in some -view topology with .
More specifically, what is common to these proofs is that one shows that, for any consensus algorithm, there is an admissible forever bivalent run. This is usually done inductively, by showing that there is a bivalent initial configuration and that, given a bivalent configuration at the end of round , there is a 1-round extension leading to a bivalent configuration at the end of round . By definition, bivalence of means that there are two admissible executions with decision value 0 and with decision value 1 starting out from , i.e., having a common prefix that leads to . Consequently, their distance, in any -view topology, satisfies . Note that this is also true for the more general concept of a bipotent configuration , as introduced by Moses and Rajsbaum (MR02).
By construction, the -prefix of and are the same, for all , which implies that they converge to a limit (and analogously for ), see Figure 6 for an illustration. Therefore, these executions match Definition 8.4, and Corollary 8.7 implies that the stipulated consensus algorithm cannot be correct. Concrete examples are the lossy link impossibility (SW89), i.e., the impossibility of consensus under an oblivious message adversary for that may choose any graph out of the set , and the impossibility of solving consensus with vertex-stable source components with insufficient stability interval (BRSSW18:TCS; WSS19:DC). In the case of the oblivious lossy link message adversary using the reduced set considered by Coulouma, Godard, and Peters (CGP15), consensus is solvable and there is no forever bivalent run. Indeed, there exists a consensus algorithm where all configurations reached after the first round are already univalent.
9.2. Consensus in synchronous systems with general omission process faults
As a more elaborate example of systems where the solvability of non-uniform and uniform consensus may be different, we take synchronous systems with up to general omission process failures (PT86). For , non-uniform consensus can be solved in rounds, whereas solving uniform consensus requires . Note that these systems also cover the examples in Section 4.
The impossibility proof for uses a standard partitioning argument, splitting into a set of processes with and with . One considers an admissible process-time graph where all processes start with , the ones in are correct, and the ones in are initially dead; the decision value of the processes in must be 0 by validity. Similarly, starts from , all processes in are correct and the ones in are initially dead; the decision value is hence 1. For another process-time graph , where the processes in are correct and the ones in are general omission faulty, in the sense that every does not send and receive any message to/from , one observes for every , as well as for every . Hence, and decide on different values in .
Topologically, this is equivalent to as well as , which implies as well as . Consequently, and cannot be disjoint, as needed for uniform consensus solvability. Clearly, for , this argument is no longer applicable. And indeed, algorithms like the one proposed by Parvedy and Raynal (PR03:IPDPS) can be used for solving uniform consensus.
If one revisits the topological equivalent of the above partitioning argument for in the non-uniform case, it turns out that still , but as all processes in are faulty. Consequently, . So and could partition the space of admissible executions. And indeed, non-uniform consensus can be solved in rounds here. In order to demonstrate this by means of our Theorem 5.2, we will sketch how the required decision sets can be constructed. We will do so by means of a simple labeling algorithm, which assigns a decision value to every admissible process-time graph . Recall that synchronous systems are particularly easy to model in our setting, since we can use the number of rounds as our global time .
Clearly, every process that omits to send its state in some round to a (still) correct processor is revealed to every other (still) correct processor at the next round at the latest. This implies that every correct process seen by some correct process by the end of the -round prefix in the admissible process-time graph has also been seen by every other correct process during as well, since one would need a chain of different faulty processes for propagating ’s state to otherwise. Thus, must have managed to broadcast to all correct processes during .
Consequently, if , then they must have the same set of broadcasters. Our labeling algorithm hence just assigns to the initial value of the, say, lexically smallest broadcaster in . The resulting decision sets are trivially open since, for every , we have as well. The generic non-uniform consensus algorithm from Theorem 5.2 can hence be used to solve consensus.
9.3. Dynamic networks with limit-closed message adversaries
In this section, we consider dynamic networks under message adversaries (like oblivious ones (SW89; CGP15)) that are limit-closed (WSM19:OPODIS), in the sense that every convergent sequence of process-time graphs with for every has a limit . Note that processes do not fail in such systems, i.e., are all obedient, such that we will only consider the -view and the uniform topology here. An illustration is shown in Fig. 5, where the blue dots represent the ’s and the limit point at the boundary.
In this case, the set of admissible process-time graph sequences is closed and hence a compact subspace both in any -view topology and in the minimum topology. Moreover, we obtain:
Corollary 9.1 Decision sets for compact MAs are compact.
For every correct consensus algorithm for a compact message adversary and every , is closed in and compact, and for any and , and hence also .
Moreover, there are only finitely many different connected components , , which are all compact, and for every with , it holds that and hence also .
Proof.
Since all decision sets are closed in by Theorem 5.1 and is compact for a compact message adversary, it follows that every is also compact. From Theorem LABEL:thm:setdistance it hence follows that . As this holds for every , we also have .
Since every connected component of that contains is closed in , as the closure of a connected subspace is also connected (Munkres, Lemma 23.4) and a connected component is maximal, the same arguments as above also apply to . To show that there are only finitely many different for -valent sequences , observe that is an open covering of . Since the latter is compact, there is a finite sub-covering , and all other for a -valent must be equal to one of those, as connected components are either disjoint or identical. ∎
We now make the abstract characterization of Theorem 5.1 and our meta-procedure more operational, by introducing the -approximation of the connected component that contains a process-time graph , typically for some , . It is constructed iteratively, using finitely many iterations (since the number of different possible -prefixes satisfies ) of the following algorithm:
Definition 9.2 (-approximations).
Let be an admissible process-time graph. In the minimum topology, we iteratively define , for , as follows: ; for , ; and where is such that . For , the -approximation is defined as , where every denotes a -valent process-time graph.
Lemma 9.3 Properties of -approximation.
For every , every , every , -valent , and every -valent , the -approximations have the following properties:
- (i)
For a closed message adversary, there are only finitely many different , . 2. (ii)
For every , it holds that . 3. (iii)
* implies .* 4. (iv)
.
Proof.
Properties (ii)–(iv) hold for arbitrary message adversaries: To prove (ii), it suffices to mention . As for (iii), if , the iterative construction of would reach , which would cause it to also include the whole , as the latter also reaches . If (iv) would not hold, could be separated into disjoint open sets, which contradicts connectivity. Finally, (i) holds for closed message adversaries, since Corollary 9.1 implies that there are only finitely many different connected components , which carry over to by (iii) and (iv). ∎
We now show that and for sequences and with have a distance , provided is sufficiently small:
Lemma 9.4 Separation of -approximations for compact MAs.
For a compact message adversary that allows to solve consensus, let and be such that . Then there is some such that, for any , it holds that .
Proof.
According to Corollary 9.1, the components and are compact. Theorem LABEL:thm:setdistance reveals that we have . By Lemma 9.3.(iv), for every , and . Therefore, setting secures . ∎
We immediately get the following corollary, which allows us to reformulate Theorem LABEL:thm:charbroadcastability as given in Theorem 9.6.
Corollary 9.5 Matching -approximation.
For a compact message adversary, if is chosen in accordance with Lemma 9.4, then for every .
Theorem 9.6 Consensus characterization for compact MAs.
A compact message adversary allows to solve consensus if and only if there is some such that every -valent , , is broadcastable for some process.
Proof.
Our theorem follows from Theorem LABEL:thm:charbroadcastability in conjunction with Corollary 9.5. ∎
It follows that if consensus is solvable, then, for every , the universal algorithm from Theorem 5.1 with for some arbitrary value , and for the remaining , can be used. And indeed, the consensus algorithm given by Winkler, Schmid, and Moses (WSM19:OPODIS, Alg. 1) can be viewed as an instantiation of it.
Moreover, Corollary 9.5 implies that checking the broadcastability of can be done by checking the broadcastability of finite prefixes. More specifically, like the decision function of consensus, the function that gives the round by which every process in has of the broadcaster in its view is locally constant for a sufficiently small neighborhood, namely, , and is hence continuous in any of our topologies. Since is compact, is in fact uniformly continuous and hence attains its maximum in . It hence suffices to check broadcastability in the -prefixes of for in Theorem 9.6. This has been translated into the following non-topological formulation (WSM19:OPODIS) (where is the set of processes that reached all processes in the process-time graph and is the set of -prefixes of the process-time graphs in in the uniform topology):
Theorem 9.7 (WSM19:OPODIS, Thm. 1).
Consensus is solvable under a closed message adversary MA if and only if for each there is a round such that .
9.4. Dynamic networks with non-limit closed message adversaries
In this section, we finally consider message adversaries that are not limit-closed (FG11; Pfl18:master; WSS19:DC). Unfortunately, we cannot use the -approximations according to Definition 9.2 here: Even if is made arbitrarily small, Lemma 9.4 does not hold. An illustration is shown in Fig. 6. It is apparent that adding a ball in the iterative construction of , where for some forbidden limit sequence , inevitably lets the construction grow into some where has a different valence than . Whereas this could be avoided by adapting when coming close to , the resulting approximation does not provide any advantage over directly using our characterization theorem Theorem 5.1.
These topological results are of course in accordance with the results on non-limit closed message adversaries we are aware of. For example, the binary consensus algorithm for by Fevat and Godard (FG11) assumes that the algorithm knows a fair or a pair of unfair sequences a priori, which effectively partition the sequence space into two connected components.111111Note that there are uncountably many choices for separating and here, however.
The -VSRC message adversary (WSS19:DC) generates process-time graphs that consist of single-rooted communication graphs in every round, with the additional guarantee that, eventually, a -vertex-stable root component (-VSRC) occurs. Herein, a root component is a strongly connected component without in-edges from outside the component, and an -VSRC is a root component made up of the same set of processes in consecutive rounds. is the dynamic diameter of a VSRC, which ensures that all root members reach all processes.
It has been proved (WSS19:DC) that consensus is impossible with for , whereas an algorithm exists for . Obviously, effectively excludes all sequences without any -VSRC. And indeed, the choice renders the connected components of broadcastable by definition, which is in accordance with LABEL:thm:charbroadcastability.
We also introduced and proved correct an explicit labeling algorithm for , which effectively operationalizes the universal consensus algorithm of Theorem 6.4 (WSN21:FCT): By assigning an (invariant) label to the -prefixes of , it effectively assigns a corresponding unique decision value to , which in turn specifies the decision set containing . It is instructive to see how the requirement of every being open (and closed) in Theorem 6.4 translates into a corresponding assumption on this labeling function:
Assumption 1 (WSN21:FCT, Assumpt. 1).
**
For , it has been proved (WSN21:FCT, Thm. 12) that the given labeling algorithm satisfies this assumption for , where is the round where the (first) -VSRC in starts. Consensus is hence solvable by a suitable instantiation of the universal consensus algorithm of Theorem 6.4.
9.5. Consensus in systems with an eventually timely -source
It is well-known (DDS87) that consensus cannot be solved in distributed systems of (partially) synchronous processes, up to which may crash, which are connected by reliable asynchronous communication links. For solving consensus, the system model has been strengthened by a weak timely link (WTL) assumption (ADGFT04; HMSZ08:TDSC): there has to be at least one correct process that eventually sends timely to a sufficiently large subset of the processes.
In previous work (ADGFT04), at least one eventually timely -source was assumed: After some unknown initial period where all end-to-end message delays are arbitrary, every broadcast of is received by a fixed subset with within some possibly unknown maximum end-to-end delay . The authors showed how to build the failure detector in such a system, which, in conjunction with any -based consensus algorithm like the one by Mostéfaoui and Raynal (MR01), allows to solve uniform consensus.
Their implementation lets every process broadcast a heartbeat message every steps, which forms partially synchronized rounds, and maintains an accusation counter for every process that counts the number of rounds the heartbeats of which were not received timely by more than processes. This is done by letting every process who does not receive ’s broadcast within send an accusation message for , and incrementing the accusation counter for if more than such accusation messages from different receivers came in. It is not difficult to see that the accusation counter of a process that crashes grows unboundedly, whereas the accusation counter of every timely -source eventually stops being incremented. Since the accusation counters of all processes are exchanged and agreed-upon as well, choosing the process with the smallest accusation counter (with ties broken by process ids) is a legitimate choice for the output of .
This WTL model was further relaxed (HMSZ08:TDSC), which allows the set of witnessing receivers of every eventually moving timely -source to depend on the sending round . The price to be paid for this relaxation is the need to incorporate the sender’s round number in the heartbeat and accusation messages.
In this subsection, we will use our Theorem 5.1 to prove topologically that consensus can indeed be solved in the WTL model: We will give and prove correct an explicit labeling algorithm Algorithm 1, which assigns a decision value to every process-time graph that specifies the decision set containing . Applying our universal algorithm to these decision sets hence allows to solve consensus in this model. Obviously, unlike the existing algorithms, our algorithm does not rely on an implementation of .
We assume a (slightly simplified121212It would not be difficult to extend our considerations for partially synchronous processes with unknown , since global synchrony is not needed for our algorithm: A process only needs to be able to timeout the periodic heartbeat messages of process , in a way that eventually ensures a timeout larger than . This pairwise timeout is easy to implement in the case of partially synchronous processes, by incrementing the timeout with every accusation of . We do not incorporate this feature to keep our presentation as simple as possible.) WTL model with synchronous processes and asynchronous links that are reliable and FIFO, with known for timely links. Whereas we will use the time our synchronous processes take their steps as global time, we note that we do not have communication-closed rounds here, i.e., have to deal with general process-time graphs according to Definition 6.1. In an admissible execution , we denote by the set of up to processes that crash in , and the set of correct processes. For an eventual timely -source , we will denote with the stabilization round, by which it has already started to send timely: a message sent in round is received by every no later than in round , hence is present in ’s state at time . Note carefully that this is always satisfied when has crashed by that round. We again assume that the processes execute a full-history protocol, i.e., send their whole state in every round. For keeping the relation to the existing algorithms, we consider the state message sent by in round to be its . Moreover, if the state of process at time does not contain the reception from process , we will say that broadcasts an accusation message for round of in round (which is of course just part of ’s state sent in this round). If crashes before round , it will never broadcast . If crashes exactly in round , we can nevertheless assume that it either manages to send to all correct processes in the system or to none: In our full information protocol, every process that receives will forward this message to all other processes when it broadcasts its own state later on.
Definition 9.8 (WTL elementary state predicates and variables).
For process at time , i.e., the end of round , we define the following predicates and state variables:
- •
if and only if did not receive from by time and thus sent .
- •
if and only if recorded the reception of from by time .
- •
if and only if for at least different .
- •
.
- •
\text{{heardof}}_{s}^{r}(p)=|\{k\leq r:\mbox{s\text{{heartbeat}}(k)pr}\}|.
Note that a process that crashes before time causes for all , and that is appended in for tie-breaking purposes only. For every eventually timely -source , the implicit forwarding of accusation messages ensures that will eventually be the same at every correct process in the limit .
We now define some predicates that require knowledge of the execution . Whereas they cannot be computed locally by the processes in the execution, they can be used in the labeling algorithm.
Definition 9.9 (WTL extended state predicates and variables).
Given an execution , let the dominant eventual timely -source be the one that leads to the unique smallest value of , which is the same at every process . With denoting the stabilization time of the dominant eventual timely -source in and the set of processes that crashed by time , we also define
- •
,
- •
if and only if , both (i) and (ii) .
- •
if and only if such that both (i) and (ii) .
Note that it may occur that another eventual timely -source in has a smaller stabilization time than the dominant one, which happens if causes more accusations than before stabilization in total.
The following properties are almost immediate from the definitions:
Lemma 9.10 Properties of oldenough and mature.
The following properties hold for oldenough:
- (i)
If , then for every that did not crash by time . 2. (ii)
* is stable, i.e., for .* 3. (iii)
(i) and (ii) also hold for , and .
Proof.
Since entails that every process has received the accusation messages for all rounds up to since according to Definition 9.9, (i) follows. This also implies (ii), since the accusation counter of every process can at most increase after time . That these properties carry over to mature is obvious from the definition. ∎
The following lemma proves that two executions and with cannot both satisfy resp. , and, hence, resp. , except when the dominant eventual timely -source is the same in and :
Lemma 9.11.
Consider two executions and with for some process that is not faulty by round in both and . Then,
[TABLE]
Proof.
Since , it follows from Definition 9.9 that . Analogously, implies . Since , this is only possible if . ∎
Finally, we need the following technical lemmas:
Lemma 9.12 Indistinguishability precondition.
Suppose is such that received a message from containing its state in the sending round by round in and hence also in . Analogously, suppose is such that received a message from containing its state in the sending round by round in and hence also in . Then,
- (i)
, 2. (ii)
, 3. (iii)
, 4. (iv)
.
Proof.
If (i) would not hold, since sends a message containing its state in round to both in and in , these two states would be distinguishable for , which contradicts our assumption. The analogous argument proves (iii). Statement (ii) follows from combining (i) with , (iv) follows from combining (iii) with . ∎
Lemma 9.13 Heardof inheritance.
Suppose and for some , as it arises in , for example. Then, , it also holds in that , but not necessarily for . Consequently, it may happen that .
Proof.
Since the state of is the same in and , but the sets and may be different, the lemma follows trivially. ∎
With the abbreviation for all non-faulty processes in , and for , we define the short-hand notation to express indistinguishability for a majority of (correct) processes, defined by such that .
The following lemma guarantees that prefixes that are indistinguishable only for strictly less than processes are eventually distinguishable for all processes:
Lemma 9.14 Vanishing minority indistinguishability.
Given , there is a round , , such that for every with , it holds that .
Proof.
Due to our reliable link assumption, for every process that does not fail in , there is a round where . Now assume that there is some with for a maximal set with , but for some process . Since receives round- messages from processes in , and , process must receive exactly the same messages also in . As at most of those messages may be sent by processes that cannot distinguish , at least one such message must originate in a process with . In this case, Lemma 9.12.(iii) prohibits , however, which provides the required contradiction. ∎
The following lemma finally shows that majority indistinguishability in conjunction with mature prefixes entails strong indistinguishability properties in earlier rounds:
Lemma 9.15 Majority indistinguishability precondition.
Suppose and . Then, for the round imposed by the latter, it holds that , and hence also .
Proof.
Let resp. be the set of at least processes causing resp. . Since by the pigeonhole principle, let . Clearly, , and hence also . Since , Lemma 9.12.(i) in conjunction with Lemma 9.13 implies , as well as , and hence also as asserted. ∎
With these preparations, we can define an explicit labeling algorithm Algorithm 1 for the WTL model, i.e., an algorithm that computes a label for every -prefix of an admissible execution in our WTL model. A label can either be (still undefined) or else denote a single process (which will turn out to be a broadcaster), and will be consistent in in the sense that for every . Note that we can hence uniquely also assign a label to an infinite execution. Note that, for defining our decision sets, we will assign to , where is the initial value of in .
Informally, our labeling algorithm works as follows: If there is some unlabeled mature prefix , it is labeled either (i) with the label of some already labeled but not yet mature if the latter got its label early enough, namely, by the round where , or else (ii) with its dominant .
The following Theorem 9.16 shows that Algorithm 1 computes labels, which result in a partitioning that is compatible with the needs of Theorem 5.1. Consensus in the WTL model can hence be solved by means of our universal algorithm applied to the resulting decision sets.
Theorem 9.16 Decision set partition for WTL algorithm.
The set is open in the uniform topology, and so is the decision set .
Proof.
We show that, if is assigned to the partition set , then , where is the smallest round where and is the maximum number of rounds required for a minority indistinguishability in to go away ( in the notation of Lemma 9.14), which implies openness of . Note that the corresponding property obviously also holds for the decision set .
First of all, in Algorithm 1, gets initialized to in line 1 and assigns a label at the latest when . Once assigned, this value is never modified again as each assignment, except the one in line 1, may only be performed if the label was still .
For an unlabeled prefix that is indistinguishable to a mature labeled prefix , there are two possibilities: Either, its indistinguishability is a majority one, in which case gets its label from in line 1, or else the minority indistinguishability will go away within rounds. It thus suffices to show that if a label is assigned to a round prefix , then every majority-indistinguishable prefix has either or .
We prove this by induction on . The base for follows directly from line 1. For the step from to , assume by hypothesis that, for all round prefixes that already had assigned, all their majority-indistinguishable prefixes have label or . For the purpose of deriving a contradiction, suppose that a label is assigned to a round -prefix in iteration and there exists some with and . Let be the set of involved processes, i.e., for with .
We need to distinguish all the different ways of assigning labels to .
Suppose nor get their labels in round , but not in line 1. Since both and , Lemma 9.10.(iii) in conjunction with Lemma 9.11 reveals that since . In all cases except for the one where both and get their labels in line 1, we immediately get a contradiction since in any case. Finally, if and get their labels in line 1, there is some with but , where is such that , and some with the analogous properties in round . Let resp. be the sets of at least processes involved in resp. . Since , Lemma 9.15 implies and also , which establishes . Since, by the induction hypothesis, , we again end up with , which provides the required contradiction.
However, we also need to make sure that inconsistent labels cannot be assigned in line 1 and any of the other lines, possibly in different rounds. For a contradiction, we assume a “generic” setting that can be fit to all cases: We assume that got its label assigned in iteration in line 1 or line 1, since there was some already labeled with but . Moreover, we assume that gets assigned its label in iteration also in line 1 or in line 1, since there is some already labeled with but . Note carefully that we can rule out the possibility that there are two different, say, and , with inconsistent labels, which both match the condition of line 1 or line 1: This is prohibited by the induction hypothesis, except in the case of , where the above generic scenario applies.
To also cover the cases where gets it label assigned in the other lines, we can set in our considerations below. Note that the induction hypothesis again rules out the possibility that there are two different, say, and , with inconsistent labels, which both match the condition of line 1 here, since .
Let be the set of at least processes causing , and be the set of at least non-faulty processes causing . Since and , Lemma 9.15 implies
[TABLE]
We first consider the case : Since , also implies . As , Lemma 9.10.(ii) also ensures . Moreover, since obviously as well, we finally observe that actually . By Lemma 9.11, we hence find that . Now there are two possibilities: If actually holds, line 1 implies that . Otherwise, every process will eventually be able to distinguish and and, hence, and by Lemma 9.14. Both are contradictions to one of our assumptions and .
To handle the case , we note that we can repeat exactly the same arguments as above if we exchange the roles of and and and . In the only possible case of , since , also implies . As , Lemma 9.10.(ii) also ensures . Moreover, since obviously as well, we finally observe that actually . By Lemma 9.11, we hence find again that . The same arguments as used in the previous paragraph establish the required contradictions.
In the remaining case but , we have the situation where has already assigned its label before round , where . In general, every process may be able to distinguish and (not to speak of and ) after , and usually , so nothing would prevent if the labeling algorithm would not have taken special care, namely, in line 1: Rather than just assigning , it uses the label of and therefore trivially avoids inconsistent labels. Note carefully that doing this is well-defined: If there were two different eligible and available in line 1, (15) reveals that , such that their labels must be the same by the induction hypothesis.
This completes the proof of our theorem. ∎
The following Lemma 9.17 also reveals that a non-empty label assigned to some prefix is a broadcaster:
Lemma 9.17.
If is computed by Algorithm 1, then is contained in the view of every process that has not crashed in .
Proof.
We distinguish the two essential cases where can get its label : If was assigned via line 1, the dominant must indeed have reached all correct processes in the system according to Definition 9.9 of , which is incorporated in . In all other cases, was assigned since there is some , , with at least . By the same argument as before, the dominant must have reached every correct process in already. As according to the definition of implies also since , it follows that has also reached all correct processes in already. ∎
10. Conclusions
We provided a complete characterization of both uniform and non-uniform deterministic consensus solvability in distributed systems with benign process and communication failures using point-set topology. Consensus can only be solved when the space of admissible executions/process-time graphs can be partitioned into disjoint decision sets that are both closed and open in our topologies. We also showed that this requires exclusion of certain (fair and unfair) limit sequences, which limit broadcastability and happen to coincide with the forever bivalent executions constructed in bivalence and bipotence proofs. The utility and wide applicability of our characterization was demonstrated by applying it to several different distributed computing models.
Part of our future work will be devoted to a generalization of our topological framework to other decision problems. Another very interesting area of future research is to study the homology of non-compact message adversaries, i.e., the topological structure of the space of admissible executions using combinatorial topology.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1)
- 2Afek and Gafni (2013) Yehuda Afek and Eli Gafni. 2013. Asynchrony from Synchrony. In Distributed Computing and Networking . Lecture Notes in Computer Science, Vol. 7730. Springer Berlin Heidelberg, 225–239. https://doi.org/10.1007/978-3-642-35668-1_16 · doi ↗
- 3Aguilera et al . (2004) Marcos Kawazoe Aguilera, Carole Delporte-Gallet, Hugues Fauconnier, and Sam Toueg. 2004. Communication-efficient leader election and consensus with limited link synchrony. In Proceedings of the 23th ACM Symposium on Principles of Distributed Computing (PODC’04) . ACM Press, St. John’s, Newfoundland, Canada, 328–337. https://doi.org/10.1145/1011767.1011816 · doi ↗
- 4Alpern and Schneider (1985) Bowen Alpern and Fred B. Schneider. 1985. Defining Liveness. Inform. Process. Lett. 21, 4 (1985), 181–185.
- 5Attiya et al . (2020) Hagit Attiya, Armando Castañeda, and Sergio Rajsbaum. 2020. Locally Solvable Tasks and the Limitations of Valency Arguments. In 24th International Conference on Principles of Distributed Systems, OPODIS 2020, December 14-16, 2020, Strasbourg, France (Virtual Conference) (LIP Ics) , Quentin Bramas, Rotem Oshman, and Paolo Romano (Eds.), Vol. 184. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 18:1–18:16. https://doi.org/10.4230/LIP Ics.OPODIS.2020.18 · doi ↗
- 6Ben-Zvi and Moses (2014) Ido Ben-Zvi and Yoram Moses. 2014. Beyond Lamport’s Happened-before: On Time Bounds and the Ordering of Events in Distributed Systems. J. ACM 61, 2, Article 13 (April 2014), 26 pages. https://doi.org/10.1145/2542181 · doi ↗
- 7Biely and Robinson (2019) Martin Biely and Peter Robinson. 2019. On the Hardness of the Strongly Dependent Decision Problem. In Proceedings of the 20th International Conference on Distributed Computing and Networking (ICDCN ’19) . ACM, New York, NY, USA, 120–123. https://doi.org/10.1145/3288599.3288614 · doi ↗
- 8Biely et al . (2018) Martin Biely, Peter Robinson, Ulrich Schmid, Manfred Schwarz, and Kyrill Winkler. 2018. Gracefully degrading consensus and k-set agreement in directed dynamic networks. Theoretical Computer Science 726 (2018), 41–77. https://doi.org/10.1016/j.tcs.2018.02.019 · doi ↗
