Spartan: Sparse Robust Addressable Networks
John Augustine, Sumathi Sivasubramaniam

TL;DR
Spartan is a distributed P2P overlay network that efficiently maintains a stable, addressable structure capable of handling adversarial churn, enabling reliable communication and computation in highly dynamic environments.
Contribution
We introduce Spartan, a novel overlay network that can be built and maintained efficiently under adversarial churn, with stable committees for robust distributed computation.
Findings
Constructed in O(log n) rounds
Maintains stability despite adversarial churn
Supports sustained computation and messaging
Abstract
A Peer-to-Peer (P2P) network is a dynamic collection of nodes that connect with each other via virtual overlay links built upon an underlying network (usually, the Internet). P2P networks are highly dynamic and can experience very heavy churn, i.e., a large number of nodes join/leave the network continuously. Thus, building and maintaining a stable overlay network is an important problem that has been studied extensively for two decades. In this paper, we present our \Pe overlay network called Sparse Robust Addressable Network (Spartan). Spartan can be quickly and efficiently built in a fully distributed fashion within rounds. Furthermore, the Spartan overlay structure can be maintained, again, in a fully distributed manner despite adversarially controlled churn (i.e., nodes joining and leaving) and significant variation in the number of nodes. Moreover, new nodes can join…
| Number of committees | Threshold | Number of failed repetitions | ||
|---|---|---|---|---|
| 160 | 2880 | 0 | 10 | 28 |
| 384 | 7680 | 0 | 10 | 27 |
| 896 | 17920 | 0 | 11 | 30 |
| 2048 | 40960 | 3 | 21 | 30 |
| 4608 | 100000 | 3 | 18 | 30 |
| 10240 | 250000 | 0 | 9 | 30 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Spartan: Sparse Robust Addressable Networks††thanks: tThis research was supported in part by an Extra-Mural Research Grant (file number EMR/2016/003016) funded by the Science and Engineering Research Board, Department of Science and Technology (SERB), Government of India. The first author is also supported by an SERB MATRICS project (file number MTR/2018/001198) and an Indo-German joint project supported by DST and DAAD (INT/FRG/DAAD/P-25/2018). ††thanks: Preliminary results of this paper appeared in the Proceedings of the 32nd IEEE International Parallel &
Distributed Processing Symposium (IPDPS), 2018 [AS18]
John Augustine
Computer Science and Engineering
Indian Institute of Technology Madras
Chennai, India
&Sumathi Sivasubramaniam
Computer Science and Engineering
Indian Institute of Technology
Chennai, India
Abstract
A Peer-to-Peer (P2P) network is a dynamic collection of nodes that connect with each other via virtual overlay links built upon an underlying network (usually, the Internet). P2P networks are highly dynamic and can experience very heavy churn, i.e., a large number of nodes join/leave the network continuously. Thus, building and maintaining a stable overlay network is an important problem that has been studied extensively for two decades.
In this paper, we present our P2P overlay network called Sparse Robust Addressable Network (Spartan). Spartan can be quickly and efficiently built in a fully distributed fashion within rounds. Furthermore, the Spartan overlay structure can be maintained, again, in a fully distributed manner despite adversarially controlled churn (i.e., nodes joining and leaving) and significant variation in the number of nodes. Moreover, new nodes can join a committee within rounds and leaving nodes can leave without any notice.
The number of nodes in the network lies in for any fixed . Up to nodes (for some small but fixed ) can be adversarially added/deleted within any period of rounds for some . Despite such uncertainty in the network, Spartan maintains committees that are stable and addressable collections of nodes each for rounds with high probability.
Spartan’s committees are also capable of performing sustained computation and passing messages between each other. Thus, any protocol designed for static networks can be simulated on Spartan with minimal overhead. This makes Spartan an ideal platform for developing applications. We experimentally show that Spartan will remain robust as long as each committee, on average, contains 24 nodes for networks of size up to .
K****eywords Overlay networks Peer-to-Peer networks dynamic networks churn.
1 Introduction
Peer-to-Peer (P2P) networks have become increasingly popular over the years, with applications such as Bitcoin, Bittorrent, Cloudmark, etc. becoming widespread in use. Such distributed and decentralized methods of networking facilitate resource sharing and decentralized applications that are resilient and scalable. In addition to the typical P2P context where end-users clients are connected to form the overlay, P2P principles have also influenced dedicated overlay infrastructure provided by commercial enterprises like Akamai [1].
P2P networks are typically virtual networks — called overlay networks — built on top of an underlying (often physical) network — called the underlay. A simple schematic is shown in Figure 1. The underlay network is typically the Internet and only a relatively small (and dynamically changing) fraction of the nodes in the underlay network may choose to participate in the overlay P2P network. The underlay allows us to create virtual edges in the network (e.g., TCP connections) that can serve as the basis for virtual overlay edges. In our work, our goal is to design a semi-structured overlay that provides stable addressable locations also called a distributed hash table (DHT) that facilitates storage and retrieval of data items.
The challenge lies in the fact that P2P networks tend to be highly dynamic and experience heavy churn, i.e., a large number of peer nodes join and leave the network at a rapid pace [2, 3, 4]. In fact, real world studies of P2P networks [5] show that as much as fifty percent of a network can be replaced within a hour. This makes searching and indexing of data items notoriously difficult because unpredictable churn can destroy stored data items.
The ability to store, index, search, and retrieve data items addressed by keys is crucial in many P2P applications like BitTorrent, Bitcoin, and a host of other DHT implementations. Quite naturally, there has been a significant amount of work in building overlay structures that allow for efficient look-up of data items in P2P networks. Many overlay structures like Chord [4], CAN [6], Pastry [7], Tapestry [8], and Skip Net [9] have been proposed. However one problem that is persistent in all of these works is the lack of mathematical guarantee against heavy churn, i.e., up to nodes joining and leaving in every round. We refer to [10, 11] as good surveys of available research in this topic.
Recent works that do provide guarantees in the face of heavy churn, such as [12] make assumptions about the network topology such as the network having good expansion properties. Research has shown that expander topologies may provide the right framework for solving several different problems in P2P networks such as consensus [13], leader election [14], and even storage and search [12] for a single data item. The good news is that we now have ways to build and maintain overlays with good expansion properties despite heavy churn [15]. However, such expander topologies tend to be unstructured and prove to be inefficient for storing, searching, and retrieving data items, which is a key requirement for DHT applications.
1.1 Model Characteristics
We present a robust overlay network called Sparse Robust Addressable Network (Spartan) that allows for the storage and look up of data items in P2P networks. The number of nodes may vary over time, but for ease of exposition, we assume that the number of nodes is in the range for some constant . As the name implies, it is a sparse network with maximum degree bounded by . Our model brings together a wide range of modeling ideas found in several prior works [16, 17, 18, 12, 15, 19] that are quite disparate with respect to their modeling approaches. Our hope is that our model will be a step towards a unified approach to modeling this complex problem. We will highlight a few salient model characteristics here; a detailed description is provided in Section 2.
Our network is provided by an almost adaptive adversary that knows the network and the details of Spartan up to a small steps prior to the current round. This adversary moreover can design the network with high levels of dynamism, some aspects of which are heretofore unseen in the literature. First and foremost, the adversary can design the network with heavy and erratic churn, whereby, it is allowed to add/delete nodes at the rate of up to adds/deletes within any time period for some fixed . Such a heavy churn rate means that the adversary can replace all the nodes in rounds. To make matters worse, the churn can be erratic in the sense that there can be sudden spikes with up to nodes added or deleted within rounds (as long as it is preceded and succeeded by relative calm). Such heavy, erratic, and adversarial nature of churn makes the maintenance of a robust overlay network a difficult challenge to overcome.
Moreover, the degree of each node is at most . Each node that enters the network (after the first rounds called the bootstrap phase) is adversarially connected to one pre-existing node. To maintain a connected overlay in this setting will require a nuanced approach. To see the challenge, consider the following scenario in which there is a pre-existing node and a new node is connected to in some round . In round , is also a pre-existing node. So, enters in round and is connected to , while is churned out. This continues with a new node in each round connected to , while is churned out. In this chain, if a single node somehow fails to get connected properly to the overlay network, then all subsequent nodes will continue to be disconnected. In fact, in our model, up to new nodes can be simultaneously connected to the same pre-existing node. This means that a disconnected node could affect an exponential growing number of nodes as the rounds progress.
1.2 Our Results and Technical Contributions
Spartan is a framework that allows building and maintaining P2P overlay networks that are robust against heavy adversarial churn. We begin with the Spartan architecture (see Section 3) and then describe procedures to build and maintain Spartan (see Section 4 and Section 5). With high probability (whp)111We say that an event occurs with high probability if, under suitable choice of constants used by algorithms, the probability of its occurrence is at least , for any fixed ., Spartan possesses the following features.
It provides virtually addressable locations (called committees). These committees are collections of nodes connected together completely by overlay edges. See line number 5 of Algorithm 1, Lemma 8 and Lemma 6 for more details. 2. 2.
These committees are in turn connected together by means of logical edges. Each logical edge between two committees is a collection of complete bipartite overlay edges connecting nodes from either committees. The committees and the logical edges put together form a logical network. See line numbers 6 and 7 of Algorithm 1. 3. 3.
For concreteness, we have constructed the logical network to be a butterfly network with the committees as butterfly nodes and logical edges as butterfly edges. This provides us with the ability to address each committee by its location in the butterfly structure. See Lemma 7. 4. 4.
Every node in Spartan is within a distance of of every committee in the network (as seen in lemma 7). Moreover, there exists a simple bit-fixing algorithm by which it is easy to reach any required committee in rounds (see Chapter 4 in [20]). 5. 5.
We show how to build and maintain Spartan so that it is provably robust against heavy and erratic adversarial churn described earlier. In particular, we prove that the Spartan network survives (whp) for a number of rounds that is a large polynomial in . See Section 5 and Theorem 13 for a full explanation. 6. 6.
We show that an algorithm designed under a suitably defined CONGEST model of distributed computing for static networks can be efficiently simulated by the committees of Spartan. Therefore a host of applications and algorithms developed under the assumption that the network is static can be extended to dynamic networks. See Section 3.2 for a full description.
The key to Spartan’s robustness lies in the strength of committees, which are collections of nodes that work together to ensure persistence of activities (like storing, searching, and retrieving data required for building stable applications) despite churn. These committees are built to be self sustaining. Our procedures ensure that the committees stay robust in the face of heavy and erratic adversarial churn. Consequently, the network as a whole is robust. The notion of committees is not new. In fact, variations of our notion of committees have been employed in building overlays for over 15 years (for example, see Fiat and Saia [16]). Over the years, we have seen quite a few variations spanning both deterministic [17] and randomized overlay networks [12, 19]. However, our design allows us to provide a DHT solution that can be constructed quickly. Moreover, to the best of our knowledge, ours is the first known DHT construction that is provably capable of handling such heavy churn. Most importantly, we believe the conceptual core is very simple and easy to implement and provides a lot of flexibility for fine-tuning to meet specific requirements.
In the interest of clarity in exposition, we have not optimized our results for constants and factors. To avoid technicalities arising with ceilings and floors, we have assumed that and are integral.
1.3 Survey of Related Works
The notion of overlay networks sprang out of a need for bringing flexibility in operating a distributed system despite an underlying network infrastructure that we cannot modify. Viewed from this broad perspective, overlay networks have been studied since the 80’s, for example in [21] where the focus was on developing new network functionalities over a fixed interconnection network. Since the late 90’s, there has been an explosion of P2P ideas and technologies mostly motivated by the need for fully decentralized P2P networks. Ideas like consistent hashing [22] and web caching [23, 24], and P2P tools like Napster and Gnutella served as harbingers to the developments we have seen in the last two decades. These and other early developments led to significant research in P2P systems motivated mainly by the need for distributed (and often completely decentralized) churn-resistant distributed hash table (DHT) platforms for storing, searching, and retrieving resources that are in essence <key, value> pairs.
To elaborate a bit, research in overlay networks and P2P has since then moved along two parallel (but not necessarily disjoint) strands of ideas: structured and unstructured P2P networks. In structured P2P networks, nodes (or rather their IDs) and the keys share the same hash space. Nodes are hashed into a location in the network based on their ID and take up responsibility for resources that hash into that same space. This will allow a node seeking some resource identified by a key to hash the key and approach the node at that hashed location for the required resource. The big advantage with this approach is that there is now a clean and efficient algorithm to admit new nodes, store resources, and discover them when needed. The down side to this approach is that the nodes will usually have to be arranged in a somewhat predictable and rigid manner. This often makes such networks vulnerable to heavy churn and security breaches.
Structured P2P concept is perhaps best illustrated by Chord [4] developed by Stoica et al. Nodes in Chord are arranged in the form of a ring in which the hash values of IDs are in sorted order. Resources are then stored in the node with the closest matching hash value. Pastry [7] and Tapestry [8] are a couple of other similar P2P overlays. They both in fact employ prefix routing [23] presented by Plaxton et al. CAN (Content Addressable Networks) introduced by Ratnasamy et al. [6] around the same time arranges the nodes in a -dimensional space. If the space is filled up appropriately, routing is robust (with several alternative paths) and efficient; resources are on average within a distance . However, CAN is sometimes prone to failures due to partitioning of nodes. Viceroy [25] described by Malkhi et al. is another overlay network developed early in the history of P2P systems. They were early adopters of a topology very similar to the butterfly which we use. Another topology that has gained quite a bit of attention [26, 27, 28, 29] is de Bruijn graphs [30], which provide very short average distances between nodes. See Loguinov et al. [27] for a detailed study of the benefits of de Bruijn graphs.
In the alternative idea of unstructured overlay networks, nodes connect with each other randomly with guarantees of good expansion. This means that we can employ flooding [13] and random walks based techniques [31, 32, 12, 15, 33] for searching and restructuring. Although these are significantly more inefficient compared to structured networks, these unstructured topologies are significantly more resilient to the effects of churn.
Although churn is a very important factor in designing P2P overlays, it has received relatively low attention especially from theoretical researchers. As we mentioned earlier, Fiat and Saia [16] provided a P2P overlay design very similar to ours, but their analysis is limited to a one time failure (censorship in particular) rather than a more persistent notion (the way churn operates in reality). Their similarity to our work is two-fold in that they employ a notion of “supernodes" which is similar to what we call committees, and furthermore their supernodes are also strung together in the form of a butterfly.
Liben-Nowell et al. [34] were some of the earliest to recognize the need for a formal investigation of such persistent churn, and, towards this goal, provided a characterization of the rate of churn in terms of what they call the half life time period, which is defined as the time for a network to either double or halve in size (i.e., number of nodes.). Awerbuch and Scheideler [35] proposed Hyperring, which is a distributed dynamic deterministic data structure capable of performing concurrent inserts, deletes and searches, all in time. Similar randomized distributed data structures like Skip Graphs [36] and Skipnet [9] have also been presented in literature, but the rates at which they can handle churn is limited. Jacobs and Pandurangan [37] have presented a DHT capable of handling churn with insertions and deletions incurring at most overhead, but their claims are limited to stochastic insertions and deletions. An interesting deterministic P2P overlay network was proposed by Kuhn et al. [17]. They illustrate their approach via hypercube and pancake topologies. Each “node" in these topologies is actually a collection of peers and in this sense akin to our idea of committees. Their overlay can withstand a churn of up to nodes joining and leaving concurrently in every round. Moreover, their adversary is deterministic and therefore capable of observing the weakest spot in the network and targeting that spot for churn.
Recently, there has been a flurry of works that investigate P2P systems that experience persistent heavy churn, which we define as a P2P system in which up to some nodes can join and leave the network in every round. In [13], we showed that an unstructured overlay network – modeled as a network in which expansion is maintained in every round – is amenable to solving the agreement problem despite churn that is linear in the size of the network. In every round, up to nodes can join and leave the network for some small constant . We also extended some of these ideas to Byzantine agreement [32] and leader election problems [33, 14]. We also showed in [12] that we can store and retrieve a data item despite near linear number of nodes (i.e., up to nodes) churn in every round. In [12], we employed the notion of committees, but indexing the item requires the participation of nodes. Our current work also employs this notion of committees, but without the indexing overhead. More recently, we showed in [15] that expansion (in an almost everywhere sense) can be guaranteed despite heavy churn. Most of these works employ an oblivious adversary, i.e., one that is aware of the protocols and can design churn in the worst possible manner, but it must do so without knowing the exact random bits employed by the protocols.
Recently and independently Drees et al. [19] presented a P2P overlay network that can tolerate churn to the extent that all the nodes in the network can be replaced within rounds. Moreover, this is achieved by an -late omniscient adversary (formally defined in Section 2) that is oblivious only to the random bits employed in the most recent rounds. There are a few crucial differences between our models and our techniques are entirely different. In their model, the adversary must forewarn node insertions and departures at least rounds before they actually leave, whereas, in our case, the nodes can leave abruptly. Thus, in [19], the challenge is limited to quickly rebuilding the network every rounds using only the stable nodes that are guaranteed to remain in the network. Moreover, they do not show how their structure can be constructed. We show how Spartan can be constructed efficiently (in just rounds) and general algorithms designed to operate on stable networks can be effectively simulated on Spartan.
Organization of the paper
We begin with a formal description of the model in Section 2. We then present an overview of the Spartan overlay structure in Section 3. The overall design is described in Section 3.1. Then, in Section 3.2, we show how we can use Spartan for stable distributed computation, in particular, for implementing stable distributed hash tables. Then, in Section 4, we present protocols to construct Spartan. The maintenance protocols – including the crucial random walks based sampling procedure – are described in Section 5. Finally, we discuss a variety of extensions in Section 7 illustrating Spartan’s flexibility and end with some concluding remarks.
2 Network Model and Problem Statement
We consider a synchronous system in which time is measured in rounds. An adversary presents a dynamically changing sequence of sets of network nodes where , for some universe of nodes and , denotes the set of nodes present in the network during round . For simplicity in exposition, we require the number of nodes at any round to lie in for some fixed ; this limitation can be circumvented as discussed in Section 7. Each node in is assumed to have a unique ID that can be represented in bits. For the first rounds called the bootstrap phase, is stable; i.e., for , . Subsequently, the network is said to be in its maintenance phase during which can experience churn in the sense that a large number of nodes (specified shortly) may join and leave dynamically at each time step.
The churn rate models the level of churn that the adversary can inflict on . It is specified by a pair which implies that within any range of T consecutive rounds, at most C nodes can be added and (a possibly different number of) at most C nodes can be deleted. Formally, and , the adversary must ensure that and . The churn rate we consider is for a suitably small but fixed .
Any node can communicate with another node and/or form an edge between them (with ’s concurrence) provided knows ’s ID; this is akin to forming TCP connections and establishing communication with a device whose IP address is known. To facilitate edge formations, every new node that enters the network at a round is seeded with some limited knowledge of the network, typically just the ID of one other node present in rounds and . The adversary must ensure that each ’s ID is given to only some new nodes each round to avoid overloading . Communication, edge formation, and limits on the amount of seed information will be formally discussed shortly.
Overview of Problem Statement. Our goal is to design a protocol that (whp) constructs and maintains an overlay network , where each denotes the overlay graph in round . In particular, is the set of overlay edges created by the protocol. The protocol must
construct during the bootstrap phase, and 2. 2.
maintain it in a well-defined robust manner despite churn for rounds (whp).
Importantly, the overlay thus constructed and maintained must be sparse with degree bounded by . We will prove several useful characteristics of the overlay that we build. In particular, we show that protocols in the CONGEST model and node capacitated clique models designed for static networks can be simulated in Spartan (see section 3.2).
Adversary’s characteristics. Our protocols operate against an almost adaptive adversary that subsumes the power of adversaries employed in both [15] and [19]. We say that an adversary is -late omniscient if, at each round , it is aware of all the algorithmic activities (including the random bits employed by the algorithm) from round 1 up to round , but is oblivious to the execution and random bits employed by the algorithm from round onward. Such an adversary will, of course, be aware of the overlay network that the algorithm has constructed up till round . Furthermore, we say that an adversary’s actions are instantaneous if its decision to add or remove nodes take effect immediately without any forewarning. Our adversary’s actions are instantaneous and it is assumed to be -late omniscient. In contrast, [15] employs the weaker oblivious adversary, but, like ours, its actions are instantaneous. The adversary in [19] is also -late omniscient, but its actions are not instantaneous. Any changes to the network (both inserting and deleting nodes) must be announced to the protocol -rounds before the change occurs.
Seed knowledge in new nodes. If a new node is completely unaware of the rest of the network, it will be impossible for the node to integrate into the network. Therefore, we assume that new nodes are seeded with some very limited information.
At the start of the bootstrap phase, we assume that each node in is seeded with IDs, each chosen uniformly at random (UAR) from IDs in . This is a reasonable assumption. If Spartan is to be built from scratch, a centralized mechanism will be required – and this assumption ensures that the role of the centralized mechanism can be limited to providing the random samples. On the other hand, we may wish to build Spartan from a pre-existing P2P network. Random node IDs are typically easy to obtain in this scenario as well because most P2P networks have good expansion and facilitate random walks that can mix fast – a property that can be leveraged to obtain random sampling in a fully distributed manner. In fact, many of the existing works on P2P networks including our own (see Section 5.1) provide efficient sampling procedures [31, 32, 12, 15].
Any node that enters the network at some round during the maintenance phase must be seeded with the ID of at least one pre-existing node . Notice that this requires both and to be present during round . Node can then contact and, in turn, can provide with some information that will help to become a well-connected node in the network. However, if too many new nodes are seeded with the ID of the same node , then can be overwhelmed. So we limit the number of nodes seeded with the ID of any particular node to .
Overlay Edge Formation and Communication. Each node is provisioned with ports through which overlay edges can be established and communication can take place. Suppose wants to establish an overlay edge with . Then, must send an edge formation request to and can either accept or ignore (thereby implicitly rejecting) the request. Any overlay edge thus formed, in essence, connects a unique port in with a unique port in . Consequently, such an overlay edge can be formed only if both and have ports to spare. After the overlay edge is formed, either nodes can unilaterally drop the overlay edge at the end of any round, thereby freeing the associated ports in both nodes. If one of the endpoints of an overlay edge is churned out by the adversary, the edge is immediately dropped and is immediately aware of the port that has been freed up.
Nodes communicate and form overlay edges via message passing. A node can communicate with another node and/or form an overlay edge in only if knows the ID of ; one can think of ’s IP address as playing the role of its ID. Node , in addition to knowing the IDs of all its neighbors in , may also know IDs of nodes that it has received through messages. So is not required to be a neighbor in . In order to avoid congestion, we limit the number of bits sent from to per round to be at most in size. Furthermore, no node can communicate with more than other nodes per round. The protocol must ensure that no node sends more than messages per round and no more than other nodes send messages to any given node.222In fact, the only place where any node needs to send and receive messages (but no more than messages whp) is our random walks based sampling protocol (see Algorithm 2). This may however be unavoidable under churn. See the remark at the end of Section 5.1 for more details.
The facility to communicate without forming edges may lead one to question why our model should even bother to form edges. We emphasize that there are several significant advantages to forming overlay edges. When is not an overlay neighbor, may not know whether is still in the network. If is no longer in the network, ’s message to will simply be dropped. Moreover, will be guaranteed to receive messages sent by its overlay neighbors, but if isn’t its overlay neighbor, then, may not receive the message sent by ; such situation may arise if the number of nodes sending messages to simultaneously exceeds the number of free ports in . Finally – and perhaps most importantly – the very essence of building the overlay edges is to facilitate other protocols to run on the overlay network so constructed. Consequently, if we can prove that the overlay network that we construct has certain properties, application designers can build their protocols upon such guarantees.
Breakup of a round. To avoid a fundamental impossibility (from Theorem 2 in [15]), each round permits a “handshake" between two communicating nodes. So, if sends a message to in a round, then has the option of responding within the same round. Thus, in each round , each node in sequence
gets to know which ports are active (as neighbors can drop edges or be churned out),
- -
performs local computation,
- -
sends up to messages typically (but not limited to) requesting for some information or requesting the formation of an edge,
- -
receives up to messages sent by other nodes,
- -
performs local computation.
- -
sends up to messages limited to responding to requests received, and finally
- -
receives up to messages.
Our protocol employs the second send/receive part of rounds only when forming edges.
3 The Spartan Design and Applications
In this section, we first describe Spartan’s design comprising committees (also called supernodes in literature) strung together in the form of a butterfly network. We then illustrate the power of this design to perform distributed computation on the network of committees. In particular, we show how algorithms in the CONGEST model of computation can be simulated in Spartan. We also describe how distributed hash tables can be implemented in Spartan. In subsequent sections, we will describe how to build and maintain Spartan in a concrete manner.
3.1 Spartan Design
The key building blocks of Spartan are committees. The Spartan architecture comprises a set of committees. Each committee consists of a dynamically changing set of member nodes chosen randomly with the guarantee that the committee has nodes as its members at any given time. The nodes within a committee are completely connected via overlay edges. Any pair of committees must be able to form logical edges between them. A logical edge between two committees is the complete bipartite set of edges between the nodes of the two committees. The set of committees and the set of logical edges together form a wrapped butterfly framework graph . The wrapped butterfly has the following structure. The committees in (of cardinality for some ) are arranged in rows and columns. Thus, each committee can be addressed by a pair , and , denoting its row and column numbers, resp. Logical edges connect committees in a column to committees in column . Two such committees and are connected by a logical edge if and only if either
both committees are in the same row, i.e., , or 2. 2.
and differ at exactly the th bit in their binary representations.
Figure 2 shows a schematic; for more details, please see Chapter 4 of [20]. The committees, logical edges, and must be built during the bootstrap phase and maintained (whp) for a sufficiently large number of rounds. Since is a constant degree graph, the overall degree of each node (in terms of the number of incident overlay edges) is at most . We require the Spartan protocols to create and maintain , , and therefore .
The size of committees has two main advantages that we exploit. On the one hand, the committees are large enough to ensure that an adversary that does not know the current members in the committee will (whp) be unable to disrupt it beyond a manageable level. This advantage can however be lost quickly if we leave a committee unattended for rounds under a churn rate of as all nodes can be replaced within rounds. Thus, nodes in the committees are dynamic. Each node moves to a new committee every rounds on expectation. In particular, unless churned out, a node stays in a committee for a number of rounds drawn from the geometric distribution with parameter .
On the other hand, the size of each committee is small enough that the set of committee members at round of a committee denoted can be encoded within bits. (We ignore the subscript when the round in question is clear from context.)
We say that a committee has been robust until round if (i) it was successfully created during the bootstrap phase, (ii) at every round , the number of members was at least nodes, and (iii) in every pair of consecutive rounds and , there is at least one common node in in both those rounds. Likewise, we say that a logical edge between two committees and has been robust until round if (i) both and have been robust until round and (ii) the logical edge was successfully created during the bootstrap phase and, at every round , there is an edge between every and . We say that a Spartan implementation has been robust until round if all the committees and the logical edges have been robust until round . The forthcoming sections are devoted to showing how we can construct and maintain Spartan for an arbitrarily large polynomial in rounds. In the rest of the section, we will briefly discuss how Spartan can be useful for distributed computation and distributed hash tables.
3.2 Distributed Computation and Distributed Hash Tables in Spartan
While several works have addressed computation in highly dynamic networks, we take a significant step forward in designing a mechanism whereby standard algorithms in established static distributed computing models can be executed off-the-shelf. Additionally, distributed hash tables with time storage and lookup times can be implemented despite heavy churn.
Given a Spartan network that is robust for a sufficiently long period of time, we can execute any CONGEST algorithm that is designed to run on a butterfly network with nodes provided that the state of the nodes executing remains small. Each committee takes up the role of executing as if it was a node in a butterfly of nodes. However, the constitution of each committee can change with time, so care should be taken to ensure that any new node entering the committee should be updated with the current state of the node that is being simulated by the committee. This requires a minor restriction that the state of the nodes executing must be bounded by bits. (Of course, one can trade off this restriction with the churn rate. CONGEST algorithms with larger states can be executed as long as the churn rate is slow enough to ensure that states can be copied into new nodes.) In addition, as per our model, committees can also directly communicate with each other even if they are not neighbors in the butterfly, provided that the nodes in each committee know the IDs of nodes in the other committee. This means that any algorithm designed for the node capacitated clique model [38] can also be executed.
Importantly, Spartan can be used to implement distributed hash tables. The committees serve as addressable locations within the butterfly network and any data item with a well-defined key value can be stored in the location given by the hashed value of its key. We give a brief overview here and refer to [37] for more details. Whenever a data item in the form of a pair is to be stored in the network, the data item is routed to the location given by the hashed value of the key. Similarly, when the item corresponding to a particular key is required, the request is routed to the location given by the hashed value of the key and retrieved from the committee in that location. Care should be taken to ensure that the data item is maintained properly in the committee. As in the case of simulating CONGEST algorithms, the data items stored in each committee should be copied to new nodes that enter the committee.
4 Constructing Spartan During the Bootstrap Phase
We will now present the steps to construct Spartan during the bootstrap phase. Algorithm 1 provides the high level steps, each of which requires at most rounds. Moreover, all these steps respect the congestion requirement in that no node will either send or receive more than messages at any round; we will explicitly discuss this whenever it isn’t obvious. Each step is then explained and analyzed in detail subsequently.
Line number 1 of Algorithm 1. We claim that a leader can be elected by the following elementary algorithm: each node generates a random number with a sufficiently large number of bits and creates a message with the random number and its ID. It then choses nodes uniformly at random from the samples it has been seeded with (without replacement) and floods the message to those nodes. In every round thereafter, it continues to flood the message with the largest value seen so far. After a period of some rounds, we claim that (whp) only one message, that belonging to the node which generated the largest random number, will survive and would be elected leader.
Lemma 1**.**
The leader election algorithm employed in line number 1 of Algorithm 1 can, whp, correctly elect a unique leader in rounds. Furthermore, each node consumes at most of its seeded random ID samples.
Proof.
From the description of the protocol, it is clear that each node consumes at most seeds from its samples. To see how the algorithm will take rounds, consider this. Each node chooses nodes at random at the start of the protocol (and thereafter uses the same nodes) to flood its message. This means that we are looking at a random graph with in , with being a sufficiently large constant. We know that for such a graph, the diameter is in , thus with high probability, ’s message will be flooded through the network in rounds.
∎
Line number 2 of Algorithm 1. The tree is created in two phases, both of which run for rounds and each node consumes at most of the seed tokens. The first phase begins with the tree being just the root node . Each node in the tree begins with two vacant spots for its children. At each time step, each node in the tree that has at least vacancies queries nodes at random and requests them to become its children. Nodes not in the tree (called non-tree nodes), if requested, accept at most one such request and fill the vacancy and become nodes in the tree. The first phase lasts for a suitably large rounds after which (as shown in Lemma 2) the tree consists of at least nodes (whp). Note that such a tree will have at least vacancies as well.
During the second phase, in each round, each non-tree node randomly probes a node (again by consuming one of its random samples) to see if it is a tree node with a vacancy. If is a tree node and has a vacancy, then, it will respond positively to exactly one such request among the several it might have received. If responded positively to , then becomes its child. Again, we will show in Lemma 2 that within rounds, (whp) the number of non-tree nodes dwindles to zero.
We emphasize that in both phases, the sampling is from the global set of nodes. In the first phase (resp., second phase), samples that fall on the nodes already in the tree (resp., node not yet in the tree and tree nodes that already have two children) are ignored. Crucially, by a simple balls-into-bins argument, this ensures whp that no node receives requests per round. Moreover, just the seed samples per node will be sufficient to execute line number 2 of Algorithm 1.
Lemma 2**.**
*For the procedure described in Line number 2 in Algorithm 1, the following two statements hold:
i) After a sufficiently large rounds of the first phase, (whp) the tree has grown to have at least nodes in it.
ii) After a sufficiently large rounds of the second phase, (whp) the number of nodes not in the tree has dwindled down to zero.
Because of i) and ii), after a a sufficiently large rounds, (whp) the tree has grown to have nodes in it. Furthermore, no node sends more than messages and (whp) every node receives at most messages.*
Proof.
We will first proceed to prove statement (i). Statement (ii) holds by symmetry. Let’s fix a round , let the number of nodes currently in the tree be , this means that there are exactly vacancies and non tree nodes. We want to calculate exactly how many vacancies are being filled in a round. Let be the random variable that counts the number of vacancies that has been filled in a round . Note that the event of an vacancy being filled by a node is precisely the same as node becoming a non-tree node. Which is also equivalent to node getting at least one invite. Thus, at any round , counting the number of nodes that received an invite, gives us a measure on the number of nodes that have become a part of the tree in (i.e., ). Let be a random variable such that
[TABLE]
Clearly, . At this juncture, it is important to note that the random variables are not mutually independent. However, as they are negatively associated (chapter 4 of Dubashi and Paconesi’s book [39]) (as we will briefly explain in the following paragraph) we are free to use various probabilistic bounds. To see how the random variables are negatively associated, we may visualize the act of vacancies inviting nodes as a balls and bins model. In the act of a vacancy inviting a node to be its child, the node with the vacancy () uses one of its random samples to invite another node (that’s chosen UAR from the network) to fill its vacancy from among the nodes in the network. This is comparable to the act of throwing a single ball into bins. Clearly in this setting, a non tree node getting at least one invite is equivalent to a bin being non empty (i.e., getting at least one ball). We now introduce the following two useful lemmas:
Lemma 3**.**
Consider the balls and bins model where balls have been thrown at bins ( not necessarily equal to ). Let count the number of balls in bin . The random variables are negatively associated. [ Eg. 3.1 [39]]
Lemma 4**.**
In the same balls and bins model introduced in Lemma 5, under the Discrete Monotone Aggregation property, random variables that are non-decreasing or non-increasing functions of the negatively associated variables are also negatively associated. [ Eg. 3.2 [39]]
From both lemma 3 and lemma 4, it’s clear that the random variables ’s are negatively associated. Thus, for any value of , when balls are thrown at bins,
[TABLE]
When there are nodes in the tree, and therefore vacancies:
Now we may the use the following standard inequality that states:
[TABLE]
To get:
[TABLE]
For any and , since the fraction approaches , this means that more than half the vacancies are filled with high probability. But, as approaches a fraction of (say for any , ), the above probability approaches which does not guarantee high probability. However, even as approaches a fraction of , at least one fourth of the vacancies are filled in every round with high probability.
Recall that . We have is . The expected number of vacancies filled in a round is then . We first use the binomial expansion to obtain the following upper bound:
[TABLE]
Then we use the above bound to obtain a lower bound on the expectation as follows:
[TABLE]
We will now use the following useful variation of the chernoff bound from Mitzenmacher and Upfal [20] (where denotes the expectation of random variable )
[TABLE]
Since and thus , the probability that at most one fourth of the vacancies are filled in a round is given as:
[TABLE]
Thus, for each round , where , whp, at least a fourth of the vacancies are filled. Call such a round in which one fourth of the vacancies are filled a good round. Since for each round , the probability of it being a good round is high, for given values of , we will be able to calculate the value such that when we run the algorithm for rounds, there are at least good rounds with high probability. Thus in rounds, the number of nodes in the tree has grown to . Statement ii) holds by symmetry, as we are looking at the same balls and bins scenario as above but with the tree nodes and non-tree nodes switched (as in the second phase, the non-tree nodes are looking for vacancies). Therefore, the same argument holds (only in reverse as we start with at most non-tree nodes that slowly dwindle down to zero). Thus in rounds, we have a tree on nodes with high probability.
Finally, it is clear from the protocol that each node only sends messages, but it is not immediately clear whether they only receive messages (whp). To show this upper limit on the number of messages received, we note that – in each round of both phases – nodes send messages to randomly chosen other nodes. So this process can again be viewed as a balls-into-bins process with at most balls being thrown randomly into bins. So no bin will receive any more than balls (whp). This balls-into-bins limit implies that no node will receive more than messages at any round. ∎
Lemma 5**.**
There exists an round implementation of Line number 2 of Algorithm 1 that (whp) results in a binary tree whose height is .
Proof.
From lemma 2 it follows that there exists a procedure that can in (whp) build a binary tree that includes all the nodes in the network. The height constraint follows from the fact that in any round the height of the tree being built can be increased by at most 1. Since the the procedure is terminated after rounds, it follows that the height of the tree can be at most . ∎
Line number 3 of Algorithm 1. The inorder traversal number is quite easy to compute. First we use a bottom-up convergecast, in which each node (starting with the leaf nodes) sends up the number of children rooted at their sub-trees. This ensures that each node in the tree knows exactly how many descendants are in each of their left and right subtrees after a period of rounds.
When the root gets this information, it can compute its inorder traversal number as , where is the number of nodes in it’s left subtree. Notice that all the nodes in the left subtree will have inorder traversal numbers in the range while the nodes in the right subtree will have inorder traversal numbers in the range . The root then passes these ranges to its appropriate children. We can continue this process in a top-down recursion such that each node gets to know its inorder traversal number and passes on the appropriate range of inorder traversal numbers to its two children. Since the number of committees can be precomputed, the first nodes (in the inorder traversal ordering) can identify themselves as committee leaders.
Now, we again perform a bottom up process in which the committee leaders organize themselves as a ring. Observe that a ring (along the lines of a circular linked list) can be identified by the address of one arbitrary node in the ring called the head; each node within the ring has an overlay edge pointing to its clockwise neighbor and another to its counter-clockwise neighbor. First, the leaves that are also committee leaders form a trivial ring with just one node and pass the address of the head (here the head is themselves) to their parent. The parent then forms a new cycle by merging the (up to) two cycles (through the use of their cycle heads) it received from its two children. Note that it includes itself as a part of the cycle only if it is also a committee leader otherwise it just passes the head of the merged cycle up to its own parent. All nodes also ensure that the head of the cycle is always the node with the smallest inorder traversal number. This process continues until the protocol reaches the root, where the root puts together the two cycles from its left and right subtree (it includes itself in the th position if it is a committee leader) and creates a contiguous cycle on nodes with the labels decided by the inorder traversal. The rest of the nodes (including the root if it is not part of nodes may now be discarded). Clearly, in rounds, where is the depth of the tree, the nodes would have all organized themselves into a cycle. We know from lemma 5 that the height of the tree is Thus,
Lemma 6**.**
There exists an round implementation of line number 3 of Algorithm 1 that selects a set of nodes and arranges them as a ring.
Line number 4 of Algorithm 1. Before we start constructing the butterfly network, we must preprocess the ring of committee leaders in three steps to reach a suitable state depicted in Figure 3.
First, the ring is arranged in the form of a 2D grid comprising columns and rows as shown in Figure 3 (i). This is easy because each node with inorder traversal number, say, can position itself in row and column . It has to then connect with nodes with inorder traversal numbers (when ), (when ), (when ), and (when ). Note that connecting with and will require rounds, but no more than rounds. This will allow us to identify each node in the grid by the pair , where is the row number and is the column number.
In the second step of the preprossessing stage, we first structure the grid into nested blocks as defined shortly; see Figure 3 (ii). Blocks are suitably nested in rectangular subgrids of some columns and rows, . The nodes in the leftmost column of each block are called its hinge nodes or just hinges. If the hinges of a block occur in some column in the grid, we say that the block is hinged at column . Column in the grid has blocks hinged in it. Therefore, we refer to the th block (from the top, starting at ) hinged in column as block , . Let such that ; then, every block hinged in column has rows. Block thus refers to the rectangular subgrid with node as the top-left node and as the bottom-right node.
We will now describe the second step of the preprocessing stage in which the topmost hinge vertex must be connected to the bottom most hinge vertex in every block; see Figure 3. Clearly, this is already true in blocks hinged at column ; this sets the base case for an inductive procedure. Consider a block (for parent) hinged at column that has two smaller blocks (for upper) and (for lower) nested within it, both hinged at column . Once the top and bottom hinge nodes of and are connected via overlay edges, within a constant number of rounds ( to be exact), the top and bottom hinge nodes of block can be connected in the following manner. The top hinge node of passes the request to the top hinge node in (as they are adjacent in the grid), which in turn passes the request to ’s bottom hinge. From there, the request is sent to the top hinge node of (which is right below ’s bottom hinge in the grid), which can then pass it on to ’s bottom hinge. From there, the intended final recipient is just one hop to the left in the grid. This means that after (and thus ) rounds, ensure that the top hinge node of every block is connected to its corresponding bottom hinge node.
The third (and final) step in the preprocessing stage is quite a small step. For every pair of blocks and hinged at the same column and nested within the same parent , we ensure that their respective top hinge nodes are connected and likewise that their bottom hinge nodes are also connected. This is an easy rounds step since the top and bottom hinges are connected for every block; see Figure 4 (i) & (ii).
Butterfly construction. To construct the butterfly network, we perform the following steps in parallel at every parent block that has two blocks and nested within it. Consider one such example as shown in Figure 4 (ii) and, equivalently, in (iii), wherein, two blocks and consisting of columns and rows are initially connected at their respective top hinges and their respective bottom hinges. Number the hinges of both blocks and top to bottom from 0 to . The final goal is to reach the configuration shown in Figure 4 (x), but an important intermediate goal is to get the th pair of hinges from and connected for (shown in Figure 4 (viii)). One can easily achieve this in rounds, but this will translate to an overall rounds when we consider the pair of blocks nested within the outermost block. But, with care, this can be achieved in rounds (and thus we will need rounds in total).
Our approach involves stages (in total) with each stage requiring rounds. After each stage , we ensure that the first pairs, i.e., th hinges in and for , are connected (as shown in Figure 4 (iv) to (viii)); in particular, note (iii), (iv), (vi), and (viii). In addition, to help the next stage, we also ensure that in each block the th hinge is connected to the th hinge node; these are called helper edges. In stage , we wish to get the th hinge nodes from and connected for . This can be achieved in three steps because the th node in can (via a helper edge) send its address to hinge in . Since th pair of hinge nodes are already connected in a previous stage, the address can be passed on to the hinge in . Again, via a helper edge, the address can be transmitted to the th hinge in , thereby enabling the th hinge in to establish an edge with the th hinge in . We are not done yet because the helper edges in the next stage must have twice the reach as the helper edges in the current stage. For this purpose, we consider the two helper edges incident at some th hinge in (say) leading to th hinge and th hinge. Clearly, the th hinge can send its address to the th hinge in two steps via the two helpers we considered. Thus, th hinge can establish an edge with the th hinge as required. Of course, this will have to be repeated in parallel for all and for as well. Once the helpers for the next stage have been constructed, all other previously constructed helpers (barring those that are also grid edges) can be deleted. This will bring us to (viii) (or equivalently (ix)) in Figure 4.
Finally, to get the butterfly edges, notice that the first hinges in (from top to the half way mark) must connect with the hinges in and the second hinges in must connect to the hinges in . More precisely, for , the th hinge in must connect to the th hinge in . For , the th hinge in must connect with the th hinge in . This can be easily achieved in rounds because the endpoints that must be connected are within two hops of each other. For example, for some such that , there is a grid edge to the th hinge in and since the th hinges in and are connected, the th hinge in can transmit its address to the th hinge in in 2 rounds, which will then allow the th hinge in to establish an edge with the th hinge in in one round.
Finally, we note from the description of the protocol that the congestion limits are not violated. Thus,
Lemma 7**.**
There exists an round implementation of line number 4 of Algorithm 1 that does not violated congestion limits and takes a ring of nodes and arranges them as a wrapped butterfly network of columns and rows.
Line number 5 of Algorithm 1. We will reuse ideas from line number 2 to implement line number 5 and, just as before, we will implement the line in two stages. The initial stage ensures that each committee gets nodes with high probability. This is done by each committee leader actively trying to recruit one member for every round for a period of rounds. Once the first stage is finished, the second stage ensures that any unattached node can find a committee to join within rounds. Let . In the first stage, each committee will garner exactly budget nodes in the following manner. Each committee leader uses its samples to invite one random node per round until it inducts budget nodes into its committee. Consider a committee with leader node . At each round, there are at least nodes that did not receive any invitation from any other committee leader. This means that the probability with which committee leader may find a node for its committee is at least . That is, when is , then is at least ().
If we define to be the random variable that counts the number of rounds before a committee leader can get its node after it got its one, then clearly gives the number of rounds required for a committee leader to get budget nodes. Now these are not independent. Let’s look at the construction of a similar random process, such as a coin toss in which you count the number of rounds before you get budget heads. In this construction, a coin turns up heads with probability exactly . Now this is a more pessimistic version of the same events described above and hence will take longer to reach the required goal of budget heads. In the described experiment, this means that on expectation it takes at most rounds for the coin to turn up its head after its one, , which then means that on expectation it takes at most rounds to get budget number of heads. We may then use Chernoff bounds to show that for any value of and any fixed value of , the probability it will take more than rounds to reach budget number of heads is at most (from Theorem 4.4 [20]). Going back to our original scenario, this means that for any committee leader it will take at most rounds with probability . We may now use the union bound to show that all committee leaders can get the required number of nodes in rounds, with probability at least for any fixed .
After rounds, there can be still at most nodes left in the network that are not part of the committee. At this point, we begin the second stage, where each unattached node tries to join a committee. Each node that has become part of the committee has a budget of 1 node which it can use to induct into its committee. In the second stage, in every round an unattached node tries to probe a random node to become a part of committee. If has not exhausted its budget then it accepts request, otherwise tries again with a different node. Since even if all unattached nodes become attached there will still be nodes that would not have exhausted their budget, each un-attached node can find a committee with probability at least . In a manner similar to above, this means that an unattached node can clearly succeed in finding a committee, whp, in rounds. Again, using union bound, we can guarantee that, whp, each unattached node can find and become a part of a committee rounds. Note that the size of any committee can at most double at the end of the second stage as described here (since each committee member can accept exactly one non-committee member), so no committee will end with more than members. Thus,
Lemma 8**.**
There exists an implementation of line number 5 of Algorithm 1 that (whp) requires rounds and ensures that every committee gets member nodes.
Line number 6 of Algorithm 1. Since all nodes know their respective committee leader’s ID, they will send their own ID to their leaders. The committee leader will form a list of all nodes in the committee and communicate this to all nodes in its own committee. The individual nodes will then establish overlay edges with all other members in the committee. At this point, the committee leader can construct the CML and disseminate to all nodes in the committee
Line number 7 of Algorithm 1. The committee leaders exchange CMLs with their neighbors in the butterfly network. Each committee leader in turn passes on the CMLs it received to nodes in its own committee. If and are two neighboring committees in the butterfly network, then the logical edge between and comprises overlay edges between every pair in ; here we are abusing notation and using and to refer to the committees as well as the set of nodes that make up each resp. committee.
5 Maintaining Spartan
We now turn our attention to the maintenance phase when churn can significantly impact the network. Maintaining Spartan comprises two parts: the primary maintenance protocol and the supporting random walks based sampling protocol. These two interdependent parts execute in tandem.
Recall that each committee is identified by its row and column numbers in the Spartan structure. Thus we can always pick a random committee just by choosing a random row and column in the Spartan structure. However that will not be useful for communicating to that random committee because for some node to send a message to members of , will require ’s current committee members list CML(C). The random walks based sampling procedure that we provide ensures that each node gets CMLs of randomly chosen committees every rounds. Using these random CML samples, the maintenance protocol ensures that Spartan is robust (whp) for an arbitrarily large number of rounds.
5.1 Random Walks Based Sampling
We now show how each committee in Spartan can obtain the CMLs of uniformly and independently chosen committees using random walks for any ; in Section 5.2, we set in order to obtain CML samples.
Before we describe the procedure, we clarify a couple of key issues. Firstly, this sampling procedure assumes that Spartan is robust. Secondly, the samples produced by this procedure comes with a shelf life of rounds because eventually, the nodes in the committees either move away to other committees or just churned out. To formalize this, we say that a sample is valid at a given round if the intersection between the list of IDs in and the IDs of nodes in at round is of cardinality at least . We later show that (whp) Spartan committees retain at least of their members for rounds (see Lemma 12).
Normally, the random walks would require steps to reach a random node, but we adapt the well-known pointer doubling technique [40, 19] to our context to achieve an exponential speedup of steps. The steps are described in Algorithm 2 under the assumption that Spartan is robust during its execution, which is reasonable because the sampling procedure is designed to work in tandem with the maintenance protocol that will guarantee robustness.
Theorem 9**.**
*The sampling procedure described in Algorithm 2 where each committee initiates tokens conforms to the model described in Section 2. Furthermore, the following two statements hold.
1. For any token (chosen when tokens were first generated), if survived all the iterations in Algorithm 2, then the final destination is equally likely to be any of the committees in Spartan.
2. With high probability, for every committee , the number of tokens with as the source and the number of tokens with as the destination are both at least .*
Proof.
We first prove statement 1 followed by statement 2. The correctness and conformity to the model specifications follow suit.
To prove statement 1, we first recall a basic fact about butterfly networks. Consider a walker in possession of the binary representation of some . Let us suppose she performs the following bit-fixing moves times from her current position . For any bit fixing moves, let (using the decimal representation of ). At any step in the bit fixing process, if the th bit of and are equal, she moves along the direct edge; otherwise, she moves along the flip edge.
Using the above procedure, at the end of moves, she will be in row [20]. Thus, when we perform the same bit-fixing procedure with a random bit vector instead, we will reach a random row. It now suffices to show that any token that survived till the end of the first for loop in Algorithm 2 performed the required random bit fixing steps. (The random column is obvious from the second for loop.) During round 1, clearly every token has taken a random bit-fixing step, i.e., either along the direct edge or the flip edge. For the sake of induction, let us now suppose that at the end of round , every token has made random bit fixing steps; we have already established the basis for this. During round , is matched with another token, say , both of which have fixed bits (see Figure 5). Thus, when is matched with , inherits the bit-fixing steps performed by , and therefore, the updated has fixed bits from its source column by the end of round .
We now turn our attention to statement 2. Note that, in every round for every , at the start of every round, each committee has the same value of , as each of them started with the same number of tokens and the value of is the same.
Claim 10**.**
*The following bounds hold with high probability at the start of every round . For any fixed
-
The number of tokens with as source and is within .
-
The number of tokens with as source and is within .
-
The number of tokens with as destination is within .
-
The number of tokens with as destination and is within .
-
The number of tokens with as destination and is within .*
Proof.
The first two bounds are straightforward applications of Chernoff bounds because each of tokens with as source is independently and equally likely to have or ; recall that the rand values follow the geometric distribution with parameter 1/2 and all tokens with have already been discarded. Once bound 3 is proved, bounds 4 and 5 will also follow in a fashion similar to 1 and 2. So we will now focus our efforts on bound 3.
At the start of round , we know that the following two statements are true of any surviving token i) it has walked for steps. ii) Because of the inherited bit fixing in the first for loop of the algorithm 2, the destination of a surviving token is equally likely to be any of committees. Thus by symmetry, if you fix a committee , the possibility of being a token’s destination is . Thus, since each committee in a round would be propagating at least tokens (Note that since is constantly updated due to step 7, it is thus is a lower bound for both or ), there are thus a total of at least tokens. Which means that on expectation at least tokens will have their destination as , we may then use standard Chernoff bounds [20] to prove bound 3. Bounds 4 and 5 follow in the same manner as bounds 1 and 2. Tokens with as destination are equally likely to have or . Thus, following bound 3, we can use Cheronoff bounds to show that bounds 4 and 5 follow. ∎
The rest of the proof is conditioned on the bounds stated in Claim 10. To complete the proof of the theorem, we need to show for every committee there will (whp) be at least tokens that have as source at the end of one iteration of the algorithm. Once this is shown, we know that, after rounds, the number of tokens will (whp) be at least , which will complete the proof.
Let’s look at the very last round in the first for loop, that is the very last matching of tokens. Let be the set of tokens with as source and ; we know (holds true due to the inductive nature of Claim 13.1). For any token , the destination in round is going to be a committee , which had been set by inheriting the bit fixing steps of a token in the previous round. Now at , any token that is not matched with some is going to be discarded. What is the probability of ’s survival? We know that there can be at most tokens at whose destination is and we also know that there at least tokens at whose source is and whose random number is (because of claim 13). Thus for to not survive, it should be discarded at , the probability of which is exactly . Notice that the dependencies are negatively correlated, i.e., when we know that some token is discarded (at ) it only decreases the probability that some other token is discarded (as it increases the probability that the second token is matched). To see how, we may again look at the token matching phenomenon as a balls and bins scenario, in which the tokens that arrive at are the balls and the tokens whose source is are the bins. Again using the properties of 3 and 4, we can argue negative correlation. Thus after the the last matching, the probability that a surviving token is not discarded is at least . We can then calculate the expected number of tokens that survive from as at least . Thus, by applying the generalized Chernoff bounds from [20],
[TABLE]
we can prove that the probability that more than a fourth of tokens will be discarded is at most . Thus with high probability, at least tokens survive. Any such surviving token then inherits a final bit fixing step due to step 8 and since there are no possibilities of discarding the tokens thereafter, all of the tokens that have survived steps 1 through 7 end up at with a destination chosen uar from the butterfly network. This also concludes the proof of statement 9, thus proving that each committee can have tokens at the end of Algorithm 2. ∎
During the maintenance phase, this sampling procedure must be initiated repeatedly every rounds with . Thus, each committee gets tokens every rounds and they can be distributed among the nodes in the committee. There is one subtlety that must be resolved. How do the nodes within a committee share the tokens? We assume that the list of members in the committee is common knowledge among all the members. The node with the smallest ID then divides up the tokens into chunks of tokens and sends each chunk to a member so that every member has at least one chunk.
Remark: This random walks sampling procedure may require nodes to send/receive messages (but not more than per round). We believe this is an inherent bottleneck that we cannot avoid when each committee requires random walks within a period of rounds. To see this, notice that the total number of random walk samples needed is and each of them – to mix properly – must have walked steps. This means that the network must send/receive at least messages within a period of rounds, which, by the pigeonhole principle, will require nodes to send/receive messages per round. This however does not preclude the possibility of a more complicated sampling algorithm that may work with no more than messages per node per round.
5.2 Maintaining the Spartan Implementation
We now turn our attention to maintaining Spartan. At the end of the bootstrap phase, our network has committees. Our goal in the maintenance phase is to ensure that all such committees and logical edges are robust (whp) for number of rounds despite a churn of up to .
Sampling Cycles. Throughout the maintenance phase for as long as Spartan is robust, all the committees repeatedly execute the random walks based sampling procedure in unison once every rounds. They set . Consequently, every committee will obtain samples every rounds. These samples will remain valid for the next rounds during which they will be consumed. These sampling cycles are illustrated in Figure 6.
With the sampling cycles executing repeatedly in the background, the nodes perform their maintenance protocol (see Algorithm 3 for an event-driven pseudocode). At a high-level, the nodes move to new random committees every few sampling cycles.
Node-level Invariants. Throughout the maintenance phase, we need to maintain two crucial invariants (mentioned below) for every node in the network (after its first rounds). At a high-level, the sampling cycles ensure that the first invariant is guaranteed and the first invariant ensures that the nodes can maintain the second invariant. The second invariant then ensures that all the committees (and therefore Spartan as a whole) will be robust (whp) against a -late adversary, which in turn ensures that sampling will work, thereby completing the cycle of dependencies. See Figure 7 for an illustration.
Invariant 1: Random Samples.
Every node has at least two valid CMLs corresponding to random committees at all times.
We will now briefly describe the behavior of a node in the Spartan maintenance cycle (Algorithm 3 has more details). When a new node enters the network, it is seeded with the ID of an arbitrary node . Node then requests and receives a CML corresponding to a random committee that it uses to move to a random committee . As a new member of , it receives valid CMLs and this will satisfy ’s requirement until the next sampling cycle.
During each sampling cycle, each committee obtains CMLs. They are apportioned amongst the members such that each member gets samples. Moreover, a set of samples are set aside for the up to new nodes that may become members of .
** Invariant 2: Random Committee Membership.**
Every node is in a committee chosen uniformly and independently at random. Recall that receives a CML pertaining to a random committee in its first round. It then moves to in its second round. Subsequently, every few sampling cycles, it moves to a new (uniformly and independently chosen) random committee.
Each new node that enters the network will in 2 rounds (whp) become a member of some committee. Recall that the new node will be seeded with the ID of an arbitrary pre-existing node in the network. In the first (handshake) round, will request and obtain pertaining to a random committee from the node that it connected to (see Section 2 for more detail). In the second round, it will connect to using .
The node will then operate in cycles of rounds (on expectation) by moving to a new random committee every few (constant number of) sampling cycles. This process ensures that becomes a member of a random committee periodically and also ensures that the committees are periodically refreshed. This ensures that the -late adversary is incapable of inferring the members within committees.
Analysis Overview. We now wish to show that the maintenance procedure described above will work (i.e., invariants maintained) and that Spartan will remain robust for some sufficiently large rounds.
We begin by recalling a well-known claim with regards to the balls-into-bins model (see [20, 39]).
Claim 11**.**
Consider bins. For any fixed , there exists a constant such that when balls are thrown uniformly and independently at random into bins, then, every bin will contain balls with probability at least .
In our context, the committees are the bins and nodes are the balls and the proof works by showing that every committee will contain nodes for a sufficiently long time.
Formally, our analysis works by induction. Let us use to index the sampling cycles sequentially. The first sampling cycle starts right after the bootstrap phase, so clearly Spartan is robust until the start of the first sampling cycle. Moreover, by the construction of Spartan, each node is in a committee that was chosen uniformly and independently at random. Thus, by immediate application of Claim 11, Spartan is robust at the start of sampling cycle 1. This establishes our basis.
Lemma 12**.**
If Spartan is robust at the start of the th sampling cycle, then, it will (whp) remain robust until the end of the cycle (equivalently, the start of th cycle). Moreover, each node is placed in a committee that is chosen uniformly at random and independent of all other nodes’ committee assignments.
Proof.
First, we note that within sampling cycle , no committee will become empty. Since the cycle duration is rounds, at most nodes would have churned out. We must importantly note that the -late adversary will be unaware of where the nodes are, so the choice of nodes that are churned out are essentially uniformly at random. To formalize this, we can use the principle of deferred decisions. For any choice of nodes, we can defer the choice of their committee to the round when they are churned out. Consequently, the remaining nodes (of cardinality at least ) is also a uniformly random subset and sufficiently large, thereby allowing us to apply Claim 11. We can in fact claim that every committee will have at least nodes (whp).
Secondly, every new node that entered the network during sampling cycle was placed in a random committee. Since the number of such new nodes is at most , again, by Claim 11, we get the number of nodes within each committee to be whp. ∎
It is largely clear that all the rules of the model are followed during our construction. We now point out a few details that may not be immediately obvious. Firstly, each node forms overlay edges with all members of its current committee as well the members of the three neighboring committees. No other overlay links are formed. This means that no node violates the degree restriction. From the description, it is clear that we are never sending more than messages of size at most bits each. In addition, no node receives more than messages either (whp) and this is clarified in several places. The general principle is that messages are either sent across the overlay links (thus obviously ensuring that a node can receive at most of such overlay messages) or messages are sent to random nodes via random sampling. Such messages sent to random nodes – by applying the balls-into-bins claim (see Claim 11) – cannot disproportionately land on any one node. Thus, we can conclude with the following theorem.
Theorem 13**.**
Despite heavy churn at the rate of controlled by a -late adversary whose actions are instantaneous, we can, whp, bootstrap and maintain the Spartan overlay network without violating the rules of the model laid out in Section 2. Moreover, the Spartan network so constructed and maintained has the following properties.
Spartan provides addressable locations called committees arranged in a wrapped butterfly network. 2. 2.
The diameter is allowing any node in the network to reach another node in rounds. 3. 3.
Spartan is guaranteed to be robust for a time period that is an arbitrarily large .
6 Experimental analysis
While our theoretical results are largely insightful, for the sake of simplicity, we have not worked out all the constants. However, using the correct numbers is important when making key design choices like the size of the butterfly network (i.e., the number of committees N) required for robustness. Put another way, for a given number of committees , what would be the minimum number of nodes that we need in the network in order to ensure that all committees are populated with member nodes for a sufficiently long period of time? Essentially, we have to ensure that the committees are, on average, assigned sufficiently many peers. The theory we have developed indicates peers per committee on average, which, assuming a conservative estimate of 10 for the constant and being 1024 (again, quite conservatively), will be nodes per committee. We believe this is a significant overestimate; smaller committees should suffice. To obtain a more realistic estimate of the average number of nodes per committee, we performed some simple experimental analysis.
We consider peers assigned randomly to committees at round 1 to simulate the idea of peers being distributed among committees. We define a committee to be robust up to round if it has been non-empty until round . A bad event in this setting is when at least one committee becomes empty. To simulate churn, in every round after , peers (chosen UAR) are removed and are placed randomly into the committees. We repeat the above procedure until we either reach a previously defined stopping point of rounds (we use rounds) or until we reach a round in which at least one committee is empty. If all committees remain robust for rounds, we declare that the repetition was successful.
We considered committees (varying in the range ) and . For each value of , we ran repetitions of the experiment. For a given , we define the threshold to be the minimum number of peers such that all committees are robust through at least 90% of all repetitions (i.e., at least 27 repetitions). Then from Table 1, we see that the number of successful repetitions drastically decreases even for . Thus, we need to be careful in choosing the number of committees.
The analysis also allows one to discover the value of the threshold for different values of . Let be the multiplicative factor such that threshold for various values of ; this ensures that the average number of peers per committee is . From Figure 8 we observe that as increases, also increases, which is of course to be expected. However, one can immediately notice that the factor is in general much smaller than the size of the committees required for our theoretical analysis to go through.
7 Discussion and Concluding Remarks
We have presented Spartan, a sparse overlay network that is robust against heavy churn designed by an adversary that is aware of the network except for the most recent rounds. The overlay provides us with addressable committees in which data items can be stored and maintained in a robust manner. Furthermore, each committee is easily accessible to all other committees (within hops) because they are arranged in the form of a wrapped butterfly network.
We believe that this basic framework is quite flexible and can be adapted and extended in a variety of interesting ways to tackle some nagging issues in overlay network design.
For a start, in our model, we have assumed that the number of nodes must lie in for some . This is actually not a strict requirement. There are established techniques to count the number of nodes in the network [13, 14]. When the count gets too close to either end of the range , we can appropriately halve or double it as required. This would mean that other parameters like and (the number of columns) must also be recalculated.
We also have the flexibility to string the committees together to form other structures like hypercubes, expander graph or de Bruijn graph. Apart from using just a single type of graph, we could also design overlays that combine for example both a butterfly and a de Bruijn graph. Since both are low degree graphs, we can inherit the benefits of both families of graphs without affecting our results.
In our model, we have to do a lot of work in order to keep the network “warmed up" for all possible adversarial strategies. However, churn characteristics have been well studied and in particular, it is well known that despite heavy churn there will be a significant fraction of stable users who churn at a much lower rate [5]. We believe that the model could be enhanced to include a more careful accounting of the amount of work that is performed overall, which could in turn lead to more careful algorithm design that is more work efficient. In particular, in this context, it will be interesting to see if recent ideas of resource competitiveness surveyed by Bender et al. [41] in which resource expenditures are weighed against the adversary’s efforts could lead to more nuanced algorithms more amenable to real-world implementation.
Finally, Spartan is built with a wide range of applications in mind. It is a natural candidate to implement distributed hash tables capable of storing key, value pairs. The keys can easily be hashed into the space of committee identifiers. It will also be interesting to see if erasure codes can be used effectively in tandem with the idea of committees. Another feature is that Spartan is naturally resistant to attacks like censorship and denial-of-service. If a subset of the nodes is either censored or bombarded by denial-of-service attacks, the rest of the nodes will still be able to keep the committees operating.
For future work, we wish to explore the possibility of reducing the degree down to a constant. This seems plausible given recent works like [15] where constant degree expanders were maintained despite churn, but it is not clear as yet whether a constant degree addressable and routable P2P network is possible.
Acknowledgment
The authors would like to thank Chetan Gupta, Christian Scheideler, and Eli Upfal for useful discussions and ideas. We thank the anonymous reviewers for useful suggestions.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] R. K. Sitaraman, M. Kasbekar, W. Lichtenstein, and M. Jain. Overlay networks: An akamai perspective. In Advanced Content Delivery, Streaming, and Cloud Services , pages 305–328. John Wiley and Sons, Inc., 2014.
- 2[2] Jarret Falkner, Michael Piatek, John P. John, Arvind Krishnamurthy, and Thomas Anderson. Profiling a million user DHT. In Proceedings of the Seventh ACM SIGCOMM Conference on Internet Measurement , IMC ’07, pages 129–134. ACM, 2007.
- 3[3] P Krishna Gummadi, Stefan Saroiu, and Steven D Gribble. A measurement study of napster and gnutella as examples of peer-to-peer file sharing systems. ACM SIGCOMM Computer Communication Review , 32(1):82–82, 2002.
- 4[4] Ion Stoica, Robert Morris, David Karger, M Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. Computer Communication Review , 31(4):149–160, 2001.
- 5[5] Daniel Stutzbach and Reza Rejaie. Understanding churn in peer-to-peer networks. In Proceedings of the Sixth ACM SIGCOMM Conference on Internet Measurement , IMC’06, pages 189–202. ACM, 2006.
- 6[6] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. Computer Communication Review , 31(4):161–172, 2001.
- 7[7] Antony Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. Lecture Notes in Computer Science , 2218:329–350, 2001.
- 8[8] Ben Y. Zhao, John Kubiatowicz, and Anthony D. Joseph. Tapestry: a fault-tolerant wide-area application infrastructure. Computer Communication Review , 32(1):81, 2002.
