Fast Uniform Dispersion of a Crash-prone Swarm
Michael Amir, Alfred M. Bruckstein

TL;DR
This paper presents a model for a swarm of autonomous robots that can quickly and reliably cover an unknown environment despite frequent crashes and limited space, using simple local rules and a process inspired by statistical mechanics.
Contribution
It introduces a new model for robot swarm coverage that accounts for crashes and asynchronicity, demonstrating linear-time completion under these challenging conditions.
Findings
Robots complete coverage in linear time asymptotically almost surely.
Coverage time degrades gracefully with crash frequency.
The model is based on the totally asymmetric simple exclusion process.
Fast Uniform Dispersion of a Crash-prone Swarm
Michael Amir
Technion - Israel Institute of Technology
Alfred M. Bruckstein
Technion - Israel Institute of Technology
Abstract
We consider the problem of completely covering an unknown discrete environment with a swarm of asynchronous, frequently-crashing autonomous mobile robots. We represent the environment by a discrete graph, and task the robots with occupying every vertex and with constructing an implicit distributed spanning tree of the graph. The robotic agents activate independently at random exponential waiting times of mean and enter the graph environment over time from a source location. They grow the environment’s coverage by ‘settling’ at empty locations and aiding other robots’ navigation from these locations. The robots are identical and make decisions driven by the same simple and local rule of behaviour. The local rule is based only on the presence of neighbouring robots, and on whether a settled robot points to the current location. Whenever a robot moves, it may crash and disappear from the environment. Each vertex in the environment has limited physical space, so robots frequently obstruct each other.
Our goal is to show that even under conditions of asynchronicity, frequent crashing, and limited physical space, the simple mobile robots complete their mission in linear time asymptotically almost surely, and time to completion degrades gracefully with the frequency of the crashes. Our model and analysis are based on the well-studied “totally asymmetric simple exclusion process” in statistical mechanics.
I Introduction
In swarm robotics, a vast number of autonomous mobile robots cooperate to achieve complex goals [26]. Individual members of the swarm are usually assumed to be simple, expendable, and computationally limited, and to move and act according to online local rules of behaviour. In this work, our goal is to formally study the ability of a simple, “two-layered” swarm-robotic system to complete an environmental coverage task called uniform dispersal assuming asynchronicity and that robots may crash whenever they attempt to move.
“Coverage” algorithms that enable a single- or multi-robot system to cover or explore unknown or dynamically uncertain environments are an important topic in mobile robotics. There has been great interest in applications to, for example, mapping [6, 25, 19], servicing and surveillance [16], or search and rescue operations [28, 15, 11, 31], and a rich body of theoretical work exists (we refer the reader to the surveys [5, 23]). A natural coverage problem for robotic swarms is the uniform dispersal problem, introduced in [26]. In uniform dispersal, many robotic agents enter an unknown discrete graph environment over time via one or several source locations and are tasked to eventually occupy every vertex of the graph with a robot while avoiding collisions.
Swarms are often claimed to be highly fault-tolerant, as redundancy and sheer numbers can enable the swarm to go on with its mission even if many robots malfunction [39]. However, as the size of a robotic fleet grows, so too does the opportunity for error. Specifically, three different complications that arise in multi-robot systems are further exacerbated in the swarm setting:
Asynchronicity. As the number of robots grows, coordinating the robots’ actions becomes a formidable task, as their actions and internal clocks can become highly unsynchronized.
Crashes. We cannot expect to release a huge swarm of simple robots to an unknown environment without the occurrence of hardware or software faults that may cause robots to crash.
Traffic. To avoid collisions, we do not wish for there to be too many robots crowding a given area, and so mobile robots should maintain safe distances from each other. In restricted physical environments, such requirements cause traffic delays, as robots must wait for other robots to move away before entering a target location.
Such challenges are discussed as a central direction of research for swarm robotics in [33]. If the number of errors scales with the number of robots, are swarms “worth the trouble”? The purpose of this work is to give a perspective on this question via a formal mathematical analysis. We study, in an abstract setting, the ability of a simple local rule to achieve uniform dispersal in the presence of crashes and asynchronicity. We are specifically interested in how the frequency of crashes affects the time to mission completion.
We first describe a “two-layered” rule of behaviour for swarms that is capable of achieving uniform dispersal. Using this algorithm, we show that a swarm can complete its mission quickly and reliably in unknown discrete environments, even in the presence of asynchronicity and frequent crashes. Hence, we claim that in our setting, many robots can win against many errors. In the spirit of swarm robotics, the algorithm relies only on local information to dictate robots’ actions and is quite simple. This simplicity makes it amenable to analysis.
Our swarm consists of a large reservoir of simple, anonymous, identical, and autonomous mobile robots that enter the environment over time via a source location . The robots move across a discrete environment represented by an a priori unknown connected graph whose vertices represent spatial locations. The robots gradually expand their coverage of the environment by occupying certain locations and assisting other nearby robots in navigational tasks using a local, indirect communication scheme.
The swarm’s robots switch between two modes: mobile and settled. The settled robots act as ‘nodes’ of the current coverage of the graph environment, and the mobile robots move between locations with settled robots until they can find a new location where they themselves can settle. The settled ‘robot nodes’ are capable of pointing to (“marking”) a single neighbouring location where there is another settled robot. “Marking” is understood to be a generic capability of the robots and could be accomplished by many different technologies, such as local radio communication or visual sensing; we refer to the Related Work section for possible implementations.
As more and more mobile robots become settled, their marks serve as a navigational network of the environment that is utilised by the remaining mobile robots. The mobile robots are capable of sensing the number of robots in neighbour locations, and sensing when a settled robot is pointing to (marking) their location. They rely only on this information to make decisions. Hence, they operate in a GPS-denied, low memory setting, meaning they act based only on local communications and local geographic features. The robots are tasked with settling at every vertex of , and constructing an implicit spanning tree of via the settled robots and their pointer marks.
There are no restrictions on as long as it is connected. In principle, different robots need not even agree on the graph representation of their environment for our algorithm to work (e.g., in case they gradually build it from local sensory data), as the settled robots gradually construct a spanning tree which all robots agree on and use to move between locations. We assume, for simplicity, that they share the same representation.
Physical constraints and asynchronicity. We model the mobile robots as activating repeatedly at stochastic, independent exponential waiting times of rate . When a robot activates, it may move or move-and-settle at a nearby location (once a robot settles, it remains stationary). We assume the physical constraint that any given location may contain no more than a single mobile robot and a single settled robot (and perhaps a number of crashed robots). Frequent traffic obstructions occur as robots block each other off from progressing.
This model of asynchronicity and limited vertex capacity in a graph environment is motivated by the totally asymmetric simple exclusion process (TASEP) in statistical mechanics. There is an extensive literature on this process as a model for a great variety of transport phenomena, such as traffic flow [18] and biological transport [17]. Rigorous exact and asymptotic results for TASEP are known [27, 38], and our analysis technique shall be to compare our swarm’s performance to a two-layered TASEP-like process. Since our robots are mostly in a state of “traffic flow” (waiting for other robots to move), references such as [18] suggest that our model in fact captures many of the relevant traffic phenomena that will occur in real-life implementations.
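For readers unfamiliar with TASEP, the following is a minimal sketch of the basic single-lane process on a finite path: particles enter at site 0 and hop one site to the right, but only into an empty site. The function name `simulate_tasep`, the unit rates, the exit rule at the last site, and the per-site clock scheduling are illustrative assumptions; this is not the two-layered process analysed in this paper.

```python
import random


def simulate_tasep(num_sites, t_end, seed=0):
    """Continuous-time TASEP on sites 0..num_sites-1 with entry at site 0.

    Assumptions (illustrative only): rate-1 entry attempts into site 0,
    rate-1 hops to the right, and particles leave from the last site.
    Returns the occupancy list at time t_end and the number of entries.
    """
    rng = random.Random(seed)
    occupied = [False] * num_sites
    t, entries = 0.0, 0
    while True:
        # One rate-1 clock for the entry attempt plus one per site.
        # Their superposition rings at rate num_sites + 1, and each ring
        # belongs to a uniformly random clock.
        t += rng.expovariate(num_sites + 1)
        if t > t_end:
            return occupied, entries
        k = rng.randrange(num_sites + 1)
        if k == 0:
            if not occupied[0]:
                occupied[0] = True
                entries += 1
        else:
            site = k - 1
            if occupied[site]:
                if site == num_sites - 1:
                    occupied[site] = False  # particle exits the lattice
                elif not occupied[site + 1]:
                    occupied[site], occupied[site + 1] = False, True
            # A ring at an empty or blocked site changes nothing.


occupancy, entered = simulate_tasep(num_sites=20, t_end=50.0, seed=1)
```

Blocked hops are simply discarded, which is the exclusion rule that produces the traffic delays discussed above.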
Adversarial crashing. Similar to, e.g., [28], we consider a risky traversal model where robots may crash whenever they try to move across an edge. We assume robots remain safe when not moving, as remaining put is less risky than travelling (in fact, we need just the weaker assumption that settled robots, which never move, are safe). To facilitate analysis, we assume crashed robots do not prevent travel between the graph’s vertex hubs. This assumption is applicable when such robots can be manoeuvred around or pushed aside, or the crash causes the robot to disappear. For example, we may consider crashed air-based robots falling to the ground during exploration of an environment. Alternatively, in a ground robot scenario, we can with foresight expand vertex sizes to be big enough such that vertices can contain a small number of crashed robots in addition to the two active robots (and such local crashed robots are then bypassed using, e.g., local collision avoidance).
Since, in our model, at most one new robot may arrive in the environment per time step, we assume that the number of crashes that occur is bounded by the current time , and a parameter which reflects the frequency at which crashes occur over time. When is close to , the vast majority of robots that enter the environment will crash before achieving anything.
Besides these limitations, we assume nothing more about the crashes that occur. In particular, a virtual adversary may choose crashes so as to be as obstructing as possible.
Results. We describe a local rule of behaviour (Algorithm 1) that can achieve uniform dispersion, even in the presence of frequent crashes and traffic obstructions. The rule is easy to understand and implement and is well-suited for a swarm of simple robots, mimicking a kind of branching depth-first search. In many mobile robot systems one wishes to construct a spanning tree of the environment for purposes of mapping, routing or broadcasting [1, 4, 14, 21, 22]. Our rule achieves this as well, by having robots act as nodes of the tree, and making them aware of their immediate descendants. Our goal is to study how crashes, asynchronicity, and traffic affect the swarm’s performance under this rule of behaviour.
We prove that our robots are able to complete their mission in time linear in the size of the environment, and that performance degrades gracefully (by a factor ) with frequency of crashes. Given our assumptions and algorithm, it is not surprising that the robots can complete the dispersion assuming some crashes; rather, we show that even with many frequent crashes, the robots can still do so efficiently.
Specifically, let $n$ be the number of vertices in the environment. We prove that dispersal completes before time $8\cdot\big((1-c)^{-1}+o(1)\big)n$ asymptotically almost surely (meaning with probability approaching $1$ as $n$ grows), which is a worst-case bound on performance. No dispersal algorithm can complete in less than $n$ expected time, since this is the time it takes to even explore $n$ vertices, so when there are no crashes (but still there is traffic and asynchronicity) this bound is asymptotically tight. For, say, $c = 1/2$, we expect up to (roughly) 50% of robots to crash before achieving anything, and our analysis says that therefore the swarm will take twice as long to achieve dispersal. This seems intuitive, but consider that the robots that eventually crash are (uselessly) present in the environment in the time leading to the crash, blocking other robots from entering or progressing. The analysis says that nevertheless, the ability of the rest of the swarm to achieve its goal is not disproportionately worsened.
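As a quick worked example of how this bound scales with the crash parameter, the sketch below evaluates only its leading term, 8n/(1-c), and ignores the o(1) correction. The function name and the accepted range 0 <= c < 1 are assumptions made for illustration.

```python
def makespan_bound_leading_term(n, c):
    """Leading term 8 * n / (1 - c) of the bound, dropping the o(1) term.

    n: number of vertices; c: crash-frequency parameter, assumed here to
    satisfy 0 <= c < 1 so that the factor (1 - c)^(-1) is finite.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 0 <= c < 1:
        raise ValueError("c must satisfy 0 <= c < 1")
    return 8 * n / (1 - c)


no_crashes = makespan_bound_leading_term(100, 0.0)
half_crashes = makespan_bound_leading_term(100, 0.5)
```

With these inputs the second value is exactly twice the first, matching the factor (1-c)^{-1}.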
To the best of our knowledge, with or without crashes, we are the first to consider a non-synchronous setting for the uniform dispersal problem where time to completion can explicitly be bounded, hence also the first to give explicit performance guarantees in a non-synchronous setting. In an asynchronous as opposed to a synchronous setting, there are many more possible configurations that the robots might exist in, which makes the analysis more difficult. We believe the references and techniques from statistics [27, 38, 18] might be of general interest for tackling these kinds of topics.
Our analysis extends also to a synchronous time setting, and to the case where robots enter the environment from multiple locations. Multiple entrance locations result in the robots constructing an implicit spanning forest instead. In both these settings, dispersion completes faster. The bound on performance we derive for the synchronous case is exact.
Finally, we confirm our findings by numerically simulating our system in a number of environments and measuring performance.
I-A Related work
Uniform dispersal was introduced by Hsiang et al. in [26] for discrete grid environments of connected pixels (but their work can be extended to arbitrary graph environments). They considered a synchronous time setting where robots are allowed to send short messages to nearby robots, and showed time-optimal algorithms for this setting. Many variations have since been studied. Barrameda et al. extend the problem to the asynchronous setting with no explicit non-visual communication [10, 9]. Recent works include dispersal with weakened sensing [24], dispersal in arbitrary graph environments [30], and dispersal under energy constraints [7]. Our model differs from previous work on several central points, including the presence of crashes, the two layers, and the ability to mark neighbours. Marking is weaker than the radio communication available to robots in [26], which enables robots to transfer many bits of data locally, but stronger than the indirect, visual communication assumed in several other works.
Because of differences in the settings, assumptions, and constraints, quantitative comparison of works on uniform dispersal is very difficult. Table I gives a rough, non-exhaustive overview of some differences, such as the supported kinds of environments (grid environment, hole-less grid environment, or arbitrary graph environment), synchronous versus asynchronous time, expected makespan (i.e., how long it takes the robots to complete their mission), and whether crashes are considered in the model.
Robotic coverage, patrolling, and exploration with adversarial interference, as well as crashes, have been studied in different problem settings from our own. Agmon and Peleg studied a gathering problem for robots where a single robot may crash [3], and gathering with multiple crashes was later discussed by Zohir et al. in a similar setting [13]. Robotic exploration in an environment containing threats has been studied in [40, 41]. Moreover, adversarial crashes of processes are often studied in general distributed algorithms (e.g., [20]). Differing from many of these works, we study a situation where the number of crashes scales with the mission’s complexity (the time it takes to cover the environment), and where even the vast majority of robots may crash. However, to enable this, we assume access to a huge reservoir of robots waiting to replace crashed robots–i.e., a robotic swarm.
Robotic coverage in various hazardous or adversarial GPS-denied settings has become an important topic in recent decades, since this opens the possibility of deploying robotic swarms in the real world, outside laboratory conditions [23, 5, 2]. Theoretical and empirical results about the performance of swarms in such settings may help inform our expectations of real world swarm-robotic fleets. To implement such systems in practice, the robots themselves must be capable of relative visual localization. This poses a technical challenge, as considerations of depth, angle of view, and persistent coverage come into play. In [12] a system of relative visual localization for mobile ground vehicles with low computing power is proposed. The system enables autonomous ground vehicles to navigate their environment while avoiding obstacles. In [36] a relative visual localization technique is developed for small quadcopters, with similar capabilities. In [34] the authors discuss a localization algorithm for lightweight asynchronous multi-robot systems with lossy communication. These are examples of the techniques that may be used for the sensors of robots in such systems as the one described in this paper (see similar discussion in [7]).
A fascinating introduction to TASEP-like processes and their connection to other fields is [29].
II Model and System
We consider a swarm of mobile robotic agents performing world-embedded calculations on an unknown discrete environment represented by a connected graph . The vertices of represent spatial locations, and the edges represent connections between these locations, such that the existence of an edge indicates that a robot may move from to .
We assume an infinite collection of robots (also referred to as ‘agents’) attempt to enter over time through a source vertex . The robots are identical and execute the same algorithm. They begin in the mobile state, and eventually enter the settled state. Settled robots are stationary, and are capable of marking a neighbouring vertex that contains another settled robot. Mobile robots move between the vertices of and sometimes crash while in motion. They are oblivious, and decide where to move based only on local information provided by their sensors: the number of robots at neighbouring vertices, and whether any of the neighbouring settled robots mark their current location. Each vertex has limited capacity: it can contain at most one settled and one mobile robot.
Mobile robots are only allowed to move to a neighbouring vertex when they are activated. Each robot, including robots outside , reactivates infinitely often and independent of other robots, at random exponential waiting times of mean .
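The activation model can be sketched as a set of independent clocks, one per robot, merged into a single time-ordered schedule. This is a minimal illustration under stated assumptions: the function name `activation_schedule` is hypothetical, the swarm is truncated to a finite number of robots, and `mean` is left as a parameter whose default value is only a placeholder.

```python
import heapq
import random


def activation_schedule(num_robots, t_end, mean=1.0, seed=0):
    """Return the time-ordered (time, robot_index) activations up to t_end.

    Each robot has its own clock with i.i.d. exponential waiting times.
    `mean` is a free parameter of this sketch, not a value taken from
    the model.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean
    heap = [(rng.expovariate(rate), i) for i in range(num_robots)]
    heapq.heapify(heap)
    events = []
    while heap and heap[0][0] <= t_end:
        t, i = heapq.heappop(heap)
        events.append((t, i))
        # Schedule the same robot's next activation.
        heapq.heappush(heap, (t + rng.expovariate(rate), i))
    return events


schedule = activation_schedule(num_robots=3, t_end=10.0)
```

Because the clocks are continuous and independent, two robots share an activation time with probability zero, so the schedule defines a strict order of events.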
When contains less than two robots, robots from outside attempt to enter it when they are activated. It is convenient to give the robots arbitrary labels and assume that cannot enter before all robots with lower indices entered or crashed. This assumption makes the analysis simpler, but the performance bound we prove in this work holds also for the entrance model where robot entrance depends only on which robot is activated first. Hence, whenever the current lowest-index robot outside of activates and there is no mobile robot at , it moves to . If is completely empty, the robot settles upon arrival and becomes the root of the spanning tree. Otherwise it remains a mobile robot.
We denote by the graph whose vertices are vertices of containing settled robots at time , and there is a directed edge if is marked by a settled robot at . The goal of the robots is to reach a time wherein is a spanning tree of the entire environment . The makespan of an algorithm is the first time when this occurs.
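The termination condition can be made concrete with a small check. The sketch below assumes the environment is given as an adjacency dictionary and that the marks are stored as a map from each settled vertex to the vertex it marks (its parent), with the root mapped to `None`; these data structures and the function name are illustrative assumptions.

```python
def is_spanning_tree(adj, marks, root):
    """Check that settled-robot marks form a spanning tree of the graph.

    adj:   dict vertex -> set of neighbouring vertices (undirected graph).
    marks: dict settled vertex -> vertex it marks (its parent); the root
           maps to None. Tree edges are directed parent -> child.
    """
    if root not in marks or set(marks) != set(adj) or marks[root] is not None:
        return False
    for v, parent in marks.items():
        if v == root:
            continue
        # A mark must point along a graph edge to another settled vertex.
        if parent is None or parent not in adj[v] or parent not in marks:
            return False
    # Every vertex must reach the root by following marks, without cycles.
    for v in marks:
        seen = set()
        while v != root:
            if v in seen:
                return False
            seen.add(v)
            v = marks[v]
    return True


path = {0: {1}, 1: {0, 2}, 2: {1}}
done = is_spanning_tree(path, {0: None, 1: 0, 2: 1}, root=0)
```

In this representation the makespan is the first time at which the check returns `True`.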
Crashes are modelled as follows: when a robot is activated and attempts to enter or move from to via the edge , occasionally an adversarial event occurs, causing the deletion of from . Robots do not crash unless attempting to move. Hence, mobile robots are volatile but settled robots are safe. This assumption is somewhat stronger than necessary: our results still hold if mobile (but not settled) robots are allowed to crash while they stay put, but this tediously lengthens the analysis. We assume the number of adversarial events before time is bounded by a fraction of . Adversarial events may otherwise be as inconvenient as possible: we may assume there is an adversary choosing crashes to maximize the makespan of our algorithm.
Unless stated otherwise, when discussing the configuration of robots “at time ”, we always refer to the configuration before any activation at time has occurred.
III Dispersal and Spanning Trees
We study a simple local behaviour (Algorithm 1) that disperses robots and incrementally constructs a distributed spanning tree of . The rule determines the behaviour of mobile robots whenever they are activated (settled robots merely remain in place and continue to mark their target). We prove that using this rule, the makespan is linear in the number of vertices of asymptotically almost surely, and that performance degrades gracefully with the density of crashes.
The rule grows as a partial spanning tree of . It acts as a kind of depth first search that splits into parallel processes whenever a mobile robot is blocked by another mobile robot. Every vertex of the tree is marked by settled robots at its descendants. Mobile robots follow these marks to discover the leaves of the current tree and expand it. Robots grow the tree by settling at unexplored vertices that then become new leaves. Our main result is Theorem III.1:
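To make the flavour of such a local rule concrete, here is a hedged sketch of one activation of a mobile robot. It is a reconstruction from the prose description above and is not a transcription of Algorithm 1; the function name, the return values, and the tie-breaking by neighbour order are all assumptions. It uses only the information the model allows: which neighbouring vertices hold robots, and which neighbouring settled robots mark the current vertex.

```python
def choose_action(v, adj, settled, mobile, marks):
    """Decide what the mobile robot at vertex v does on activation.

    adj:     dict vertex -> list of neighbouring vertices.
    settled: set of vertices holding a settled robot.
    mobile:  set of vertices holding a mobile robot.
    marks:   dict settled vertex -> the vertex its robot points to.
    Returns ("settle", u), ("move", u) or ("stay", v).
    """
    # 1. An empty neighbour is unexplored: move there and settle,
    #    marking v (the previous location) as the parent.
    for u in adj[v]:
        if u not in settled and u not in mobile:
            return ("settle", u)
    # 2. Otherwise descend the tree: enter a child of v (a settled
    #    neighbour that marks v) that has room for a mobile robot.
    for u in adj[v]:
        if u in settled and marks.get(u) == v and u not in mobile:
            return ("move", u)
    # 3. Blocked on all sides: wait for the next activation.
    return ("stay", v)


adj = {0: [1], 1: [0, 2], 2: [1]}
action = choose_action(0, adj, settled={0}, mobile={0}, marks={0: None})
```

The third branch is where traffic appears: a robot whose children are all occupied by mobile robots simply waits, and the depth-first search effectively branches when a different child is free.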
Theorem III.1.
If for all the number of adversarial events before time is allowed to be at most , , then the makespan of Algorithm 1 over graph environments with vertices is at most asymptotically almost surely as .
Figure 1 shows an execution of our algorithm on a grid environment with square vertices (white region) and obstacles (blue region). We allowed a naive adversary to arbitrarily delete at most robots before time , with . This corresponded to a deletion of 56% of robots that entered the environment before the makespan. In a more constrained topology (such as a path graph, see Section III-A3), the robots would progress more slowly, and a greater percentage would be deleted. The makespan (bottom right figure) was , consistent with the upper bound of Theorem III.1. After the spanning tree completes, robots keep entering the region until there are two robots at every vertex. This is related to the “slow makespan”, which we will later define. The slow makespan was 831. See Section IV for more simulations.
III-A Analysis
We study the makespan of Algorithm 1. Some of the proofs are placed in the Appendix.
For the analysis, we will assume that robots from that settle or crash keep being activated. This is a purely “virtual” activation: such robots of course do and affect nothing upon being activated. We start with a structural Lemma:
Lemma III.2.
* is a tree at all times with probability .*
Proof.
When the first robot enters and successfully settles, contains only . No settled robots are ever deleted, so can only gain new vertices. Whenever a mobile robot settles, it extends the tree by one vertex, connecting its current location to via a single directed edge. By definition, the edge is directed from the vertex the settled robot marks–which is its previous location–to . This turns into a leaf of . With probability no two robots on activate at the exact same time, so no two robots settle the same vertex. Hence remains a tree. ∎
III-A1 Event orders
We explain how we intend to bound the makespan. Our strategy shall be to use coupling to compare the performance of Algorithm 1 to the performance of different random processes of robots moving on different structures. Coupling is a technique in probability theory for comparing different random processes (see [32]).
The basic idea is this: whenever we run Algorithm 1 on , we can log the exact times at which the robots activate, as well as the times adversarial events happen and which robots they affect. This gives us an order of events sampled from some random distribution. Note that robots keep activating forever (but these activations do nothing once the graph is full), so is infinitely long. We then “re-enact” or “simulate” on a new environment (or several new environments) involving the robots by activating and deleting the robots according to .
To make things more precise, by “simulating” on different environments we mean that we consider the coupled process wherein different environments have robots that are paired such that whenever in is scheduled for an activation or a deletion according to the event order ( is simply an infinite list of scheduled activation and deletion times), the copies of in all the environments also activate or are deleted. When the copies of are activated they act according to Algorithm 1 with respect to their local neighbourhood. Robot entrances are modelled as usual (Section II), but note that even if manages to enter following an activation, its copy might not enter its own environment because in that environment the entrance is blocked, or there is a lower-index robot waiting to enter. During Algorithm 1’s analysis, we will often be talking about a deterministic event order being simulated over different environments. The end-goal, however, is to say something about the event order when it is randomly sampled from the execution of Algorithm 1 on .
The event order must be a possible set of events that occurred during an execution of our algorithm on the base graph environment . This means, due to our model, that a robot in will never be scheduled for deletion except at times when it is activated and attempts to move. However, while simulating on the environments , we must be allowed to break the rules of the model: we might delete robots even when they don’t attempt to move, or while they are outside of the new graph environment. Whenever we say “for any event order ”, we mean event orders that could have happened over .
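The coupling described above amounts to replaying one shared list of events on several environments at once. The sketch below shows only that replay mechanism; `RecordingEnv` is a hypothetical stand-in that logs what it is told, whereas a real environment would apply the local rule on activation and remove the robot on deletion.

```python
class RecordingEnv:
    """Toy stand-in for an environment; it only logs what it is told."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def activate(self, robot, t):
        self.log.append((t, robot, "activate"))

    def delete(self, robot, t):
        self.log.append((t, robot, "delete"))


def replay(event_order, environments):
    """Apply one shared event order to every environment (the coupling)."""
    for t, robot, kind in event_order:
        for env in environments:
            if kind == "activate":
                env.activate(robot, t)
            elif kind == "delete":
                env.delete(robot, t)
            else:
                raise ValueError(f"unknown event kind: {kind!r}")


order = [(0.3, 0, "activate"), (0.9, 1, "activate"), (0.9, 1, "delete")]
env_a, env_b = RecordingEnv("base"), RecordingEnv("path")
replay(order, [env_a, env_b])
```

Every environment sees identical activation and deletion times; any difference in the resulting configurations comes only from the environments' structure, which is what makes the comparison deterministic once the event order is fixed.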
In , define to be the first time activates, to be the first time after that either or activate, and to be the first time that any robot in the set is activated.
Definition III.3.
The times in are called the meaningful event times of .
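One way to read this definition is as a single pass over the activation schedule in which the set of robots that "count" grows by one after each meaningful event. The sketch below follows that reading; the function name, the 0-based robot indices, and the restriction to the first `count` times are assumptions of this illustration.

```python
def meaningful_event_times(events, count):
    """First `count` meaningful event times of an event order.

    events: list of (time, robot_index) activations sorted by time,
            with robots indexed from 0 in entrance order.
    The k-th returned time (k = 0, 1, ...) is the first activation,
    strictly after the previous returned time, of any robot with
    index <= k.
    """
    times = []
    last = float("-inf")
    for t, i in events:
        if len(times) == count:
            break
        if t > last and i <= len(times):
            times.append(t)
            last = t
    return times


events = [(0.5, 2), (1.0, 0), (1.2, 3), (1.5, 1), (2.0, 0), (2.5, 2)]
times = meaningful_event_times(events, count=3)
```

In the example, the activations at 0.5 and 1.2 are skipped because the robots involved cannot yet have entered: all lower-indexed robots must have had a chance to act first.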
For meaningful event times to be well-defined there must be a minimal time where one of the robots activates. Because the activation times of the robots are independent exponential waiting times of mean , this is true with probability for a randomly sampled . Moreover, with probability , at any time there is precisely one robot of scheduled for activation by . Because both these things are true with probability , we assume they are true for any event order referred to at any point in this analysis. This does not affect our main result (Theorem III.1), which is probabilistic.
Our end-goal is to randomly sample from and simulate it on four increasingly “slower” environments: , , , , so that all environments ( and these four) are coupled. Meaningful event times are so called because, prior to the first activation of , any of the robots cannot enter or move in any of these environments, and activating them causes nothing. Hence, at any time which is not a meaningful event time, the configuration of robots cannot change (no robots move and no robots are deleted in any of the environments is simulated on).
The possibility to create an event order is the only reason we labelled the robots and made the assumption about entrance orders in Section II.
III-A2 versus
Let be the number of vertices of . The path graph over vertices is a graph over the vertices such that there is an edge for all . We simulate on the graph environment where the source vertex is . Simulating on results in what is mostly a normal-looking execution of Algorithm 1 on , but as discussed, it might lead to some oddities such as robots being deleted while they are still outside the graph environment.
Let us introduce some notation. refers to the copy of being simulated by on , and is similarly defined.
Definition III.4.
The depth of at time , written , is the number of times has successfully moved before time . Depth is initially $0$. Entering at is considered a movement, so robots entering have depth .
is similarly defined with respect to .
Definition III.5.
Let be a tree graph environment (such as ) with source vertex . A vertex of becomes slow at time if a mobile robot on was activated and found no vertex it could move to, and also, either is a leaf of or all of its descendants in are slow at time .
A robot is slow at time if it is located at a slow vertex at time .
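Definition III.5 can be read as a bottom-up rule on the tree. The sketch below assumes the tree is stored as a map from each vertex to its list of children and checks only the children of a vertex, relying on the fact that a child can only have become slow after all of its own descendants did. The function name and arguments are hypothetical.

```python
def becomes_slow(v, children, slow, blocked_activation):
    """Return True if vertex v becomes slow at the current activation.

    children: dict vertex -> list of child vertices in the tree.
    slow:     set of vertices that are already slow.
    blocked_activation: True if a mobile robot at v was just activated
                        and found no vertex it could move to.
    """
    if not blocked_activation:
        return False
    # A leaf has no children, so all() over an empty list is True.
    return all(child in slow for child in children.get(v, []))


# Slowness propagates from the leaf of a three-vertex path to the source.
children = {0: [1], 1: [2], 2: []}
slow = set()
for v in (2, 1, 0):
    if becomes_slow(v, children, slow, blocked_activation=True):
        slow.add(v)
```

A blocked activation at an inner vertex has no effect until everything below it is slow, which is why slow vertices spread upwards from the leaves.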
Definition III.6.
The slow makespan of on , , is the first time all vertices of are slow when simulating the event order .
is not always a tree, but given a fixed event order , we can associate to a spanning tree of , , containing as a subtree for all times . Lemma III.2 says robots only use edges of , so we may define the slow makespan of on the -simulation as the slow makespan on . Slow makespan is clearly also defined for the -simulation. Furthermore, is an upper bound on the (regular) makespan of the -simulation, since every vertex must have a settled robot before it becomes slow and, as the settled robots of never move, they cannot be deleted by .
Our motivation for introducing slow makespan is that we wish to show is the environment that maximizes slow makespan on vertices. However, it does not maximize normal makespan (see Table II for an example).
Lemma III.7.
A slow robot is forever unable to move and never deleted in the event order .
Proof.
Only robots attempting to move can be deleted. If is at a leaf of , it can never move, since its parent vertex in contains a settled robot marking the vertex of a robot in a different location, and settled robots are never deleted. Hence, is never deleted. Slow vertices propagate upwards from the leaves of , so the statement of the lemma follows by induction. ∎
Proposition III.8.
For any event order , .
An intuitive argument for this proposition is that if the spanning tree of is not , then some vertex of must have multiple descendants, hence robots entering will be able to branch to different neighbours and is less likely to be blocked. Consequently, robots will enter faster than , and so . We need to formalize this intuition into an argument that holds for any event order . It turns out there are many subtleties involving asynchronicity, settling and crashing which make this not straightforward, and we require a rather technical argument. (Such subtleties are also why it is simpler to compare the environments , , rather than compare to directly.)
We prove Proposition III.8 by induction on the meaningful event times in the event order . We show the following statements to be true for non-deleted robots at all times :
- (a) If is not slow or settled, then .
- (b) If is slow or settled, then is slow or settled, and .
We note that both statements are (trivially) true at time , as no event has occurred yet.
Lemma III.9.
If statement (b) is true up to time , settled and slow robots of neither move nor get deleted as a result of an event of scheduled for time (i.e., the robots still exist and are in the same place at time ).
Assuming (a) and (b) hold at all times, let us see how to infer Proposition III.8. If a vertex becomes slow at some time , it must contain a settled and a mobile robot, both of whom become slow. Lemma III.9 says that slow and settled robots of never get deleted. Hence, the first time there are slow robots on (two at every vertex) is . Statement (b) implies that if has slow robots, must also contain slow or settled robots. It is immediate to verify that this can only happen when has slow robots. Hence, at time , has slow robots–two at every vertex. The inequality follows by definition. ∎
Lemma III.10.
If statements (a) and (b) are true up to time , statement (a) is true at time .
Lemma III.11.
If statements (a) and (b) are true up to time , statement (b) is true at time .
III-A3 versus
We wish to bound (which is determined by the event order ). We do this by comparing simulations of on different environments. To start, let be the path graph with infinitely many vertices, and where . We may simulate on as we did on .
Lemma III.12.
For any event order simulated on and and any time , and contain the exact same number of robots.
Proof.
The configuration of robots in the first vertices of and is identical until becomes slow in . After becomes slow, the configuration of robots in the first vertices is still the same in both graphs until a robot in is prevented from moving by a robot in , meaning becomes slow. By induction, the configuration of robots in the first vertices of both graphs is identical until in becomes slow (we use Lemma III.9 to infer that the slow robots at are never deleted). Hence, until becomes slow, robots enter at the same times in and . becomes slow precisely at time . ∎
III-A4 versus
We simulate on the environment . is with the modification that there is at time a settled robot at every vertex . The settled robot at marks . These “dummy” robots are never activated, and are not among the indexed robots . Because there is already a settled robot at every vertex, the robots never become settled. Call this environment . Lemma III.13 shows is strictly slower than :
Lemma III.13.
For any event order and at any time , the number of mobile-state robots in at time is at most the total number of robots in .
III-A5 versus totally asymmetric simple exclusion
We bound the arrival rate of robots at by another, even slower process. This process, , takes place on the path graph where we also have non-positive vertices , and such that there is an edge for every . Like there is initially a settled robot at every vertex, marking the vertex before it. Unlike the other processes, robots do not enter at : the robot begins inside the graph environment as a mobile robot located at . To compare with , we count the robots that cross the edge . There is one more crucial feature of : robots are never deleted from . Scheduled robot deletions at are treated as a regular activation of the robot. Besides these differences, can be simulated on as before.
Lemma III.14.
For any event order and at any time , the number of mobile robots that crossed the edge of is at most the number of robots that entered or were deleted before entering .
Recall that is an event order of some execution of Algorithm 1 on the graph environment of interest, . We may randomly sample by running Algorithm 1 on and logging the events.
The stochastic process resulting from simulating a randomly sampled event order on is called a totally asymmetric simple exclusion process (TASEP) with step initial condition, first introduced in [37]. In this process, robots (also called “particles”) are activated at exponential rate and attempt to move rightward whenever no other robot blocks their path. This is precisely the outcome of simulating on (since robot activations that lead to a deletion in the other processes are treated as a regular activation in ).
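To make the process concrete, the following is a minimal Python sketch of TASEP from step initial condition. It is not the authors' simulation code. It assumes unit activation rate, and it truncates the infinite column of particles to `num_particles`; this leaves the motion of the retained particles unchanged, since a particle is only ever blocked by a particle ahead of it. The function name, the site labels (particles start on sites 0, -1, -2, ...) and the choice to count crossings of the edge between sites 0 and 1 are hypothetical choices made for illustration.

```python
import random


def tasep_crossings(t_end, num_particles, seed=0):
    """Count particles that jump from site 0 to site 1 by time t_end.

    Assumptions of this sketch (not taken from the paper):
    unit-rate exponential clocks, particles initially on sites
    0, -1, ..., -(num_particles - 1), and all sites > 0 empty.
    """
    rng = random.Random(seed)
    positions = [-i for i in range(num_particles)]  # particle i starts at site -i
    occupied = set(positions)
    crossed = 0
    t = 0.0
    while True:
        # Superposition of num_particles unit-rate clocks:
        # the next activation arrives after an Exp(num_particles) wait.
        t += rng.expovariate(num_particles)
        if t > t_end:
            return crossed
        i = rng.randrange(num_particles)  # the activated particle is uniform
        target = positions[i] + 1
        if target not in occupied:  # exclusion rule: move only into an empty site
            occupied.remove(positions[i])
            occupied.add(target)
            if positions[i] == 0:
                crossed += 1
            positions[i] = target
        # A blocked particle simply wastes its activation.


if __name__ == "__main__":
    for seed in range(3):
        print(seed, tasep_crossings(200.0, 200, seed=seed))
```

For unit rate, the classical hydrodynamic limit of this process gives roughly a quarter of a crossing per unit time, so a call such as `tasep_crossings(200.0, 200)` should typically land in the vicinity of 50. The count is only meaningful while it stays below `num_particles`.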
In TASEP with step initial condition, let us write to denote the number of robots that have crossed at time . It is shown in [35] that converges to asymptotically almost surely (i.e., with probability 1 as ). [27] shows that the deviations are of order . Specifically we have in the limit:
[TABLE]
Valid for all , where is the Tracy-Widom distribution and obeys the asymptotics and as . We employ Equation 1 and the prior analysis to prove Theorem III.1:
Proof.
Let be a graph environment with vertices. Let be the randomly sampled event order of an execution of Algorithm 1 on . We will bound the slow makespan, .
We simulate over the environments , , , and . From Lemma III.14 we know that at all times the number of robots that crossed the edge of , meaning , is less than the number of robots that entered or were deleted before entering. At most robots are deleted by time , so the number of mobile robots at at time is at least . Lemmas III.12 and III.13 imply this is at least the number of robots at at any time .
At any time , there cannot be more than robots at . Hence, if , then . By Proposition III.8, we shall then also have .
Write . We wish to show is an upper bound on asymptotically almost surely, which is precisely the statement of Theorem III.1. To show this, we are interested in , the probability that is less than at time . Showing tends to [math] as completes our proof. Define the probability
[TABLE]
is the parametrized left innermost part of Equation 1 with ( is a positive integer). Note that is monotonic increasing in . Define . By algebra, we have . Fix any constant and define . Again by algebra, tends to as . Hence, for a large , we must have and therefore (by the monotonicity of ). By Equation 1, tends to as . Hence is at most in the limit. By taking we see that in the limit is at most . ∎
We note that slow makespan can be nearly equal to makespan (see Table II, or consider a path graph with the source vertex placed at and robots initially moving rightwards). Hence, one does not “miss out” on much by using it to bound makespan.
III-B Synchronous time and multiple sources
We describe extensions of our results to two settings.
Synchronous time. We may consider a synchronous time setting that is discretized into steps such that at every step, all the robots activate at once. In this setting, Algorithm 1 ends up exploring just one branch of the tree at a time, like depth-first search, so no two robots ever attempt to enter the same vertex. Analysis similar to the asynchronous case shows that robots then enter at rate (instead of approximately ) on , and analogous reasoning to Proposition III.8 and Theorem III.1 gives an upper bound of on the makespan of a graph with vertices, assuming adversarial events. Consider the path graph with (not the usual ), and where the robots first fill the vertices with a double layer before reaching . The synchronous makespan of this environment is asymptotically . Hence, the bound on the makespan in the synchronous case is exact.
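As a rough illustration of why synchronous activation gives a deterministic entry rate, the sketch below runs a parallel-update exclusion process on a path from step initial condition. This is a simplified stand-in and not Algorithm 1: it assumes crash-free particles that start on sites 0, -1, -2, ..., that all particles activate at every step, and that a particle advances exactly when the site ahead was empty at the start of the step. The function name and site labels are hypothetical.

```python
def synchronous_crossings(num_steps, num_particles):
    """Count particles that jump from site 0 to site 1 in num_steps
    synchronous steps of a parallel-update exclusion process.

    Assumptions of this sketch (not taken from the paper): no crashes,
    particles initially on sites 0, -1, ..., -(num_particles - 1).
    """
    positions = [-i for i in range(num_particles)]
    crossed = 0
    for _ in range(num_steps):
        occupied = set(positions)  # snapshot taken before anyone moves
        new_positions = []
        for p in positions:
            if p + 1 not in occupied:  # site ahead was empty at step start
                if p == 0:
                    crossed += 1
                new_positions.append(p + 1)
            else:
                new_positions.append(p)  # blocked: stay in place
        positions = new_positions
    return crossed


if __name__ == "__main__":
    print([synchronous_crossings(n, 20) for n in range(1, 9)])
```

In this simplified process each particle must wait one step for the site ahead to clear, so one particle crosses every two steps once the front has moved off. The randomness of the asynchronous setting disappears entirely.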
Multiple source vertices. Instead of just having a single source vertex , we may consider environments with multiple source vertices such that each of them corresponds to its own set of robots entering over time. In asynchronous time, Lemma III.2 can be generalized to show that is then a forest, and the robots attempt to create a spanning forest of . The technique in this paper can be generalized to show that the makespan bound of Theorem III.1 holds. In general graph environments multiple sources may not improve the makespan by much. For example, consider the path graph with sources on . The makespan of this graph is bounded below by the makespan of the path graph with a single source vertex .
IV Simulation and evaluation
For empirical confirmation of our analysis, we numerically simulated our algorithm on a number of environments. On these environments, we measured the makespan and the percentage of robots that crashed for the parameters , averaging them over 30 simulations per configuration and rounding to the nearest integer. Data on several environments is found in Table II. Figure 3 shows stills from some simulations.
From the data, it is clear that makespan is affected by the shape of the environment and by . We see that an increase in the percentage of robots crashed scales makespan up gracefully, and that spacious environments generally have lower makespans. We also confirm that the slow makespans are always lower than the bound of Theorem III.1. Closest to the bound is the scenario where the environment is the path graph and , in which case slow makespan is almost exactly the bound, . This is consistent with our analysis that the environment has the largest slow makespan. It also verifies that Theorem III.1 gives a correct upper bound. Such data further suggests that for spacious environments, and for large , performance on average is better than the worst-case performance guarantee of Theorem III.1. In the simulations, we did not choose our adversarial events to be maximally obstructing, but rather crashed robots arbitrarily; a cleverer adversary would cause the makespan and slow makespan to be closer to the worst case (and cause a larger percentage of robots to crash).
V Discussion
In swarm robotics, where one must coordinate an enormous robotic fleet, we must anticipate many faults, such as crashing and traffic jams. Because robots in the swarm are usually assumed to be autonomous and have limited computational power, complex techniques for handling such faults are not necessarily feasible. Hence, it is important to ask whether simple rules of behaviour can be effective. To this end, we investigated the problem of covering an unknown graph environment, and constructing an implicit spanning tree, with a swarm of frequently crashing robots. We showed a simple and local rule of behaviour that enables the swarm to quickly and reliably finish this task in the presence of crashes. The swarm’s performance degrades gracefully as crash density increases.
We outline here several directions for future research. First, our model interprets the “swarm” part of swarm robotics as a vast and redundant fleet of robots that can be dispersed into the environment over time. We used this model for uniform dispersal, but it would be interesting to adapt it to other kinds of missions, and design algorithms for those missions that can handle crashes or other forms of interference. For example, in [8], mobile agents entering at a source node over time sequentially pursue each other to discover shortest paths between and some target node . The algorithm succeeds even if some of the agents are interrupted and have their location changed.
Next, in this work, we made the simplifying assumption that the environment of the robots is discrete. If the robots instead attempted to cover a continuous planar domain by an algorithm similar to ours, the robots would need to construct a shared discrete graph representation of the environment through the settled robots in and their markings. We believe that our algorithm can readily be extended to such settings.
Lastly, can we exploit the large number of robots in a swarm to handle other kinds of errors? There are many situations and modes of failure that can be discussed, such as Byzantine robotic agents, or dynamic changes to the environment.
VI Fast Uniform Dispersion of a Swarm - Supplementary Appendix
Reminder:
- (a) If is not slow or settled at time , then .
- (b) If is slow or settled at time , then is slow or settled, and .
VI-A Proof of Lemma III.9
Proof.
Referring to the Lemma’s statement, we recall that here “time ” refers to the configuration of agents at time before any scheduled events. Hence, even if something is true at time , we still need to show that it remains true after the events that happen at time .
Let be slow or settled at time . To show will not be deleted, it suffices to show the event order will not delete . (b) implies is settled or slow at time . Lemma III.7 says never deletes slow agents. never deletes settled agents of as, in our model, agents are only deleted when they move, and obeys the rules of the model when simulated on . Hence, will not delete .
Next we show that will not move as a result of an event scheduled for time . If is settled, this is true by definition. Otherwise, is slow. By assumption, (b) is true at all times up to . Hence, by the same reasoning as the above paragraph, agents of that became slow or settled at or prior to time have not been deleted. Consequently, the argument of Lemma III.7 applies also here, allowing us to conclude that agents cannot move after they become slow. In particular this applies to . ∎
VI-B Proof of Lemma III.10
Proof.
Only one event occurs at time . This event is either an uninterrupted activation of an agent (meaning the agent is not deleted), or an activation that leads to a deletion. If the event is a deletion, (a) holds at time trivially, so we assume that it is an uninterrupted activation.
Let and be the agents that are activated at time . The depth of any other agent is unchanged, so we need only verify (a) for these two agents. Assuming (a) is true at time , it is only possible for (a) to become false at time if did not move, but did. We assume this is the case.
If does not move as a result of its activation at time , then either it is settled, in which case (a) is true and we are done, or there is a mobile agent at every neighbouring vertex in . If is mobile and all of its neighbours are slow at time , then becomes slow at time and (a) is true. Otherwise there is a mobile agent, , that is preventing from moving and is not slow. We must have that
[TABLE]
Because and are always moving down a spanning tree of , the depth of must be precisely one greater than ’s in order to prevent movement.
Because is not activated at time , (a) and (b) are still true for it at time . Because is not slow or settled, (a) implies that
[TABLE]
And the contrapositive of (b) implies that is not settled. However, consider the structure of the graph : if is mobile, then since it entered before , it must be further ahead. In particular, we must have
[TABLE]
As otherwise would have prevented from moving when activated at time .
(In)equalities 3, 4 and 5 imply . This shows (a) is true at time . ∎
VI-C Proof of Lemma III.11
Proof.
As in Lemma III.10, we can assume that the event at time is the uninterrupted activation of a pair of agents and , and we need only verify that (b) is still true for this pair of agents. We separate our proof into cases.
Case 1: Assume is settled at time . Because is a path graph and using Lemma III.9, can only be settled if every non-deleted agent that entered before it is settled behind it. At time (b) is still true for all agents other than and . Hence, it follows from (b) that for any non-deleted agent where we have:
[TABLE]
Algorithm 1 guarantees that any agent in always neighbours a settled agent or is at the same location as a settled agent. Thus, we know that for some settled agent . Furthermore, this inequality must hold for some that entered before (i.e., ), because any settled agent that entered after must have gone down a different branch of , otherwise it would be blocked by and unable to settle. Let . Then . If this is an equality, is necessarily settled.
From Inequality 6 we infer
[TABLE]
Where follows from the fact that is ahead of all non-deleted agents that came before it. In the case of equality, must be settled. If isn’t settled, then the inequality is strict. Consequently, it follows from the fact that (a) holds at time (Lemma III.10) that must be slow. Otherwise, (a) implies ’s depth is greater than ’s, contradicting the inequality. Either way, (b) is true.
Case 2: Assume is slow and not settled at time . If is slow at , then it follows from (b) that is slow or settled at , and so activation cannot affect either of these agents, meaning (b) remains true at and we are done. Thus, we may assume is not slow at time .
Using Lemma III.9, can only become slow at time if all vertices behind it contain settled agents, and all vertices ahead of it contain two slow agents (one settled and one mobile). If is there are slow or settled agents in at time . These agents must have entered before , because any agent that enters after must pass it to become slow or settled, and this is impossible because is not settled.
Using (b) we learn from the above that in , at time there are at least settled or slow agents that entered before . Of these, at least agents are slow and mobile, and have greater depth than or are in a different branch of (because they arrived before and could not have passed them). There are thus at most vertices could have visited since entering , meaning its depth is at most , and we have .
If this inequality is strict, then from statement (a) we learn that is settled or slow, so (b) is true and we are done. Otherwise, . We saw there are (at least) slow mobile agents in that have greater depth than or are in a different branch of . From this, we infer that any descendant of must contain a slow mobile agent, or that is at a leaf of and has no descendants. Thus, if is not already settled or slow, it will become slow after the activation at time , since its slow descendants will prevent it from moving. This completes the proof.
∎
VI-D Proof of Lemma III.13
Proof.
Let be the copy of simulated over . Let be the meaningful event times of . We show by induction that at any time , for all non-deleted agents:
[TABLE]
This implies any agent that enters must have already or concurrently entered , completing the proof.
The induction statement is trivially true at time , as no event has occurred yet. We assume it is true up to time , and show it remains true at .
If the event scheduled for time was a deletion of some agent, the statement remains trivially true (as both simulated versions of the agent are deleted). Otherwise, the scheduled event is the uninterrupted activation of some pair of agents and .
Any agent where does not move, so we need only verify the inductive statement remains true for and . The only situation in which Inequality 8 is falsified at time if it is true at time is if and is mobile at time , but manages to move whereas is blocked by a mobile agent . By the inductive hypothesis, . Because is a path graph and , we know that at all times after entered the environment. Hence, if and blocks , then must also block when it attempts to move. This shows that the inductive hypothesis is correct at time .
∎
VI-E Proof of Lemma III.14
Proof.
Unlike Lemma III.13, here we count the number of agents that enter , and not the number of currently existing agents that entered it. This means we also count agents that entered at but were deleted. This difference is necessary for the comparison, because agents cannot be deleted from .
Despite this difference, the proof is very similar to Lemma III.13. One shows by induction on the meaningful event times that at any time , for any such that was not deleted we have:
[TABLE]
Note that is the index of the vertex of at time . If crossed we must have . Recalling that if is outside of at time then , we see by (9) that crossing can only happen if entered , or if was deleted before entering. Hence, the Lemma follows from (9).
Let us show (9) holds by induction. It holds trivially for all at . Now, assume (9) holds at time , and we will show it holds at time .
Suppose the pair of agents activated at is and . Then these are the only agents for which (9) might be false at . Assuming (9) is true at , it can only become false at if , but successfully moves as a result of activation at time whereas does not and also is not deleted. If does not move this means some , is blocking it. Hence, we must have . By the inductive hypothesis we have . Since , is always ahead of , meaning . Combining these (in)equalities we get , hence . This completes the proof by induction of (9).
∎
References
- [1] Sheila Abbas, Mohamed Mosbah, and Akka Zemmari. Distributed computation of a spanning tree in a dynamic graph by mobile agents. In 2006 IEEE International Conference on Engineering of Intelligent Systems, pages 1–6. IEEE, 2006.
- [2] Noa Agmon. Robotic strategic behavior in adversarial environments. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 5106–5110. AAAI Press, 2017.
- [3] Noa Agmon and David Peleg. Fault-tolerant gathering algorithms for autonomous mobile robots. SIAM Journal on Computing, 36(1):56–82, 2006.
- [4] Noa Agmon, Noam Hazon, and Gal A. Kaminka. Constructing spanning trees for efficient multi-robot coverage. In Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), pages 1698–1703. IEEE, 2006.
- [5] Yaniv Altshuler, Alex Pentland, and Alfred M. Bruckstein. Introduction to swarm search. In Swarms and Network Intelligence in Search, pages 1–14. Springer, 2018.
- [6] Francesco Amigoni and Vincenzo Caglioti. An information-based exploration strategy for environment mapping with mobile robots. Robotics and Autonomous Systems, 58(5):684–699, 2010.
- [7] Michael Amir and Alfred M. Bruckstein. Minimizing travel in the uniform dispersal problem for robotic sensors. In Proceedings of the 18th International Conference on Autonomous Agents and Multi Agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2019.
- [8] Michael Amir and Alfred M. Bruckstein. Probabilistic pursuits on graphs. Theoretical Computer Science, 2019.
