A sub-modular receding horizon solution for mobile multi-agent persistent monitoring

Navid Rezazadeh; Solmaz S. Kia

arXiv:1908.04425·cs.MA·October 22, 2020

TL;DR

This paper presents a submodular receding horizon approach for optimizing persistent monitoring by heterogeneous mobile agents, balancing computational efficiency and robustness in complex, NP-hard scenarios.

Contribution

It introduces a novel suboptimal dispatch policy leveraging submodularity and receding horizon techniques for multi-agent persistent monitoring.

Findings

01

The proposed method achieves near-optimal rewards in simulations.

02

Receding horizon approach reduces computational complexity.

03

Incorporating nodal importance improves monitoring effectiveness.

Abstract

We study the problem of persistent monitoring of a finite number of inter-connected geographical nodes by a group of heterogeneous mobile agents. We assign to each geographical node a concave and increasing reward function that resets to zero after an agent's visit. Then, we design the optimal dispatch policy of which nodes to visit at what time and by what agent by finding a policy set that maximizes a utility that is defined as the total reward collected at visit times. We show that this optimization problem is NP-hard and its computational complexity increases exponentially with the number of the agents and the length of the mission horizon. By showing that the utility function is a monotone increasing and submodular set function of agents' policy, we proceed to propose a suboptimal dispatch policy design with a known optimality gap. To reduce the time complexity of constructing the…

Equations

R_{v}(t)=\begin{cases}0, & t=\bar{t}_{v},\\ \psi_{v}(t-\bar{t}_{v}), & t>\bar{t}_{v},\end{cases}

\mathsf{R}(\bar{\mathcal{P}})=\sum_{\forall\mathsf{p}\in\bar{\mathcal{P}}}\sum_{l=1}^{\mathsf{n}_{\mathsf{p}}}R_{\mathsf{V}_{\mathsf{p}}(l)}(\mathsf{T}_{\mathsf{p}}(l)).

\mathcal{P}^{\star}=\underset{\bar{\mathcal{P}}\subset\mathcal{P}}{\operatorname{argmax}}\;\mathsf{R}(\bar{\mathcal{P}})\quad\text{s.t.}

|\bar{\mathcal{P}}\cap\mathcal{P}^{i}|\leq 1,\quad i\in\mathcal{A},

\Delta_{g}(\mathsf{q}|\bar{\mathcal{Q}})=g(\bar{\mathcal{Q}}\cup\mathsf{q})-g(\bar{\mathcal{Q}}),

\Delta_{g}(\mathsf{q}|\mathcal{Q}_{1})\geq\Delta_{g}(\mathsf{q}|\mathcal{Q}_{2}).

g(\mathcal{Q}_{1})\leq g(\mathcal{Q}_{2}).

\mathcal{P}^{\star}=\underset{\bar{\mathcal{P}}\subset\mathcal{P}}{\operatorname{argmax}}\;\bar{\mathsf{R}}(\bar{\mathcal{P}})\quad\text{s.t.}\quad|\bar{\mathcal{P}}\cap\mathcal{P}^{i}|\leq 1,\quad i\in\mathcal{A},

\mathsf{L}(v,w,\hat{t},i)=\frac{L(v,\hat{t}+\tau^{i}_{w,v},r)}{\tau^{i}_{w,v}}.

\bar{\mathsf{R}}(\bar{\mathcal{P}})=\mathsf{R}(\bar{\mathcal{P}})+\alpha\sum_{\forall\mathsf{p}\in\bar{\mathcal{P}}}\underset{\forall v\in\bar{\mathcal{V}}}{\max}\,\mathsf{L}(v,\mathsf{p}),\quad\alpha\in\mathbb{R}_{\geq 0}.

(\mathfrak{t}^{v}(\mathcal{Q}))_{1}^{\mathtt{c}(v,\mathcal{Q})}=(\mathfrak{t}_{1}^{v}(\mathcal{Q}),\mathfrak{t}_{2}^{v}(\mathcal{Q}),\cdots,\mathfrak{t}_{\mathtt{c}(v,\mathcal{Q})}^{v}(\mathcal{Q}))

\mathsf{R}(\bar{\mathcal{P}})=\sum\nolimits_{v\in\mathcal{I}_{\bar{\mathcal{P}}}}\Big(\sum\nolimits_{j=1}^{\mathtt{c}(v,\bar{\mathcal{P}})}\psi_{v}(\Delta\mathfrak{t}_{j}^{v}(\bar{\mathcal{P}}))\Big)

\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{1}\cup\mathsf{q})}\psi_{v}(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{1}\cup\mathsf{q}))-\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{1})}\psi_{v}(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{1}))\geq 0

\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{2}\cup\mathsf{q})}\psi_{v}(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{2}\cup\mathsf{q}))-\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{2})}\psi_{v}(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{2}))\leq\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{1}\cup\mathsf{q})}\psi_{v}(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{1}\cup\mathsf{q}))-\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{1})}\psi_{v}(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{1}))

\Delta_{\mathsf{R}}(\mathsf{q}|\mathcal{Q}_{1})\geq\Delta_{\mathsf{R}}(\mathsf{q}|\mathcal{Q}_{2})

\delta t_{1}\geq\delta t_{2}\geq\cdots\geq\delta t_{n},

\delta v_{1}\geq\delta v_{2}\geq\cdots\geq\delta v_{n},

\delta t_{1}+\cdots+\delta t_{i}\geq\delta v_{1}+\cdots+\delta v_{i},\quad\forall i\in\{1,\cdots,n-1\}

\delta t_{1}+\cdots+\delta t_{n}=\delta v_{1}+\cdots+\delta v_{n}

f(\delta t_{1})+\cdots+f(\delta t_{n})\leq f(\delta v_{1})+\cdots+f(\delta v_{n})

f(c)+f(d)-f(c+d)\leq f(a)+f(b)-f(a+b)

(A1):\ \delta t_{1}=c+d,\ \delta t_{2}=a,\ \delta t_{3}=b,

(A2):\ \delta t_{1}=c+d,\ \delta t_{2}=b,\ \delta t_{3}=a,

(B1):\ \delta v_{1}=a+b,\ \delta v_{2}=d,\ \delta v_{3}=c,

(B2):\ \delta v_{1}=a+b,\ \delta v_{2}=c,\ \delta v_{3}=d,

(B3):\ \delta v_{1}=d,\ \delta v_{2}=a+b,\ \delta v_{3}=c,

(B4):\ \delta v_{1}=c,\ \delta v_{2}=a+b,\ \delta v_{3}=d,

(B5):\ \delta v_{1}=c,\ \delta v_{2}=d,\ \delta v_{3}=a+b,

(B6):\ \delta v_{1}=d,\ \delta v_{2}=c,\ \delta v_{3}=a+b.

f(c)+f(d)+f(a+b)\leq f(a)+f(b)+f(c+d)

f(a)+f(b)-f(a+b)\leq f(c)+f(d)-f(c+d).

g((q)_{1}^{l})=\sum_{i=1}^{l-1}f(\Delta q_{i}),

g((a)_{1}^{n+l})-g((t)_{1}^{n})\geq 0.

Full text

Navid Rezazadeh and Solmaz S. Kia

This work is supported by NSF award IIS-SAS-1724331. A preliminary version of this work will appear in the proceedings of the 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems [29].

Abstract

We consider persistent monitoring of a finite number of inter-connected geographical nodes by a group of heterogeneous mobile agents. We assign to each geographical node a concave and increasing reward function that resets to zero after an agent’s visit. Then, we design the optimal dispatch policy of which nodes to visit at what time and by what agent by finding a policy set that maximizes a utility that is defined as the total reward collected at visit times. We show that this optimization problem is NP-hard and its computational complexity increases exponentially with the number of the agents and the length of the mission horizon. By showing that the utility function is a monotone increasing and submodular set function of agents’ policy, we propose a suboptimal dispatch policy design with a known optimality gap. To reduce the time complexity of constructing the feasible search set and also to induce robustness to changes in the operational factors, we perform our suboptimal policy design in a receding horizon fashion. Then, to compensate for the shortsightedness of the receding horizon approach we add a new term to our utility, which provides a measure of nodal importance beyond the receding horizon. This term gives the policy design an intuition to steer the agents towards the nodes with higher rewards on the patrolling graph. Finally, we discuss how our proposed algorithm can be implemented in a decentralized manner. A simulation study demonstrates our results.

I Introduction

In recent years, coordinating the movement of mobile sensors to cover areas that have not been adequately sampled/observed has been explored in the controls, wireless sensor, and robotics communities through problems related to coverage, exploration, and deployment. Many of the proposed algorithms strive to spread sensors to desired positions to obtain a stationary configuration such that the coverage is optimized, see e.g., [1, 2, 3, 4, 5, 6, 7, 8]. Some sensor placement problems such as [4, 6, 7, 8] are context-aware and also include a period of exploration and observation to increase the knowledge used to find the optimal residing position of the sensors. In this paper, instead of aiming to achieve an improved stationary network configuration as the end result of the sensors’ movement, our objective is to explore context-aware mobility strategies that dynamically reposition the mobile sensors to maximize their utilization and contribution over a mission horizon. Motivating applications include persistent monitoring to discover forest fires [9] or oil spillage in its early stages [10], locating endangered animals in a large habitat [11], and event detection in urban environments [12]. Specifically, we consider persistent monitoring of a finite set $\mathcal{V}$ of inter-connected geographical nodes by a finite set $\mathcal{A}$ of mobile sensors/agents, where $|\mathcal{V}|>|\mathcal{A}|$. The mobile agents are confined to a set of pre-specified edges $\mathcal{E}\!\subset\!\mathcal{V}\times\mathcal{V}$, e.g., aerial or ground corridors, to traverse from one node to another, see Fig. 1. Depending on their vehicle type, agents may have to take different edges to go from one node to another. Also, they may have different travel times along the same edge. We study a dispatch policy that orchestrates the topological distribution of the mobile agents such that an optimized service for a global monitoring task is provided at a reasonable computational cost.
To quantify the service objective, we assign to each node $v\in\mathcal{V}$ the reward function

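$$R_{v}(t)=\begin{cases}0, & t=\bar{t}_{v},\\ \psi_{v}(t-\bar{t}_{v}), & t>\bar{t}_{v},\end{cases}\tag{1}$$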

where $\psi_{v}(t)$ is a nonnegative concave and increasing function of time and $\bar{t}_{v}$ is the latest time node $v$ is visited by an agent. For example, in data harvesting or health monitoring, $\psi_{v}(.)$ can be the weighted idle time of the node $v$ or in event detection, it can be the probability of at least one event taking place at inter-visit times. Optimal patrolling designs a dispatch policy (what sequence of nodes to visit at what times by which agents) to score the maximum collective reward for the team over the mission horizon. However, as we explain below, this problem is NP-hard. Our aim then is to design a suboptimal solution that has polynomial time complexity.
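As an illustration of the reward model in (1), the following minimal Python sketch builds $R_{v}$ from a profile $\psi_{v}$. The helper name `make_reward` and the two example profiles (a weighted idle time with weight 2, and the probability of at least one event under an assumed Poisson arrival rate of 0.5) are hypothetical choices for illustration, not values taken from the paper.

```python
import math


def make_reward(psi):
    """Return a function reward(t, last_visit) implementing R_v in (1) for profile psi."""
    def reward(t, last_visit):
        if t < last_visit:
            raise ValueError("t must not precede the latest visit time")
        # Zero at the visit instant, psi(idle time) afterwards.
        return 0.0 if t == last_visit else psi(t - last_visit)
    return reward


# Illustrative profiles (assumptions): both are nonnegative, concave and increasing.
idle_time = make_reward(lambda s: 2.0 * s)                    # weighted idle time
event_prob = make_reward(lambda s: 1.0 - math.exp(-0.5 * s))  # P(at least one event)
```

Both profiles reset to zero whenever the node is visited, which is the only state a node needs to carry between visits.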

Related work: Dispatch policy design for patrolling/monitoring of geographical nodes can be divided into two categories: design in continuous edge space, where the edges to travel between the nodes are not specified, and design in discrete edge space, where they are. When there are no pre-specified inter-node edges, the optimal patrolling policy design also includes finding the optimal inter-node trajectories that the agents should follow without violating their mobility limits. In some applications, however, the mobile agents are confined to travel through pre-specified known edges between the nodes. For example, in a smart city setting, regulations can restrict the admissible routes between the geographical nodes. In the dispatch policy design in discrete edge space, the complexity of finding the optimal policy for a single patrolling agent is the same as the complexity of solving the Traveling Salesman Problem, where the computational complexity grows exponentially with the number of nodes [13]. In the case of multiple patrolling agents, the problem is even more complex, since each agent’s policy design depends on the other agents’ policies. This problem is formalized in earlier studies such as [14, 15]. Generally, when there are multiple edges to travel between every two nodes or when each node is connected to multiple other nodes, finding an optimal long-term patrolling scheme is not tractable. Constraining the agents to travel through specific edges to traverse among the geographical nodes allows seeking optimal solutions for the problem. For example, when the connection topology between the geographical nodes is a path or a cyclic graph, optimal solutions for the problem are proposed in [16, 17, 18, 19]. To overcome the complexity issue on generic graphs, [20] explores forming different cycles in the graph and assigning agents to these cycles to patrol the nodes periodically, and seeks to minimize the time that a node stays unvisited. Alternatively, [21] proposes that agents move to the most rewarding neighboring node based on their current location.

Statement of contribution:

In this paper, we propose a robust and suboptimal solution to the long-term patrolling problem that we stated earlier. Instead of using the customary idle time, $\psi_{v}(t)=t$, as a reward function, which reduces the optimal dispatch policy design to the minimum latency problem [22], we consider reward functions described by an increasing concave function. This allows modeling a wider class of patrolling problems such as patrolling for event detection. We let the utility function be the sum of the rewards collected over the mission horizon by the mobile agents. We argue that the design of an optimal patrolling policy to maximize this utility over the mission horizon is an NP-hard problem. Specifically, we show that the complexity of finding the optimal policy increases exponentially with the mission horizon and the number of agents. Next, we show that the utility function is a monotone increasing and submodular set function. To establish this result, we develop a set of auxiliary lemmas, presented in the appendix, based on Karamata’s inequality [23]. Given the submodularity of the utility function, we propose a receding horizon sequential greedy algorithm to compute a suboptimal dispatch policy with a polynomial computation cost and a guaranteed bound on optimality. The receding horizon nature of our solution induces robustness to uncertainties of the environment. Our next contribution is to add a new term to our utility function to compensate for the shortsightedness of the receding horizon approach, see Fig. 2. When agents patrol a large set of inter-connected nodes, this added term becomes useful by giving them an intuition of the existing reward in the farther nodes. In recent years, submodular optimization has been widely used in sensor and actuator placement problems [3, 24, 25, 26, 2, 27].
In comparison to the sensor/actuator placement problems, the challenge in our work is that the assigned policy per each mobile agent over the receding horizon is a dynamic scheduling problem rather than a static sensor placement. To deal with this challenge, we use the matroid constraint [28] approach to design our suboptimal submodular-based policy. Finally, we discuss how our algorithm can be implemented in a decentralized manner. A simulation study demonstrates our results. Our notation is standard, though to avoid confusion, certain concepts and notation are defined as the need arises. This paper extends our preliminary work [29] in detailed technical treatment including all the proofs, introducing the notion of local importance to compensate for the shortsightedness of receding horizon approach, decentralized implementation of our algorithm, and a new simulation study. Also, we consider a more generalized case of reward functions.

II Problem Formulation

To formalize our objective, we first introduce our notation and state our standing assumptions. For any node $v\in\mathcal{V}$, $\mathcal{N}_{v}$ is the set consisting of node $v$ and all the neighboring nodes that are connected to node $v$ via an edge in $\mathcal{E}$. If there exists a path connecting node $v\in\mathcal{V}$ to node $w\in\mathcal{V}$, we let $\tau^{i}_{v,w}\in\mathbb{R}_{>0}$ be the shortest travel time of agent $i\in\mathcal{A}$ from node $v$ to $w$.

Assumption 1

Upon arrival of any agent $i\in\mathcal{A}$ at any time $\bar{t}\in\mathbb{R}_{>0}$ at node $v\in\mathcal{V}$, the agent immediately scans the node, the reward $R_{v}(\bar{t})$ is scored for the patrolling team $\mathcal{A}$, and $\bar{t}_{v}$ of node $v$ in (1) is set to $\bar{t}$. If more than one agent arrives at node $v\in\mathcal{V}$ and scans it at the same time $\bar{t}$, the reward collected for the team is still $R_{v}(\bar{t})$. If an agent $i\in\mathcal{A}$ needs to linger over each node for $\delta^{i}\in\mathbb{R}_{\geq 0}$ amount of time to complete its scan, during this time the agent cannot scan the node again to score a reward for the team.

Let the tuple ${\mathsf{p}}=(\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}},\boldsymbol{\mathbf{\mathsf{T}}}_{\mathsf{p}},\mathsf{a}_{\mathsf{p}})$ be a dispatch policy of agent $\mathsf{a}_{\mathsf{p}}\in\mathcal{A}$ over the given mission time horizon, where $\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}}$ and $\boldsymbol{\mathbf{\mathsf{T}}}_{\mathsf{p}}$ are the vectors that specify the nodes and the corresponding visit times assigned to agent $\mathsf{a}_{\mathsf{p}}$. Moreover, we let $\mathsf{n}_{\mathsf{p}}$ be the total number of nodes visited by agent $\mathsf{a}_{\mathsf{p}}$, i.e., $\mathsf{n}_{\mathsf{p}}=\text{dim}(\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}})$. We refer to $\mathsf{n}_{\mathsf{p}}$ as the length of the policy $\mathsf{p}$. We refer to $(\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}}(l),\boldsymbol{\mathbf{\mathsf{T}}}_{\mathsf{p}}(l))$, $l\in\{1,2,\cdots,\mathsf{n}_{\mathsf{p}}\}$, as the $l^{\text{th}}$ step of policy ${\mathsf{p}}$. Furthermore, for any agent $i\in\mathcal{A}$, we let $\mathcal{P}^{i}$ be the set of all the admissible policies $\mathsf{p}$ over the mission horizon such that $\mathsf{a}_{\mathsf{p}}=i$.

Assumption 2

For any policy $\mathsf{p}$ , we have $\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}}(l+1)\in\mathcal{N}_{\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}}(l)}$ , for all $l\in\{1,2,\cdots,\mathsf{n}_{\mathsf{p}}-1\}$ .

We let $\mathcal{P}=\bigcup_{i\in\mathcal{A}}\mathcal{P}^{i}$. Then, given any $\bar{\mathcal{P}}\subset\mathcal{P}$, the utility function $\mathsf{R}:2^{\mathcal{P}}\to{\mathbb{R}}_{>0}$ is

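$$\mathsf{R}(\bar{\mathcal{P}})=\sum_{\forall\mathsf{p}\in\bar{\mathcal{P}}}\;\sum_{l=1}^{\mathsf{n}_{\mathsf{p}}}R_{\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}}(l)}\big(\boldsymbol{\mathbf{\mathsf{T}}}_{\mathsf{p}}(l)\big).\tag{2}$$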
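A minimal sketch of how the utility in (2) can be evaluated for a candidate policy set is given next. It assumes each policy is stored as a `(nodes, times, agent)` tuple mirroring $(\mathsf{V}_{\mathsf{p}},\mathsf{T}_{\mathsf{p}},\mathsf{a}_{\mathsf{p}})$; the function name `utility` and the dictionary-based inputs are hypothetical. Visits are replayed in time order so that each reward is measured from the latest visit by any agent, and simultaneous visits to a node score only once, as stated in Assumption 1.

```python
import math


def utility(policies, psi, last_visit):
    """Total reward collected by a set of policies, following (1) and (2).

    policies   : iterable of (nodes, times, agent) tuples
    psi        : dict node -> concave increasing function of idle time
    last_visit : dict node -> last visit time before the horizon starts
    """
    last = dict(last_visit)  # work on a copy so the caller's state is untouched
    visits = sorted(
        (t, v) for nodes, times, _agent in policies for v, t in zip(nodes, times)
    )
    total = 0.0
    for t, v in visits:
        if t > last[v]:  # a second agent arriving at the same instant adds nothing
            total += psi[v](t - last[v])
            last[v] = t
    return total


# Toy data (assumptions): two nodes with square-root reward profiles.
psi = {"a": math.sqrt, "b": math.sqrt}
start = {"a": 0.0, "b": 0.0}
p1 = (("a", "b"), (1.0, 4.0), 1)
p2 = (("b", "a"), (1.0, 4.0), 2)
p3 = (("a",), (1.0,), 3)  # visits node "a" at the same instant as p1
```

With these toy policies, adding `p2` to `{p1}` raises the utility, while adding `p3` changes nothing because its only visit coincides with one already made by `p1`.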

Given (2), the optimal policy to maximize the utility over a given mission horizon is given by

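$$\begin{aligned}\mathcal{P}^{\star}=\;&\underset{\bar{\mathcal{P}}\subset\mathcal{P}}{\operatorname{argmax}}\;\mathsf{R}(\bar{\mathcal{P}})\quad\text{s.t.} &&\text{(3a)}\\ &|\bar{\mathcal{P}}\cap\mathcal{P}^{i}|\leq 1,\quad i\in\mathcal{A}, &&\text{(3b)}\end{aligned}$$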

where $|\,.\,|$ returns the cardinality of a set. The constraint (3b) is in the so-called partition matroid form [28] and restricts the optimal solution to a set that contains at most one member from each of the disjoint sets $\mathcal{P}^{i}$, $i\in\mathcal{A}$. A set value optimization problem of the form (3) is known to be NP-hard [30]. Lemma II.1 below, whose proof is given in the appendix, gives the cost of constructing the feasible set $\mathcal{P}$ and the time complexity of solving optimization problem (3).

Lemma II.1 (Time complexity of problem (3a))

The cost of constructing the feasible set $\mathcal{P}$ of optimization problem (3a) is of order $O(\sum_{i\in\mathcal{A}}\mathsf{D}^{\bar{\mathsf{n}}^{i}})$ , where $\mathsf{D}=\max_{v\in\mathcal{V}}(|\mathcal{N}_{v}|)$ and $\bar{\mathsf{n}}^{i}=\max\{\mathsf{n}_{\mathsf{p}}\}_{\forall\mathsf{p}\in\mathcal{P}^{i}}$ . Furthermore, the time complexity of solving optimization problem (3a) is $O(\prod_{i\in\mathcal{A}}\mathsf{D}^{\bar{\mathsf{n}}^{i}})$ .
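To make the growth rates in Lemma II.1 concrete, the sketch below enumerates candidate node sequences on a toy graph. It is a simplification under stated assumptions: policies are truncated after a fixed number of steps rather than by a time horizon, travel times are ignored, and the graph, the start nodes, and the function name `enumerate_walks` are hypothetical.

```python
def enumerate_walks(neighbors, start, n_steps):
    """All node sequences of length n_steps that an agent at `start` can follow,
    where each step moves to a member of neighbors[current] (Assumption 2)."""
    if n_steps == 0:
        return [()]
    return [
        (w,) + tail
        for w in neighbors[start]
        for tail in enumerate_walks(neighbors, w, n_steps - 1)
    ]


# Path graph a - b - c; each N_v contains v itself, as in the definition of N_v.
neighbors = {"a": ["a", "b"], "b": ["a", "b", "c"], "c": ["b", "c"]}
D = max(len(n) for n in neighbors.values())

walks_a = enumerate_walks(neighbors, "a", 3)
walks_b = enumerate_walks(neighbors, "b", 3)
# Building the feasible set costs on the order of the sum of the per-agent counts,
# while an exhaustive joint search must examine their product.
construction_cost = len(walks_a) + len(walks_b)
joint_search_space = len(walks_a) * len(walks_b)
```

Each per-agent count is bounded by `D ** 3`, and the joint search space multiplies across agents, which is the exponential dependence on policy length and team size stated in the lemma.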

If the system parameters, such as number of the mobile agents or the nodes, or the parameters of $\psi_{v}(.)$ of the reward function at any node $v$ , change after the optimal policy design, the optimization problem (3) should be solved again over the remainder of the mission horizon under the new conditions. Our objective in this paper is to construct a suboptimal solution to solve the persistent monitoring problem given by (3) with polynomial time complexity. Moreover, we seek a solution that has intrinsic robustness to changes that can happen during the mission horizon.

We close this section by introducing some definitions and notations used subsequently. For any set function $g:\,2^{\mathcal{Q}}\to\mathbb{R}$ , we let

$$\Delta_{g}(\mathsf{q}|\bar{\mathcal{Q}})=g(\bar{\mathcal{Q}}\cup\mathsf{q})-g(\bar{\mathcal{Q}}),$$

for all $\bar{\mathcal{Q}}\in 2^{\mathcal{Q}}$ and all $\mathsf{q}\in\mathcal{Q}$, where $\Delta_{g}$ is the increase in the value of the set function $g$ in going from set $\bar{\mathcal{Q}}$ to $\bar{\mathcal{Q}}\cup\mathsf{q}$. Recall that $g:\,2^{\mathcal{Q}}\to\mathbb{R}$ is submodular if and only if for any two sets $\mathcal{Q}_{1}$ and $\mathcal{Q}_{2}$ satisfying $\mathcal{Q}_{1}\subset\mathcal{Q}_{2}\subset\mathcal{Q}$, and for any $\mathsf{q}\in\mathcal{Q}$ with $\mathsf{q}\not\in\mathcal{Q}_{2}$, we have [28]

$$\Delta_{g}(\mathsf{q}|\mathcal{Q}_{1})\geq\Delta_{g}(\mathsf{q}|\mathcal{Q}_{2}).$$

Thus, submodularity is a property of set functions that captures diminishing returns as new members are introduced to the set. We say $g:2^{\mathcal{Q}}\to\mathbb{R}$ is monotone increasing if for all $\mathcal{Q}_{1}\subset\mathcal{Q}_{2}\subset\mathcal{Q}$ we have [28]

$$g(\mathcal{Q}_{1})\leq g(\mathcal{Q}_{2}).$$
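The two definitions above can be checked by brute force on small ground sets. The following sketch is only meant as a sanity-check tool: the function names and both toy set functions are hypothetical, and the exhaustive enumeration is exponential in the size of the ground set.

```python
from itertools import combinations


def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def marginal(g, q, Q):
    """Delta_g(q | Q) = g(Q U {q}) - g(Q)."""
    return g(Q | {q}) - g(Q)


def is_monotone_submodular(g, ground, tol=1e-12):
    """Check g(Q1) <= g(Q2) and Delta_g(q|Q1) >= Delta_g(q|Q2) for all Q1 in Q2."""
    ground = frozenset(ground)
    for Q2 in subsets(ground):
        for Q1 in subsets(Q2):
            if g(Q1) > g(Q2) + tol:
                return False
            for q in ground - Q2:
                if marginal(g, q, Q1) < marginal(g, q, Q2) - tol:
                    return False
    return True


# Toy examples (assumptions): a coverage function and a squared-cardinality function.
cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}


def coverage(S):
    return len(set().union(*(cover[s] for s in S)))


def squared(S):
    return len(S) ** 2
```

The coverage function passes the check, whereas the squared cardinality is monotone but has increasing marginal gains, so the check rejects it.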

We denote a sequence of $m$ real numbers $(\mathfrak{t}_{1},\cdots,\mathfrak{t}_{m})$ by $(\mathfrak{t})_{1}^{m}$ . Given two increasing (resp. decreasing) sequences $(\mathfrak{t})_{1}^{n}$ and $(\mathfrak{v})_{1}^{m}$ , $(\mathfrak{t})_{1}^{n}\oplus(\mathfrak{v})_{1}^{m}$ is their concatenated increasing (resp. decreasing) sequence, i.e., for $(\mathfrak{u})_{1}^{n+m}=(\mathfrak{t})_{1}^{n}\oplus(\mathfrak{v})_{1}^{m}$ , any $\mathfrak{u}_{k}$ , $k\in\{1,\cdots,n+m\}$ is either in $(\mathfrak{t})_{1}^{n}$ or $(\mathfrak{v})_{1}^{m}$ or is in both. We assume that $(\mathfrak{u})_{1}^{n+m}$ preserves the relative labeling of $(\mathfrak{t})_{1}^{n}$ or $(\mathfrak{v})_{1}^{m}$ , i.e., if $\mathfrak{t}_{k}$ and $\mathfrak{t}_{k+1}$ , $k\in\{1,\cdots,n-1\}$ (resp. $\mathfrak{v}_{k}$ and $\mathfrak{v}_{k+1}$ , $k\in\{1,\cdots,m-1\}$ ) correspond to $\mathfrak{u}_{i}$ and $\mathfrak{u}_{j}$ in $(\mathfrak{u})_{1}^{n+m}$ , then $i<j$ .

III Suboptimal policy design

According to Lemma II.1, the time complexity of finding an optimal patrolling policy in (3a) increases exponentially with the maximum length $\bar{\mathsf{n}}^{i}$ of the admissible policies of any agent $i\in\mathcal{A}$ and also with the number of the exploring agents $M$. In light of this observation, to reduce the computational cost, we propose the following suboptimal policy design. Since the maximum policy length $\bar{\mathsf{n}}^{i}$ is proportional to the length of the mission horizon, we first propose to trade off optimality and divide the planning horizon into multiple shorter horizons so that the policy design can be carried out in a consecutive manner over these shorter horizons. Then, to reduce the optimality gap and also to induce robustness to the online changes that can occur during the mission time, we propose to implement this approach in a receding horizon fashion, where we calculate the policy over a specified shorter horizon but execute only some of the initial steps of the policy, and then we repeat the process. However, a receding horizon approach suffers from what we refer to as shortsightedness. That is, over large inter-connected geographical node sets, a receding horizon design is oblivious to the reward distribution of the nodes that are not in the feasible policy set in the planning horizon. Then, the optimal policy over the planning horizon can inadvertently steer the agents away from distant nodes with a higher reward, see Fig. 2. To compensate for this shortcoming, we introduce the notion of nodal importance and augment the reward function (2) over the design horizon with an additional term that, given an admissible policy, provides a measure of how close an agent at the final step of the policy is to a cluster of geographical nodes with a high concentration of reward.

Let the augmented reward, whose exact form will be introduced below, over the planning horizon be $\bar{\mathsf{R}}$ . Then, the optimal policy design over each receding horizon is

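$$\begin{aligned}\mathcal{P}^{\star}=\;&\underset{\bar{\mathcal{P}}\subset\mathcal{P}}{\operatorname{argmax}}\;\bar{\mathsf{R}}(\bar{\mathcal{P}})\quad\text{s.t.}\\ &|\bar{\mathcal{P}}\cap\mathcal{P}^{i}|\leq 1,\quad i\in\mathcal{A},\end{aligned}\tag{4}$$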

where hereafter $\mathcal{P}=\bigcup_{i\in\mathcal{A}}\mathcal{P}^{i}$ is the set of the union of the admissible policies of the agents $\mathcal{P}^{i}$ , $i\in\mathcal{A}$ , over the planning horizon. Hereafter, we let $\bar{\mathfrak{t}}^{v}_{0}$ be the last time node $v\in\mathcal{V}$ was visited before a planning horizon starts.

Next, to reduce the computational burden further, we propose to use Algorithm 1, which is a sequential greedy algorithm with a polynomial cost in terms of the number of the agents to obtain a suboptimal solution for (4). In what follows, we show that since the objective function (4) is a submodular set function, Algorithm 1 comes with a known optimality gap. We also show that with a proper inter-agent communication coordination Algorithm 1 can be implemented in a decentralized manner.

For $v\in\mathcal{V}$, let $\mathcal{N}_{v}^{r}$ be the set consisting of node $v$ itself and its $r$-hop neighbors. This set can be computed using breadth-first search in time ${O}(|\mathcal{E}|+|\mathcal{V}|)$ [31]. Here, $\tau^{i}_{w,v}$ can be computed via the $A^{\star}$ algorithm in time $O(|\mathcal{E}|)$ [32]. Then, for every node $v\in\mathcal{V}$, we define the nodal importance with radius $r$ at time $\tau$ as $L(v,\tau,r)=\sum\nolimits_{w\in\mathcal{N}_{v}^{r}}{}R_{w}(\tau)$. Next, given an agent $i\in\mathcal{A}$ that is at node $w\in\mathcal{V}$ at time $\hat{t}\in\mathbb{R}_{\geq 0}$, we define the relative nodal importance of a node $v\in\mathcal{V}$ with respect to agent $i$ as

$$\mathsf{L}(v,w,\hat{t},i)=\frac{L(v,\hat{t}+\tau^{i}_{w,v},r)}{\tau^{i}_{w,v}}.$$
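The nodal importance and its relative version can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is an adjacency dictionary, `reward_at(w, tau)` and `travel_time(w, v)` are hypothetical callables standing in for $R_{w}(\tau)$ and $\tau^{i}_{w,v}$ of a single agent, and the toy data are invented.

```python
from collections import deque


def r_hop_neighbors(adj, v, r):
    """Node v together with every node within r hops of v (breadth-first search)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def nodal_importance(adj, reward_at, v, tau, r):
    """L(v, tau, r): sum of R_w(tau) over the r-hop neighborhood of v."""
    return sum(reward_at(w, tau) for w in r_hop_neighbors(adj, v, r))


def relative_nodal_importance(adj, reward_at, travel_time, v, w, t_hat, r):
    """L(v, t_hat + tau_wv, r) / tau_wv for an agent located at node w at time t_hat."""
    tau_wv = travel_time(w, v)
    if tau_wv <= 0:
        raise ValueError("travel time must be positive")
    return nodal_importance(adj, reward_at, v, t_hat + tau_wv, r) / tau_wv


# Toy path graph a - b - c - d with linear rewards and unit edge travel times.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
weights = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}


def reward_at(w, tau):
    return weights[w] * tau  # every node assumed last visited at time 0


def travel_time(w, v):
    return float(abs(ord(w) - ord(v)))
```

Dividing by the travel time discounts reward clusters that are far from the agent's final position, so a rich but distant cluster can rank below a moderately rewarding nearby one.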

Then, $\mathsf{L}(v,\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}}(\mathsf{n}_{\mathsf{p}}),\boldsymbol{\mathbf{\mathsf{T}}}_{\mathsf{p}}(\mathsf{n}_{\mathsf{p}}),\mathsf{a}_{\mathsf{p}})$ is a measure of the relative size of the reward concentration around any node $v\in\mathcal{V}$ that also takes into account the travel time of agent $\mathsf{a}_{\mathsf{p}}$ from the final step of policy ${\mathsf{p}}=(\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}},\boldsymbol{\mathbf{\mathsf{T}}}_{\mathsf{p}},\mathsf{a}_{\mathsf{p}})\in\mathcal{P}$ to $v$. Let $\mathsf{L}(v,\mathsf{p})$ be the shorthand notation for $\mathsf{L}(v,\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}}(\mathsf{n}_{\mathsf{p}}),\boldsymbol{\mathbf{\mathsf{T}}}_{\mathsf{p}}(\mathsf{n}_{\mathsf{p}}),\mathsf{a}_{\mathsf{p}})$. To compensate for the shortsightedness of the receding horizon design, we then revise the utility function to

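$$\bar{\mathsf{R}}(\bar{\mathcal{P}})=\mathsf{R}(\bar{\mathcal{P}})+\alpha\sum_{\forall\mathsf{p}\in\bar{\mathcal{P}}}\underset{\forall v\in\bar{\mathcal{V}}}{\max}\,\mathsf{L}(v,\mathsf{p}),\quad\alpha\in\mathbb{R}_{\geq 0}.\tag{5}$$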

The weighting factor $\alpha\in\mathbb{R}_{\geq 0}$ defines how much significance we want to assign to the distribution of the reward beyond the receding horizon. For computational efficiency, instead of incorporating the relative nodal importance of all the nodes, which can be achieved by setting $\bar{\mathcal{V}}$ equal to $\mathcal{V}$, we propose to use only a subset $\bar{\mathcal{V}}\subset\mathcal{V}$ of the nodes. We refer to the nodes in $\bar{\mathcal{V}}$ as anchor nodes. The anchor nodes can be selected to be the nodes with higher reward return or to be a set of nodes that are scattered uniformly on the graph. We should note that using a large $\alpha$ can draw the agents towards the nodes close to the anchor nodes and make them oblivious to the rest of the nodes. It is interesting to note that the relative nodal importance term in (5) is reminiscent of the terminal cost used in model predictive control (MPC). In MPC, the terminal cost, which is used to achieve an infinite horizon control with closed-loop stability guarantees [33], in some way also compensates for the shortsightedness of the design over a finite planning horizon. Next, we show that the reward function (5) is submodular over any given feasible policy set $\mathcal{P}$ in every planning horizon.

Theorem III.1 (Submodularity of the reward function (5))

For any weighting factor $\alpha\in\mathbb{R}_{\geq 0}$, the reward function $\bar{\mathsf{R}}:2^{\mathcal{P}}\to{\mathbb{R}}_{>0}$ in (5) is a monotone increasing and submodular set function over $\mathcal{P}$.

Proof:

Let $\mathtt{c}(v,{\mathcal{Q}}):\mathcal{V}\times 2^{\mathcal{Q}}\to\mathbb{Z}_{>0}$ be the total number of visits to the geographical node $v$ , and ${\mathcal{I}}_{{\mathcal{Q}}}\subset\mathcal{V}$ be the set of the nodes that are visited when a policy set ${\mathcal{Q}}\subset\mathcal{P}$ is implemented. Furthermore, let the increasing sequence

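$$(\mathfrak{t}^{v}(\mathcal{Q}))_{1}^{\mathtt{c}(v,\mathcal{Q})}=\big(\mathfrak{t}_{1}^{v}(\mathcal{Q}),\mathfrak{t}_{2}^{v}(\mathcal{Q}),\cdots,\mathfrak{t}_{\mathtt{c}(v,\mathcal{Q})}^{v}(\mathcal{Q})\big)$$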

be the sequence of time that node $v\in{\mathcal{I}}_{{\mathcal{Q}}}$ was visited when agents implement ${\mathcal{Q}}$ . Now consider the reward function $\bar{\mathsf{R}}$ in (5). Then, the first summand of $\bar{\mathsf{R}}$ expands as

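$$\mathsf{R}(\bar{\mathcal{P}})=\sum\nolimits_{v\in\mathcal{I}_{\bar{\mathcal{P}}}}\Big(\sum\nolimits_{j=1}^{\mathtt{c}(v,\bar{\mathcal{P}})}\psi_{v}\big(\Delta\mathfrak{t}_{j}^{v}(\bar{\mathcal{P}})\big)\Big)$$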

where $\Delta\mathfrak{t}_{j}^{v}(\bar{\mathcal{P}})=\mathfrak{t}^{v}_{j}(\bar{\mathcal{P}})-\mathfrak{t}^{v}_{j-1}(\bar{\mathcal{P}})$ is the time between two consecutive visits of node $v$, and $\mathfrak{t}^{v}_{0}(\bar{\mathcal{P}})=\bar{\mathfrak{t}}^{v}_{0}$. Next, consider the monitoring policy sets $\mathcal{Q}_{1},\,\mathcal{Q}_{2}$ and a monitoring policy $\mathsf{q}$ with $\mathcal{Q}_{1}\subset\mathcal{Q}_{2}\subset\mathcal{P}$, $\mathsf{q}\in\mathcal{P}$, $\mathsf{q}\not\in\mathcal{Q}_{1}$, and $\mathsf{q}\not\in\mathcal{Q}_{2}$. Because $(\mathfrak{t}^{v}(\mathcal{Q}_{1}))_{1}^{\mathtt{c}(v,\mathcal{Q}_{1})}$ is a sub-sequence of $(\mathfrak{t}^{v}(\mathcal{Q}_{2}))_{1}^{\mathtt{c}(v,\mathcal{Q}_{2})}$, using Lemma A.2 and the fact that $\psi_{v}(.)$ is a normalized increasing concave function, we conclude that

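$$\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{1}\cup\mathsf{q})}\psi_{v}\big(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{1}\cup\mathsf{q})\big)-\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{1})}\psi_{v}\big(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{1})\big)\geq 0$$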

for all $v\in{\mathcal{I}}_{\bar{\mathcal{P}}}$. Therefore, $\Delta_{\mathsf{R}}(\mathsf{q}|\mathcal{Q}_{1})\geq 0$, which shows that $\mathsf{R}(\bar{\mathcal{P}})$ is a monotone increasing set function. Furthermore, using Lemma A.3 we can write

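$$\begin{aligned}&\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{2}\cup\mathsf{q})}\psi_{v}\big(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{2}\cup\mathsf{q})\big)-\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{2})}\psi_{v}\big(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{2})\big)\\ &\quad\leq\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{1}\cup\mathsf{q})}\psi_{v}\big(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{1}\cup\mathsf{q})\big)-\sum_{j=1}^{\mathtt{c}(v,\mathcal{Q}_{1})}\psi_{v}\big(\Delta\mathfrak{t}_{j}^{v}(\mathcal{Q}_{1})\big).\end{aligned}$$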

Hence,

$$\Delta_{\mathsf{R}}(\mathsf{q}|\mathcal{Q}_{1})\geq\Delta_{\mathsf{R}}(\mathsf{q}|\mathcal{Q}_{2}),$$

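which shows that ${\mathsf{R}}(\bar{\mathcal{P}})$ is a submodular set function. The second summand of $\bar{\mathsf{R}}$, $\alpha\sum\nolimits_{\forall\mathsf{p}\in\bar{\mathcal{P}}}{}\underset{\forall v\in\bar{\mathcal{V}}}{\text{max }}\mathsf{L}(v,\mathsf{p})$, is nonnegative and modular, because each policy contributes a term that does not depend on the other members of $\bar{\mathcal{P}}$. Since adding a nonnegative modular function to a monotone increasing submodular function preserves both properties, the proof is concluded. ∎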
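The diminishing-returns effect used in the proof can be observed numerically on a single node. The sketch below is an illustration only, not a proof: it assumes $\psi_{v}$ is the square root, and the visit times and helper names are hypothetical.

```python
import math


def node_reward(visit_times, psi, t0=0.0):
    """Sum of psi over inter-visit gaps at one node; t0 is the last visit before
    the planning horizon. Simultaneous visits are counted once."""
    total, prev = 0.0, t0
    for t in sorted(set(visit_times)):
        total += psi(t - prev)
        prev = t
    return total


def marginal_gain(extra, base, psi, t0=0.0):
    """Reward added at this node when the visits in `extra` join those in `base`."""
    return node_reward(list(base) + list(extra), psi, t0) - node_reward(base, psi, t0)


Q1_visits = [4.0]       # visits induced by the smaller policy set
Q2_visits = [2.0, 4.0]  # a super-sequence, as induced by a larger policy set
q_visits = [3.0]        # visit contributed by the new policy q

gain_small = marginal_gain(q_visits, Q1_visits, math.sqrt)
gain_large = marginal_gain(q_visits, Q2_visits, math.sqrt)
```

Here the extra visit at time 3 splits a gap of length 4 when added to `Q1_visits` but only a gap of length 2 when added to `Q2_visits`, so the gain is smaller in the second case while remaining nonnegative in both.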

Due to Theorem III.1, the suboptimal dispatch policy of Algorithm 1, which has a polynomial computational complexity, has the following well-defined optimality gap.

Theorem III.2 (Optimality gap of Algorithm 1)

Let $\mathcal{P}^{\star}$ be an optimal solution of (4) and $\bar{\mathcal{P}}$ be the output of Algorithm 1. Then, $\bar{\mathsf{R}}(\bar{\mathcal{P}})\geq\frac{1}{2}\bar{\mathsf{R}}(\mathcal{P}^{\star})$ .

Proof:

Since the objective function of (4) is monotone increasing and submodular over $\mathcal{P}$ , the proof follows by invoking [28, Theorem 5.1]. ∎
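The listing of Algorithm 1 is not reproduced in this text. The sketch below therefore assumes it follows the standard sequential greedy scheme for a partition matroid: agents are processed one at a time, and each picks the candidate policy with the largest marginal gain given the choices already made. The names `sequential_greedy`, `brute_force`, and `coverage`, and the toy instance, are hypothetical.

```python
from itertools import product


def sequential_greedy(policy_sets, objective):
    """Pick one policy per agent, in order, by largest marginal gain.
    Every agent is assumed to have at least one candidate policy."""
    chosen = []
    for _agent, candidates in policy_sets.items():
        base = objective(chosen)
        best = max(candidates, key=lambda p: objective(chosen + [p]) - base)
        chosen.append(best)
    return chosen


def brute_force(policy_sets, objective):
    """Exhaustive search over one policy per agent (exponential, for checking only)."""
    return max((list(c) for c in product(*policy_sets.values())), key=objective)


# Toy monotone submodular objective: total weight of the nodes covered.
weights = {"a": 1.0, "b": 0.9}


def coverage(policies):
    return sum(weights[v] for v in set().union(*policies))


policy_sets = {1: [frozenset("a"), frozenset("b")], 2: [frozenset("a")]}
greedy = sequential_greedy(policy_sets, coverage)
best = brute_force(policy_sets, coverage)
```

In this instance the first agent greedily takes node `a`, leaving the second agent with nothing new to collect; the greedy value is 1.0 against an optimum of 1.9, which respects the factor of one half in Theorem III.2 and shows that the bound can be nearly attained.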

III-A Comments on decentralized implementations of Algorithm 1

To implement Algorithm 1, given the current position of each agent and $\{\bar{\mathfrak{t}}^{v}_{0}\}_{v\in\mathcal{V}}$ at the beginning of each planning horizon, the admissible set of policies $\mathcal{P}^{i}$ for each agent $i\in\mathcal{A}$ should be calculated.

Let every agent know $\{\psi_{v}(t)\}_{v\in\mathcal{V}}$. A straightforward decentralized implementation of Algorithm 1 is then a multi-centralized solution. In this solution, agents transmit their feasible policy sets across the entire network until each agent knows the whole policy set $\mathcal{P}^{i}$ for all $i\in\mathcal{A}$ (flooding approach). Then, each agent acts as a central node and runs a copy of Algorithm 1 locally. Although reasonable for small-size networks, the communication and storage costs of this approach scale poorly with the network size. The sequential structure of Algorithm 1, however, offers an opportunity for a decentralized implementation that is more efficient in both communication and computation, as described in steps 1 to 9 of Algorithm 2. Step 10 of Algorithm 2 is included for the receding horizon implementation, where the execution plan can be, for example, that one or all of the agents visit at least one node. To implement Algorithm 2, we assume that the agents $\mathcal{A}$ can form a bidirectional connected communication graph $\mathcal{G}^{a}=(\mathcal{A},\mathcal{E}^{a})$, i.e., there is a path from every agent to every other agent on $\mathcal{G}^{a}$. Then, there always exists a route $\mathtt{SEQ}=\mathtt{s}_{1}\to\cdots\to\mathtt{s}_{i}\to\cdots\to\mathtt{s}_{K}$, $\mathtt{s}_{k}\in\mathcal{A}$, $k\in\{1,\cdots,K\}$, $K\!\geq\!M$, that visits all the agents (not necessarily only one time), see Fig. 3(a). The agents follow $\mathtt{SEQ}$ to share their information while implementing Algorithm 2.

The communication cost to execute Algorithm 2 can be optimized by picking $\mathtt{SEQ}$ to be the shortest path [34] that visits all the agents over graph $\mathcal{G}^{a}$ . If $\mathcal{G}^{a}$ has a Hamiltonian path, the optimal choice for $\mathtt{SEQ}$ is a Hamiltonian path. Recall that a Hamiltonian path is a path that visits every agent on $\mathcal{G}^{a}$ exactly once [FR:74]. When there is a $\mathtt{SEQ}$ that visits every agent on $\mathcal{G}^{a}$ , the directed information graph $\mathcal{G}^{I}=(\mathcal{A},\mathcal{E}^{I})$ of Algorithm 2, which shows the information access of each agent while implementing Algorithm 2, is full, see Fig. 3. That is, each agent in $\mathtt{SEQ}$ is aware of the decisions of all the previous agents. Therefore, the solution obtained by Algorithm 2 is an exact sequential greedy solution and its optimality gap is $1/2$ . We recall that the labeling order of the mobile agents does not affect the optimality gap guaranteed by Theorem III.2 [35]. If an agent $i\in\mathcal{A}$ appears repeatedly in $\mathtt{SEQ}$ (e.g., the blue agent in Fig. 3), then, at the cost of a slight increase in computation, we can modify Algorithm 2 to allow agent $i$ to redesign and improve its suboptimal policy $\mathsf{p}^{i\star}$ by re-executing step 4 of Algorithm 2.
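The sequential structure discussed above can be illustrated with a minimal Python sketch. It is not Algorithm 2 itself: the names `utility` and `sequential_greedy`, the representation of a policy as a list of `(node, time)` visits, the choice $\psi_{v}(t)=1-\text{e}^{-\lambda_{v}t}$ , and the assumption that every node was last visited at time $0$ are all ours. Simultaneous visits to a node collect its reward only once, which is how we read Assumption 1. An agent that reappears in $\mathtt{SEQ}$ only relays the message in this sketch and does not redesign its policy.

```python
import math
from collections import defaultdict

def utility(policies, rates):
    """Total reward of a set of policies; a policy is a list of (node, time) visits."""
    visits = defaultdict(list)
    for policy in policies:
        for node, t in policy:
            visits[node].append(t)
    total = 0.0
    for node, times in visits.items():
        last = 0.0  # assumption: every node was last visited at time 0
        for t in sorted(times):
            # reward collected at a visit, then the node resets to zero
            total += 1.0 - math.exp(-rates[node] * (t - last))
            last = t
    return total

def sequential_greedy(seq, admissible, rates):
    """Walk the route SEQ; each agent picks its best policy given earlier choices."""
    chosen = {}
    for agent in seq:
        if agent in chosen:
            continue  # repeated appearance in SEQ: relay only (no redesign here)
        earlier = list(chosen.values())
        chosen[agent] = max(admissible[agent],
                            key=lambda p: utility(earlier + [p], rates))
    return chosen

# Hypothetical two-node, two-agent instance.
rates = {"a": 1.0, "b": 0.5}
admissible = {1: [[("a", 1.0)], [("b", 1.0)]],
              2: [[("a", 1.0)], [("b", 1.0)]]}
seq = [1, 2, 1]  # agent 1 appears twice on the route
plan = sequential_greedy(seq, admissible, rates)
print(plan, utility(list(plan.values()), rates))
```

In this toy instance agent 1 takes the high-rate node and agent 2, aware of that decision, takes the other node, since visiting the same node at the same time would add no reward.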

Another form of decentralized implementation of Algorithm 1, which may be more relevant in urban environments, is through a client-server framework implemented over a cloud. In this framework, agents (clients) connect to shared memory on a cloud (server) to download or upload information or use the cloud’s computing power asynchronously. Let $\{\mathcal{T}^{i}\},\,i\in\mathcal{A}$ , be the set of disjoint time slots that are allotted respectively to agents $\mathcal{A}$ , see Fig. 4. To implement Algorithm 1, agent $i\in\mathcal{A}$ connects to the server at the beginning of $\mathcal{T}^{i}$ to check out $\bar{\mathcal{P}}$ and $\{\bar{\mathfrak{t}}_{v}^{0}\}_{v\in\mathcal{V}}$ . Then, it completes steps $4$ and $5$ of Algorithm 1, and checks in the updated $\bar{\mathcal{P}}$ to the server before $\mathcal{T}^{i}$ elapses fully. The last agent, based on the execution plan of the receding horizon operation, updates $\{\bar{\mathfrak{t}}_{v}^{0}\}_{v\in\mathcal{V}}$ and checks it in to the cloud memory for the next receding horizon planning. Since the time slots assigned to the agents do not overlap, agent $i$ has access to the policy $\mathsf{p}^{k\star}$ of every agent $k$ that has already communicated with the cloud. Thus, the information graph $\mathcal{G}^{I}$ is full, and the optimality gap of $1/2$ holds.

If there is a message dropout while executing Algorithm 2, or if in the decentralized server-client based operation an agent $j$ takes longer than $\mathcal{T}^{j}$ to complete its computation and check in $\bar{\mathcal{P}}$ to the cloud, the information graph becomes incomplete, see for example Fig. 4. Then, the corresponding decentralized implementation deviates from the exact sequential greedy Algorithm 2. For such cases, [35] shows that the optimality gap becomes $\frac{1}{M-\omega(\mathcal{G}^{I})+2}$ instead of $1/2$ , where $\omega(\mathcal{G}^{I})$ is the clique number of $\mathcal{G}^{I}$ . Recall that the clique number of a graph is the number of nodes in its largest complete sub-graph, i.e., the largest set of nodes in which every pair of nodes is connected by an edge [36].
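To see how the bound $\frac{1}{M-\omega(\mathcal{G}^{I})+2}$ degrades with missing information, the following sketch evaluates it on a small example. The helper names `clique_number` and `optimality_gap` are ours, the clique number is taken in its standard sense (size of the largest set of pairwise-adjacent nodes), and, as an assumption of this sketch, the edges of the information graph are treated as undirected. The brute-force search is only meant for a handful of agents.

```python
from itertools import combinations

def clique_number(nodes, edges):
    """Size of the largest set of pairwise-adjacent nodes (brute force)."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(frozenset(pair) in adjacent
                   for pair in combinations(subset, 2)):
                return size
    return 0

def optimality_gap(num_agents, omega):
    """Bound 1 / (M - omega + 2) quoted from [35]."""
    return 1.0 / (num_agents - omega + 2)

agents = [1, 2, 3, 4]
full = list(combinations(agents, 2))           # every agent sees every earlier one
dropped = [e for e in full if e != (3, 4)]     # agent 4 misses agent 3's decision

gap_full = optimality_gap(len(agents), clique_number(agents, full))
gap_dropped = optimality_gap(len(agents), clique_number(agents, dropped))
print(gap_full, gap_dropped)
```

With a full information graph the bound reduces to $1/2$ ; losing a single link among four agents lowers the clique number to $3$ and the bound to $1/3$ .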

IV Numerical Example

We consider persistent monitoring using $3$ agents for event detection over an area that is divided into a $20$ by $20$ grid map as shown in Fig. 5(a). The geographical nodes of interest $\mathcal{V}$ are the centers of the cells in Fig. 5(a). The agents can travel from a cell to its neighboring cells to the right, left, bottom, and top. The agents are homogeneous, and the travel time between any two neighboring nodes is identical for all the agents and equal to $1$ second. The agents start their patrolling task from the nodes where they are depicted in Fig. 5(a). We model the event occurrence at each geographical node as a Poisson process and define our reward function at each node $v\in\mathcal{V}$ as (1) with $\psi_{v}(t)=1-\text{e}^{-\lambda_{v}t}$ , where $\lambda_{v}\in{\mathbb{R}}_{>0}$ is the arrival rate of the event; for more details see [29]. Fig. 5(a) shows the reward value of the nodes at $t=120$ seconds when there is no monitoring. The color intensity of the cells in Fig. 5(a) is proportional to $\lambda_{v}$ ; the higher $\lambda_{v}$ , the darker the color of node $v$ . The region enclosed by the blue rectangle initially has a low reward, but after $100$ seconds its reward value is increased by changing $\lambda_{v}$ of the corresponding cells. An animated depiction of the change in the reward map under the different dispatch policies we discuss below is available in [37]. We compare the performance of Algorithm 1, implemented in a receding horizon fashion, with that of a conventional greedy algorithm in which each agent always moves to the neighboring node that has the highest instantaneous reward value. In implementing Algorithm 1 in a receding horizon fashion, we assume that the planning horizon is $4$ seconds and the execution horizon is $1$ second. We consider both the case of including ( $\alpha=0.1$ ) and excluding ( $\alpha=0$ ) the nodal importance measure in the reward function (5).

Fig. 5(b) shows that the conventional greedy cell selection performs poorly compared to the other two planning algorithms. The reason is that the three agents’ decisions become the same after a while, i.e., they start choosing the same cell and moving together, so all three agents act as if one agent is patrolling (recall Assumption 1). Algorithm 1 performs better than the conventional greedy cell selection because the effect of agent $i$ ’s patrolling policy is taken into account when the policy of agent $i+1$ is designed. Therefore, the chance that all three agents go to the same cell and move together is small. Furthermore, we note that implementing Algorithm 1 with the nodal importance measure delivers a better outcome. The reason is that, without nodal importance, the agents are drawn to the region of high importance near them and stay there, as Fig. 5(c) shows. However, there are other important regions with higher values that are farther away, especially the area in the top left corner, which is separated by a low rate stripe from where the agents start. Incorporating nodal importance, as Fig. 5(d) shows, steers the agents to the regions with a higher rate of reward that are beyond the receding horizon’s sight.
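The collapse of the conventional greedy agents onto the same cell can be reproduced on a toy grid. The sketch below is only an illustration under our own assumptions: a $3\times 3$ grid instead of the $20\times 20$ map, hypothetical rates, the reward $1-\text{e}^{-\lambda_{v}(t-\bar{t}_{v})}$ with all nodes last visited at time $0$ , and the helper names `reward` and `myopic_move`.

```python
import math

def reward(rate, t, last_visit):
    """Node reward 1 - exp(-rate * (t - last_visit)); zero right after a visit."""
    return 1.0 - math.exp(-rate * (t - last_visit))

def myopic_move(pos, t, rates, last_visit, size):
    """Conventional greedy: step to the 4-neighbor with the highest reward now."""
    r, c = pos
    neighbors = [(r + dr, c + dc)
                 for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0))
                 if 0 <= r + dr < size and 0 <= c + dc < size]
    return max(neighbors, key=lambda n: reward(rates[n], t, last_visit[n]))

size = 3
rates = {(r, c): 0.1 for r in range(size) for c in range(size)}
rates[(0, 1)] = 0.5                       # one cell with a high arrival rate
last_visit = {cell: 0.0 for cell in rates}

starts = [(1, 1), (0, 0), (0, 2)]         # three agents around the hot cell
moves = [myopic_move(p, 1.0, rates, last_visit, size) for p in starts]
print(moves)
```

All three agents select the same high-rate cell, so only one of them collects its reward and the other two moves are wasted; from then on they share a position and keep making identical choices.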

We presented a multi-agent dispatch policy design for persistent monitoring of a finite set of inter-connected geographical nodes. Our design relied on assigning to each node an increasing and concave reward function of time that resets to zero after a visit by an agent. We defined our design utility function as the sum of the rewards scored for the team when agents visit the geographical nodes. By showing that the utility function is a monotone increasing and submodular set function, we laid the ground to propose a suboptimal solution with a known optimality gap for our dispatch policy design problem, which is NP-hard. To induce robustness to changes in the problem parameters, we proposed our suboptimal solution in a receding horizon setting. Next, to compensate for the shortsightedness of the receding horizon approach, we added a new term, called the relative nodal importance, to the utility function as a measure of the importance of the regions beyond the feasible solution set of the receding horizon optimization problem. Our numerical example demonstrated the benefit of introducing this term. Lastly, we discussed how our suboptimal solution can be implemented in a decentralized manner. Our future work is to investigate decentralized algorithms that allow agents to communicate synchronously with each other in order to reach a consensus on a policy with a known optimality gap.

Appendix

[Proof of Lemma II.1] The time complexity of constructing the admissible policy set $\mathcal{P}^{i}$ is of the order of the number of possible paths that agent $i\in\mathcal{A}$ can traverse over the mission horizon while respecting Assumption 2, which is of order $\mathsf{D}^{\bar{\mathsf{n}}^{i}}$ . Thus, the time complexity of constructing the feasible set $\mathcal{P}\!=\!\bigcup_{i\in\mathcal{A}}\mathcal{P}^{i}$ is $O(\sum_{i\in\mathcal{A}}\mathsf{D}^{\bar{\mathsf{n}}^{i}})$ . Next, let $\bar{\mathcal{P}}$ be any subset of $\mathcal{P}$ that satisfies constraint (3b). Due to Assumption 1, the reward scored by implementing policy ${\mathsf{p}}\!=\!(\boldsymbol{\mathbf{\mathsf{V}}}_{\mathsf{p}},\boldsymbol{\mathbf{\mathsf{T}}}_{\mathsf{p}},\mathsf{a}_{\mathsf{p}})\in\bar{\mathcal{P}}$ cannot be calculated independently of the other policies in $\bar{\mathcal{P}}\backslash\{\mathsf{p}\}$ . Hence, to solve optimization problem (3a), we need to evaluate all the possible policy sets $\bar{\mathcal{P}}$ satisfying the constraint (3b). Since $\bar{\mathcal{P}}$ can have at most one policy from the policy set $\mathcal{P}^{i}$ of each $i\!\in\!\mathcal{A}$ and $\mathcal{P}^{i}$ has $O(\mathsf{D}^{\bar{\mathsf{n}}^{i}})$ members, $O(\prod_{i=1}^{M}\mathsf{D}^{\bar{\mathsf{n}}^{i}})$ different possibilities for $\bar{\mathcal{P}}$ exist, which determines the time complexity of solving (3a). $\Box$
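The gap between the two counts in the proof is easy to appreciate numerically. In the sketch below, `D` plays the role of $\mathsf{D}$ and `horizons` lists the exponents $\bar{\mathsf{n}}^{i}$ ; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math

def joint_policy_sets(D, horizons):
    """Order of the number of joint policy sets: prod_i D**n_i."""
    return math.prod(D ** n for n in horizons)

def single_agent_policies(D, horizons):
    """Order of the size of the union of the agents' policy sets: sum_i D**n_i."""
    return sum(D ** n for n in horizons)

D, horizons = 4, [4, 4, 4]   # three agents, each with exponent 4
print(joint_policy_sets(D, horizons), single_agent_policies(D, horizons))
```

For these values the exhaustive search faces about $1.7\times 10^{7}$ joint policy sets, whereas the union of the three agents' policy sets has only $768$ members.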

We develop the auxiliary results below to use in the proof of Theorem III.1. These results show some of the properties of the sum of evaluation of a concave and increasing function over increasing sequences and their concatenation. The decreasing sequence $(\delta\mathfrak{t})_{1}^{n}$ majorizes the decreasing sequence $(\delta\mathfrak{v})_{1}^{n}$ , if

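$$\sum_{i=1}^{k}\delta\mathfrak{t}_{i}\geq\sum_{i=1}^{k}\delta\mathfrak{v}_{i},\quad\forall k\in\{1,\cdots,n-1\},$$

and

$$\sum_{i=1}^{n}\delta\mathfrak{t}_{i}=\sum_{i=1}^{n}\delta\mathfrak{v}_{i}$$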

hold.

Lemma A.1

Let $f:{\mathbb{R}}_{\geq 0}\to{\mathbb{R}}_{\geq 0}$ be a concave and increasing function with $f(0)=0$ . If sequences $(\delta\mathfrak{t})_{1}^{n}$ and $(\delta\mathfrak{v})_{1}^{m}$ with $n\leq m$ satisfy $\delta\mathfrak{t}_{1}+\cdots+\delta\mathfrak{t}_{i}\geq\delta\mathfrak{v}_{1}+\cdots+\delta\mathfrak{v}_{i},\quad\forall i\in\{1,\cdots,n-1\}$ and $\delta\mathfrak{t}_{1}+\cdots+\delta\mathfrak{t}_{n}=\delta\mathfrak{v}_{1}+\cdots+\delta\mathfrak{v}_{m}$ then

$$\sum_{i=1}^{n}f(\delta\mathfrak{t}_{i})\leq\sum_{i=1}^{m}f(\delta\mathfrak{v}_{i})$$

holds.

Proof:

We note that the sequence $(\delta\mathfrak{u})_{1}^{m}$ defined as $\delta\mathfrak{u}_{i}=\delta\mathfrak{t}_{i}$ for $i\in\{1,\cdots,n\}$ and $\delta\mathfrak{u}_{i}=0$ for $i\in\{n+1,\cdots,m\}$ majorizes any sequence $(\delta\mathfrak{v})_{1}^{m}$ defined in the lemma statement. Then, since $f(0)=0$ , the proof follows from Karamata’s inequality [23]. ∎
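The zero-padding argument can be checked numerically. The sketch below is a sanity check, not part of the proof: the helper `majorizes`, the test function $f(x)=1-\text{e}^{-x}$ , and the two sample sequences are our own choices. It pads the shorter sequence with zeros, verifies majorization, and compares the two sums of $f$ in the direction given by Karamata's inequality for a concave function.

```python
import math

def majorizes(t, v):
    """True if t majorizes v after sorting decreasingly and zero-padding."""
    t, v = sorted(t, reverse=True), sorted(v, reverse=True)
    n = max(len(t), len(v))
    t += [0.0] * (n - len(t))   # zero padding, as in the proof of Lemma A.1
    v += [0.0] * (n - len(v))
    if not math.isclose(sum(t), sum(v)):
        return False
    return all(sum(t[:k]) >= sum(v[:k]) - 1e-12 for k in range(1, n))

def f(x):
    """A concave and increasing function with f(0) = 0 (illustrative choice)."""
    return 1.0 - math.exp(-x)

dt = [3.0, 1.0]        # n = 2
dv = [2.0, 1.0, 1.0]   # m = 3, same total
sum_t = sum(f(x) for x in dt)
sum_v = sum(f(x) for x in dv)
print(majorizes(dt, dv), sum_t, sum_v)
```

Here the more spread-out sequence `dt` majorizes `dv`, and the sum of $f$ over `dt` is the smaller of the two, as expected for a concave $f$ .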

Corollary IV.1

Let $f:{\mathbb{R}}_{\geq 0}\to{\mathbb{R}}_{\geq 0}$ be a monotone increasing and concave function. Then, for any $a,b,c,d\in{\mathbb{R}}_{\geq 0}$ such that $0\leq a\leq c$ and $0\leq b\leq d$ ,

[TABLE]

holds.

Proof:

The assumption is that $a\leq c$ and $b\leq d$ . Therefore, we have $a+b\leq c+d$ . We take $a,b,c+d$ and $c,d,a+b$ as the $\delta\mathfrak{t}$ ’s and $\delta\mathfrak{v}$ ’s, respectively. Arranged in decreasing order, there are two possible cases for the $\delta\mathfrak{t}$ ’s as

[TABLE]

and six possible cases for the $\delta\mathfrak{v}$ ’s as

[TABLE]

Taking any case of $A$ or $B$ , we have $\delta\mathfrak{t}_{1}+\delta\mathfrak{t}_{2}+\delta\mathfrak{t}_{3}=\delta\mathfrak{v}_{1}+\delta\mathfrak{v}_{2}+\delta\mathfrak{v}_{3}=a+b+c+d$ . Comparing any case of $A$ with any case of $B$ , we have $\delta\mathfrak{t}_{1}\geq\delta\mathfrak{v}_{1}$ . Taking case $(A1)$ , since $a>b$ , we have $c+d+a\geq a+b+d$ and $c+d+a\geq a+b+c$ and, trivially, $c+d+a\geq c+d$ . Therefore, comparing case $(A1)$ with any case of $B$ , we have $\delta\mathfrak{t}_{1}+\delta\mathfrak{t}_{2}\geq\delta\mathfrak{v}_{1}+\delta\mathfrak{v}_{2}$ . The same reasoning applies to case $(A2)$ . Hence, for any cases of $A$ and $B$ , the sequence $\delta\mathfrak{t}_{1},\delta\mathfrak{t}_{2},\delta\mathfrak{t}_{3}$ majorizes $\delta\mathfrak{v}_{1},\delta\mathfrak{v}_{2},\delta\mathfrak{v}_{3}$ . This results in

[TABLE]

and consequently

[TABLE]

∎
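As a numerical sanity check of the majorization step in the proof above, the sketch below samples $0\leq a\leq c$ and $0\leq b\leq d$ at random and tests the inequality that Karamata's inequality yields for the triples $(a,b,c+d)$ and $(c,d,a+b)$ , namely $f(a)+f(b)+f(c+d)\leq f(c)+f(d)+f(a+b)$ . The test function $f(x)=\sqrt{x}$ , the sampling range, and the tolerance are our own choices, and this intermediate inequality is our reading of the step, not a restatement of the corollary.

```python
import math
import random

def f(x):
    """A concave and increasing function with f(0) = 0 (illustrative choice)."""
    return math.sqrt(x)

random.seed(0)
violations = 0
for _ in range(10_000):
    c, d = random.uniform(0, 10), random.uniform(0, 10)
    a, b = random.uniform(0, c), random.uniform(0, d)   # enforces a <= c, b <= d
    lhs = f(a) + f(b) + f(c + d)    # sum of f over the majorizing triple
    rhs = f(c) + f(d) + f(a + b)    # sum of f over the majorized triple
    if lhs > rhs + 1e-9:
        violations += 1
print(violations)
```

No sample violates the inequality, in agreement with the case analysis of the proof.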

Lemma A.2

For any $(\mathfrak{q})_{1}^{l}$ , let

[TABLE]

where $\Delta\mathfrak{q}_{i}=\mathfrak{q}_{i+1}-\mathfrak{q}_{i}$ and $f$ be a concave and increasing function with $f(0)=0$ . Now, consider two increasing sequences $(\mathfrak{t})^{n}_{1}$ and $(\mathfrak{u})_{1}^{l}$ , and their concatenation $(\mathfrak{a})_{1}^{n+l}=(\mathfrak{t})^{n}_{1}\oplus(\mathfrak{u})_{1}^{l}$ . Then,

[TABLE]

holds

Proof:

If $\mathfrak{a}_{p}\!=\!\mathfrak{t}_{1}$ and $\mathfrak{a}_{q}\!=\!\mathfrak{t}_{n}$ , then since $(\mathfrak{a})_{1}^{n+l}$ is an increasing sequence, $p\!<\!q$ . Let the sub-sequence of $(\mathfrak{a})_{1}^{n+l}$ ranging from index $p$ to $q$ be $(\mathfrak{v})_{1}^{m}$ where $m\geq n$ . Letting $\Delta\mathfrak{v}_{i}=\mathfrak{v}_{i+1}-\mathfrak{v}_{i}$ and $\Delta\mathfrak{t}_{i}=\mathfrak{t}_{i+1}-\mathfrak{t}_{i}$ , we rearrange the $\Delta\mathfrak{v}_{i}$ ’s and $\Delta\mathfrak{t}_{i}$ ’s in descending order to form the sequences $(\delta\mathfrak{v})_{1}^{m-1}$ and $(\delta\mathfrak{t})_{1}^{n-1}$ . Since $\mathfrak{a}_{p}=\mathfrak{t}_{1}$ and $\mathfrak{a}_{q}=\mathfrak{t}_{n}$ , we have

[TABLE]

Because $(\mathfrak{a})_{1}^{n+l}=(\mathfrak{t})^{n}_{1}\oplus(\mathfrak{u})_{1}^{l}$ , then $\forall i\in\{1,\cdots,n\}$ there exists $\mathsf{S}_{i}\!\subset\!\{1,\cdots,m\}$ such that $\sum\nolimits_{j\in\mathsf{S}_{i}}\delta\mathfrak{v}_{j}=\delta\mathfrak{t}_{i}$ , where $\mathsf{S}_{i}\cap\mathsf{S}_{k}=\emptyset,\,\,i\not=k$ . Consequently, for $r\!\in\!\{1,\cdots,m\}$ , we have $\sum\nolimits_{i=1}^{r}\delta\mathfrak{v}_{i}\!=\!\sum\nolimits_{j\in\mathsf{S}}\delta\mathfrak{t}_{j}$ for $\mathsf{S}\!\subset\!\{1,\cdots,n\}$ and $|\mathsf{S}|\leq r$ . Since $(\delta\mathfrak{t})_{1}^{n-1}$ is a decreasing sequence, we can write

[TABLE]

Thus,

[TABLE]

holds as a result of Lemma A.1. Given that

[TABLE]

and

[TABLE]

then

[TABLE]

which concludes the proof. ∎

Lemma A.3

For any $(\mathfrak{q})_{1}^{l}$ , let

[TABLE]

where $\Delta\mathfrak{q}_{i}=\mathfrak{q}_{i+1}-\mathfrak{q}_{i}$ and $f$ is a concave and increasing function with $f(0)=0$ . Now, consider three increasing sequences $(\mathfrak{t})^{n}_{1}$ and $(\mathfrak{v})^{m}_{1}$ and $(\mathfrak{u})_{1}^{l}$ and concatenations $(\mathfrak{a})_{1}^{n+l}=(\mathfrak{t})^{n}_{1}\oplus(\mathfrak{u})_{1}^{l}$ and $(\mathfrak{b})_{1}^{m+l}=(\mathfrak{v})^{m}_{1}\oplus(\mathfrak{u})_{1}^{l}$ where $(\mathfrak{v})^{m}_{1}$ is a sub-sequence of $(\mathfrak{t})^{n}_{1}$ , then

[TABLE]

Proof:

Let the sequence $(\mathfrak{u})_{1}^{p}$ be the first $p$ elements of $(\mathfrak{u})_{1}^{l}$ . Then, we can form

[TABLE]

where $(\mathfrak{u})_{1}^{0}$ is an empty sequence with no members. Since $(\mathfrak{v})^{m}_{1}$ is a sub-sequence of $(\mathfrak{t})^{n}_{1}$ and $(\mathfrak{u})_{1}^{p}$ has one more member than $(\mathfrak{u})_{1}^{p-1}$ , we have

[TABLE]

with $0\leq\Delta S_{3}\leq\Delta S_{1}$ and $0\leq\Delta S_{4}\leq\Delta S_{2}$ . From Corollary IV.1, we can conclude that $\Delta S_{p}\geq 0$ . Then, given

[TABLE]

the proof is concluded. ∎

Bibliography


[1] J. Cortes, S. Martinez, T. Karatas, and F. Bullo, “Coverage control for mobile sensing networks,” IEEE Tran. on Automatic Control, vol. 20, no. 2, pp. 243–255, 2004.
[2] A. Krause and C. Guestrin, “Near-optimal observation selection using submodular functions,” in American Association for Artificial Intelligence, vol. 7, pp. 1650–1654, 2007.
[3] A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies,” Journal of Machine Learning Research, vol. 9, no. Feb, pp. 235–284, 2008.
[4] M. Schwager, D. Rus, and J. Slotine, “Decentralized, adaptive coverage control for networked robots,” The Int. Journal of Robotics Research, vol. 28, no. 3, pp. 357–375, 2009.
[5] F. Bullo, R. Carli, and P. Frasca, “Gossip coverage control for robotic networks: Dynamical systems on the space of partitions,” SIAM Journal on Control and Optimization, vol. 50, no. 1, pp. 419–447, 2012.
[6] A. Carron, M. Todescato, R. Carli, L. Schenato, and G. Pillonetto, “Multi-agents adaptive estimation and coverage control using Gaussian regression,” in 2015 European Control Conference (ECC), pp. 2490–2495, IEEE, 2015.
[7] M. Todescato, A. Carron, R. Carli, G. Pillonetto, and L. Schenato, “Multi-robots Gaussian estimation and coverage control: From client–server to peer-to-peer architectures,” Automatica, vol. 80, pp. 284–294, 2017.
[8] Y. Chung and S. S. Kia, “A distributed service-matching coverage via heterogeneous mobile agents,” 2020. Available at https://arxiv.org/pdf/2009.11943.pdf.
