Massive Autonomous UAV Path Planning: A Neural Network Based Mean-Field Game Theoretic Approach
Hamid Shiri, Jihong Park, Mehdi Bennis

TL;DR
This paper presents a neural network-based mean-field game approach for autonomous UAV path planning that reduces communication and computation energy while ensuring collision avoidance in large UAV swarms.
Contribution
It introduces a novel ML-assisted MFG control method that efficiently solves PDEs for large UAV groups with minimal communication and computational costs.
Findings
Effective collision avoidance demonstrated in simulations.
Reduced communication energy by exchanging UAV states only once.
Low computational energy achieved through ML approximation of PDE solutions.
Massive Autonomous UAV Path Planning:
A Neural Network Based Mean-Field Game Theoretic Approach
Hamid Shiri, Jihong Park, and Mehdi Bennis
Centre for Wireless Communications, University of Oulu, Finland, Email: {hamid.shiri, jihong.park, mehdi.bennis}@oulu.fi
Abstract
This paper investigates the autonomous control of massive unmanned aerial vehicles (UAVs) for mission-critical applications (e.g., dispatching many UAVs from a source to a destination for firefighting). Achieving their fast travel and low motion energy without inter-UAV collision under wind perturbation is a daunting control task, which incurs huge communication energy for exchanging UAV states in real time. We tackle this problem by exploiting a mean-field game (MFG) theoretic control method that requires the UAV state exchanges only once at the initial source. Afterwards, each UAV can control its acceleration by locally solving two partial differential equations (PDEs), known as the Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations. This approach, however, brings about huge computation energy for solving the PDEs, particularly under multi-dimensional UAV states. We address this issue by utilizing a machine learning (ML) method where two separate ML models approximate the solutions of the HJB and FPK equations. These ML models are trained and exploited using an online gradient descent method with low computational complexity. Numerical evaluations validate that the proposed ML aided MFG theoretic algorithm, referred to as MFG learning control, is effective in collision avoidance with low communication energy and acceptable computation energy.
Index Terms:
Autonomous UAV, communication-efficient online path planning, mean-field game, machine learning.
I Introduction
Large fleets of unmanned aerial vehicles (UAVs) are essential in mission-critical applications, such as covering wide disaster sites in emergency cellular networks [1] and delivering heavy payloads in rescue missions and firefighting scenarios [2, 3]. These applications operate in real time and do not tolerate remote control delays from a central controller. Besides, they necessitate reliable control under uncertainty such as wind perturbations, making pre-programmed offline control algorithms ill-suited. In view of this, in this paper we focus on the problem of controlling a large number of UAVs in a distributed and online way, so as to achieve 1) the fastest travel from a source to a destination, while jointly minimizing 2) motion energy, and 3) inter-UAV collision, under wind dynamics.
This problem is challenging, as illustrated in Fig. 1, because each UAV must make control decisions with many degrees of freedom while accounting for energy saving and collision avoidance. For collision avoidance, multiple UAVs need to interact with each other, which requires inter-UAV communications whose delay and/or energy cost increases exponentially with the number of UAVs. Such communication and control overhead is persistent, as the control must be continual under wind perturbations.
To address the aforementioned issues, we leverage mean-field game (MFG) theory [4, 5], a mathematical framework that is effective in reducing the communication and control overhead of distributed control under agent interactions (e.g., collisions) through their states (e.g., locations) [1]. At its core, MFG considers a large number of agents, each of which approximately views the other agents’ states as the global state averaged across all agents. The global state is identically given for all agents at any given time, and one can thus focus only on controlling a single agent while incorporating its interactions via the global state distribution, referred to as the mean-field (MF) distribution.
This MFG theoretic control operates by locally solving two partial differential equations (PDEs) at each agent. Namely, a single agent computes the MF distribution by solving the Fokker-Planck-Kolmogorov (FPK) equation, provided that the initial global state is known, which requires exchanging the agents’ states only once. For the given MF distribution, the optimal control of the agent is determined by solving the other PDE induced by a continuous-time Markov decision problem (MDP), known as the Hamilton-Jacobi-Bellman (HJB) equation [5].
While effective, MFG theoretic approaches are computationally expensive due to solving both HJB and FPK equations, particularly with multi-dimensional states [6], limiting their adoption for real-time multi-dimensional control applications. To circumvent this problem, we propose an MFG learning control algorithm in which the HJB and FPK solutions are approximated using two separate machine learning (ML) models (e.g., neural networks), denoted as HJB model and FPK model, respectively. The HJB and FPK models stored at each UAV are simultaneously trained and exploited for control in an online manner. Numerical evaluations validate that the proposed MFG learning control more reliably guarantees collision avoidance with significant communication energy reduction, at the cost of a slight increase in computation and motion energy consumption.
Related works. The problem of UAV placement for supporting communication systems has been studied in [7]. Under wind perturbations, the real-time placement of massive UAVs without collision has been investigated in [1]. Path planning is a more challenging problem wherein UAVs are controlled to reach a destination. In offline control, a multiple-UAV scenario has been addressed in [7]. In online control, an evolutionary algorithm [8] and a partially observable Markov decision process based method [9] have been proposed. For other communication and control related issues for UAV systems, readers are encouraged to check [10] and [3], respectively.
II System Model
We assume a set of UAVs traveling from a common source to a destination in a two-dimensional plane, where the origin is set as the destination. At time , the -th UAV controls its acceleration , so as to minimize its: 1) travel time, 2) motion energy, and 3) inter-UAV collision, during the remaining travel to the destination.
The control of is based not only on its local state , but also on the states of a set of the other UAVs within ’s communication range with for collision avoidance, and . The communication range is determined by the minimum received signal-to-noise ratio (SNR) required for successful decoding under the standard path loss model, which is given as with an identical transmission power , noise power , and path loss exponent .
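As an illustration of how a communication range follows from an SNR requirement, the sketch below inverts a power-law path loss model, assuming the received SNR at distance $d$ is $P d^{-\alpha}/\sigma^2$. The function names, argument names, and numeric values are hypothetical and are not taken from the paper.

```python
import math

def comm_range(tx_power_w, noise_power_w, snr_threshold, path_loss_exp):
    """Largest distance d at which P * d**(-alpha) / noise >= threshold.

    Assumes a simple power-law path loss model (an illustrative assumption).
    """
    if min(tx_power_w, noise_power_w, snr_threshold, path_loss_exp) <= 0:
        raise ValueError("all arguments must be positive")
    return (tx_power_w / (noise_power_w * snr_threshold)) ** (1.0 / path_loss_exp)

def neighbors(positions, i, d_max):
    """Indices of the UAVs (other than i) located within d_max of UAV i."""
    xi, yi = positions[i]
    return [j for j, (x, y) in enumerate(positions)
            if j != i and math.hypot(x - xi, y - yi) <= d_max]

# Hypothetical numbers: 1 W transmit power, 1e-4 W noise, SNR threshold 100.
d_max = comm_range(1.0, 1e-4, 100.0, 2.0)
nearby = neighbors([(0.0, 0.0), (3.0, 4.0), (20.0, 0.0)], 0, d_max)
```

A larger path loss exponent or a stricter SNR threshold shrinks the range, and hence the set of UAVs whose states can be collected.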
The state of UAV is comprised of its location and velocity that are dynamically updated by the control under random wind dynamics. Following [11], the wind dynamics are assumed to follow an Ornstein-Uhlenbeck process with an average wind velocity . The temporal state dynamics are thereby given as:
[TABLE]
where is a positive constant, is the covariance matrix of the wind velocity, and is the standard Wiener process independently and identically distributed (i.i.d.) across UAVs.
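The following sketch shows one way to simulate such state dynamics with an Euler-Maruyama discretization. It assumes dynamics of the form $\mathrm{d}v = (a - c_0 (v - v_o))\,\mathrm{d}t + G\,\mathrm{d}W$ and $\mathrm{d}r = v\,\mathrm{d}t$, where $c_0$, $v_o$, and the noise gain $G$ are placeholder names. The exact form of (1) and (2) is not reproduced here, so this is an illustration rather than the paper's model.

```python
import numpy as np

def simulate_uav(r0, v0, accel, v_wind, c0, noise_gain, dt, n_steps, rng):
    """Euler-Maruyama simulation of position r and velocity v in 2-D.

    accel(t, r, v) returns the control acceleration; the velocity is pulled
    toward v_wind at rate c0 and perturbed by noise_gain @ dW (assumed form).
    """
    r = np.array(r0, dtype=float)
    v = np.array(v0, dtype=float)
    path = [r.copy()]
    for k in range(n_steps):
        a = accel(k * dt, r, v)
        dW = rng.normal(0.0, np.sqrt(dt), size=2)  # Wiener increment
        r = r + v * dt                             # position uses current velocity
        v = v + (a - c0 * (v - v_wind)) * dt + noise_gain @ dW
        path.append(r.copy())
    return np.array(path), v

# Noise-free check: with zero control, the velocity relaxes to the mean wind.
rng = np.random.default_rng(0)
path, v_end = simulate_uav([0.0, 0.0], [0.0, 0.0],
                           lambda t, r, v: np.zeros(2),
                           np.array([1.0, 0.0]), 1.0, np.zeros((2, 2)),
                           0.1, 100, rng)
```

Setting `noise_gain` to a nonzero matrix gives independent random trajectories per UAV when each UAV uses its own Wiener increments.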
To achieve the aforementioned goals 1), 2), and 3), UAV at time aims to minimize its average cost , where the average is taken with respect to the measure induced by all possible controls for . The cost consists of the term depending only on the local state and the term relying on the global state observed by , given as:
[TABLE]
where
[TABLE]
and the terms , , , , and are positive constants.
The local term in (3) focuses on the following two objectives. For 1) travel time minimization, it is intended to minimize the remaining travel distance , while maximizing the velocity towards the destination, i.e., minimizing the projected velocity towards the opposite direction to the destination. For 2) motion energy minimization, it is planned to minimize the kinetic energy and the acceleration control energy that are proportional to and , respectively [12, 13].
The global term in (3) refers to 3) collision avoidance, and is intended to form a flock of UAVs moving together [14]. The flocking leads to small relative inter-UAV velocities for avoiding collision even when their controlled velocities are slightly perturbed by wind dynamics. Furthermore, the flocking yields closer inter-UAV distances without collision. This is beneficial for allowing more UAVs to exchange their states, i.e., larger , thereby contributing also to collision avoidance. In view of this, we adopt the Cucker-Smale flocking [1, 14] that reduces the relative velocities for the UAVs. The relative velocity and the inter-UAV distance are thus incorporated in the numerator and denominator of , respectively.
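A minimal sketch of a Cucker-Smale style flocking cost is given below, with the squared relative velocity in the numerator and a power of the squared inter-UAV distance in the denominator. The constants `eps` and `beta` and the averaging over neighbors are assumptions chosen for illustration and may differ from the constants in (3).

```python
import numpy as np

def flocking_cost(r_i, v_i, r_nbrs, v_nbrs, eps=1.0, beta=0.5):
    """Average of |v_j - v_i|^2 / (eps + |r_j - r_i|^2)^beta over neighbors j."""
    r_i = np.asarray(r_i, dtype=float)
    v_i = np.asarray(v_i, dtype=float)
    r_nbrs = np.atleast_2d(np.asarray(r_nbrs, dtype=float))
    v_nbrs = np.atleast_2d(np.asarray(v_nbrs, dtype=float))
    if r_nbrs.size == 0:
        return 0.0  # no neighbors in range, hence no interaction cost
    dv2 = np.sum((v_nbrs - v_i) ** 2, axis=1)
    dr2 = np.sum((r_nbrs - r_i) ** 2, axis=1)
    return float(np.mean(dv2 / (eps + dr2) ** beta))

# Equal velocities give zero cost; a velocity mismatch is penalized more
# heavily when the neighbor is close.
aligned = flocking_cost([0, 0], [1, 0], [[3, 4]], [[1, 0]])
near = flocking_cost([0, 0], [1, 0], [[3, 4]], [[3, 0]])
far = flocking_cost([0, 0], [1, 0], [[30, 40]], [[3, 0]])
```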
Incorporating the cost function (3) under its temporal dynamics (1) and (2), the control problem of UAV at time is formulated as:
[TABLE]
where , , , and denotes the two-dimensional identity matrix. The minimum cost is referred to as the value function of the optimal control, and is derived using two different control methods in the next section.
III HJB Control and MFG Control
Deriving UAV $i$’s value function in (4) is intertwined with the other UAVs through the collision avoidance term in (3). Therefore, this is an $N$-player non-cooperative game whose well-known solution is the Nash equilibrium (NE), i.e., the control decisions under which no UAV can unilaterally decrease its cost [5]. Its solution complexity increases exponentially with $N$, which makes it a poor fit for real-time applications. To address this pressing concern, in this section we consider two different control methods: 1) HJB control, our baseline method in which each UAV’s control only takes into account the other UAVs’ states before taking their actions; and 2) MFG control, our proposed method that incorporates the intertwined controls via an approximated global state distribution, i.e., the MF distribution.
It is noted that HJB control does not always achieve the NE as it intentionally neglects the actual control interactions, i.e., the states when taking actions. On the other hand, MFG control relies on the MF approximation, and only achieves the NE asymptotically when $N \to \infty$ [5]. The operational details of both control schemes are elaborated in the following subsections, and their effectiveness under a large finite number of UAVs will be numerically examined in Sec. V.
III-A HJB Control
UAV $i$’s value function in (4) is equivalent to the solution of its corresponding HJB $\mathsf{H}\big(\psi_i^*(t); s_{N_i}(t)\big) = 0$, formulated according to the Markov decision principle. The left-hand side $\mathsf{H}\big(\psi_i^*(t); s_{N_i}(t)\big)$ is given by putting $\psi_i(t) = \psi_i^*(t)$ into $\mathsf{H}\big(\psi_i(t); s_{N_i}(t)\big)$ in (7) (see the derivation details in [5]). Due to the global term therein for collision avoidance, the HJB solution requires collecting the other UAVs’ states. Furthermore, achieving the NE of $N$-UAV controls necessitates solving $N$-coupled HJBs whose required number of state exchanges increases exponentially with $N$. For example, each HJB is first solved while the other UAVs’ states are fixed, and this should be iterated over the UAVs in a recursive manner until all action changes stop, i.e., convergence to the NE [5]. The said $N$-coupled HJB solutions require state exchanges per time instant , where denotes the number of iterations until convergence to the NE.
Such excessive communication overhead is not bearable for real-time UAV controls. Therefore, while compromising convergence to the NE, as a baseline control scheme we instead consider HJB control of UAVs that exchange number of states before solving the HJBs, i.e., before taking actions, at each time instant . Afterwards, each HJB is solved independently without recursion, as visualized in Fig. 2-a. At time , ’s HJB control is summarized as below.
Algorithm 1. HJB Control
1) Collect the states from UAVs.
2) Calculate the value $\psi_i^*(t)$ by solving the HJB $\mathsf{H}\big(\psi_i^*(t); s_{N_i}(t)\big) = 0$ (see (7)).
3) Take the optimal action .
Here, (7) is derived by applying the optimal control to (6), where denotes the differential operator taken with respect to . The optimal control is obtained according to the Karush-Kuhn-Tucker (KKT) conditions, since the HJB’s Hamiltonian, i.e., the terms inside the infimum in (6), is convex with respect to . The existence of is ensured by the fact that the HJB with (6) has a unique solution according to [5], as long as the drift term in (5) and the instantaneous cost are smooth, i.e., have continuous first derivatives.
III-B MFG Control
Compared to HJB control with state exchanges per time instant , MFG control requires state exchanges only at the initial time , while asymptotically guaranteeing the NE at any time as $N$ goes to infinity. This is made possible by locally calculating the MF distribution $m(t)$, which asymptotically converges to the (empirical) global state distribution when all actions are taken under the NE, i.e., . With a finite number of UAVs, it yields an MF approximation that achieves the $\epsilon$-NE [5].
To this end, each UAV under MFG control locally solves a pair of the HJB $\mathsf{H}\big(\psi_i^*(t); s_i(t), m(t)\big) = 0$ (see (8) with ) and its coupled FPK $\mathsf{F}\big(m(t); s_i(t), \psi_i^*(t)\big) = 0$ (see (10) with ), which is derived from the state dynamics (5) using Itô’s lemma [5]. As illustrated in Fig. 2-b, solving the HJB produces the value $\psi_i^*(t)$ (or its corresponding optimal action ), which is fed to the FPK whose solution is the MF distribution $m(t)$. This operation is locally iterated times until it converges to the NE. At time , ’s MFG control is described as follows.
Algorithm 2. MFG Control
For :
1) Calculate the value $\psi_i^{[k]}(t)$ by solving the HJB $\mathsf{H}\big(\psi_i^{[k]}(t); s_i(t), m^{[k-1]}(t)\big) = 0$ (see (8)).
2) Calculate the MF distribution $m^{[k]}(t)$ by solving the FPK $\mathsf{F}\big(m^{[k]}(t); s_i(t), \psi_i^{[k]}(t)\big) = 0$.
3) Iterate 1) and 2) until .
4) Take the optimal action .
Initial MF distribution at :
• If , , computed by collecting the states from $N$ UAVs.
• Otherwise, , i.e., the converged MF distribution in the previous control where denotes the control interval.
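The initial MF distribution computed from the collected states can be pictured as an empirical distribution. The sketch below bins hypothetical two-dimensional states on a grid and normalizes by the number of UAVs, which yields the probability mass per cell. The grid edges and sample states are illustrative, and the paper's state also includes velocity, which is omitted here for brevity.

```python
import numpy as np

def empirical_mf(states, edges):
    """Empirical distribution of N states on a grid, as mass per cell.

    states: array of shape (N, d); edges: list of d arrays of bin edges.
    States outside the grid are dropped, so the mass may sum to less than 1.
    """
    states = np.asarray(states, dtype=float)
    hist, _ = np.histogramdd(states, bins=edges)
    return hist / states.shape[0]

# Four hypothetical UAV positions on a 2 x 2 grid.
states = [(0.5, 0.5), (0.5, 0.5), (1.5, 0.5), (1.5, 1.5)]
edges = [np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0])]
m0 = empirical_mf(states, edges)
```

As the number of UAVs grows, such an empirical distribution is what the MF distribution is meant to approximate.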
It is noted that the HJB’s global term in (8) approximates in (7), where
[TABLE]
This MF approximation is based on treating each of the UAVs’ states as induced by the MF distribution . The approximation converges to the exact value as , so long as is bounded and UAV indices are permutable, i.e., the exchangeability of actions for the same states (see the condition details in [5]).
IV ML Aided HJB and MFG Controls
Both HJB and MFG controls are facilitated by the HJB and FPK equations. These PDEs are solved by discretizing the domain in a way that the derivatives therein can be approximated using finite differences. Unfortunately, such a finite difference method requires finer discretization as the domain dimension increases, incurring higher computational complexity. For instance, in a two-dimensional - domain, the convergence of a numerical PDE solution with the temporal discretization step size is guaranteed by the Courant-Friedrichs-Lewy (CFL) condition whose feasible step size is smaller than the required step size in a one-dimensional domain, i.e., [6].
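To illustrate why the feasible time step shrinks with dimension, the sketch below uses the classical stability limit of the explicit (forward-time, centered-space) scheme for a pure diffusion equation, $D\,\Delta t / \Delta x^2 \le 1/(2d)$ in $d$ dimensions with equal grid spacing. This is a textbook example chosen for illustration; the HJB and FPK equations also contain drift terms, so their exact condition differs.

```python
import numpy as np

def max_stable_dt(diffusivity, dx, n_dims):
    """Largest stable time step of explicit FTCS diffusion in n_dims dimensions."""
    return dx ** 2 / (2.0 * n_dims * diffusivity)

def ftcs_step(u, r):
    """One explicit 1-D diffusion step with periodic boundaries, r = D*dt/dx^2."""
    return u + r * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1))

def run(r, n_steps=50, n=32):
    """Evolve the highest-frequency grid mode and return its final amplitude."""
    u = (-1.0) ** np.arange(n)
    for _ in range(n_steps):
        u = ftcs_step(u, r)
    return float(np.max(np.abs(u)))

dt_1d = max_stable_dt(1.0, 0.1, 1)  # one-dimensional limit
dt_2d = max_stable_dt(1.0, 0.1, 2)  # half the one-dimensional limit
stable_amp = run(0.45)    # r below 0.5: the mode decays
unstable_amp = run(0.55)  # r above 0.5: the mode grows
```

Halving the step size for the same grid spacing doubles the number of time steps, which is one source of the computational cost noted above.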
To enable multi-dimensional control in real time with low computational complexity, we propose HJB learning control and MFG learning control that approximate HJB control and MFG control in Sec. III, respectively. Via these methods, ML models learn to solve the HJB and FPK equations in an online way, as elaborated in the following subsections.
IV-A HJB Learning Control
HJB learning control exploits ML to realize the baseline method, i.e., HJB control in Sec. III-A. The key idea is to approximate the problem of solving the HJB equation $\mathsf{H}\big(\psi_i^*(t); s_{N_i}(t)\big) = 0$ by minimizing $\mathsf{H}\big(\hat{\psi}_i(t); s_{N_i}(t)\big)$ via a data-driven regression method as proposed in [15]. To this end, a single hidden layer ML model, hereafter referred to as an HJB model, is constructed at UAV $i$. Its input is fed to hidden nodes with a given activation function , which are fully connected to the model output $\hat{\psi}_i(t)$ through a weight vector , i.e.,
[TABLE]
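A single hidden layer model of this kind is linear in its weights: the output is the inner product of the weight vector and the vector of hidden-node activations. The sketch below uses all degree-1 and degree-2 monomials of a four-dimensional state as a hypothetical activation set; the paper's polynomial terms are chosen heuristically and may differ.

```python
import numpy as np

def poly_features(s):
    """Hypothetical activations: all degree-1 and degree-2 monomials of s."""
    s = np.asarray(s, dtype=float)
    quad = [s[a] * s[b] for a in range(len(s)) for b in range(a, len(s))]
    return np.concatenate([s, np.array(quad)])

def value_estimate(w, s):
    """Model output: inner product of the weights and the activations."""
    return float(np.dot(w, poly_features(s)))

# A 4-D state (2-D position and 2-D velocity) yields 4 + 10 = 14 activations.
n_hidden = len(poly_features([1.0, 2.0, 3.0, 4.0]))
psi_hat = value_estimate(np.ones(n_hidden), [1.0, 0.0, 0.0, 0.0])
```

Because the output is linear in the weights, the gradient of the output with respect to the weights is simply the activation vector, which keeps online updates cheap.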
The model is trained by adjusting per each observation , so as to minimize its cost function comprising a loss function and a regularizer :
[TABLE]
where is a positive constant. The loss function is intended to minimize $\hat{\mathsf{H}}\big(\hat{\psi}_i(t); s_{N_i}(t)\big)$ in (7). The regularizer is meant to stop the movement when reaching the destination, i.e., . At time , ’s HJB learning control is given as below.
Algorithm 3. HJB Learning Control
1) Collect the states from UAVs.
2) Update the weight as:
3) Calculate the value .
4) Take the optimal action .
The weight update rule in 2) of Algorithm 3 is derived by applying a normalized gradient descent algorithm (NGD), modified from the gradient descent algorithm (GD) in order to avoid saddle points under non-convex loss functions [16]. To be specific, the weight update rule under GD with the step size is , which is modified as under the original NGD [16] with . As opposed to this, the weight update rule in Algorithm 3 applies the sign operation only to the loss function in , in order not to disturb activations as detailed next.
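The difference between GD and NGD can be sketched as follows, assuming that NGD divides the gradient by its Euclidean norm so that every step has the same length. This shows the generic NGD step only; the modified rule of Algorithm 3, which normalizes only the loss part of the gradient, is not reproduced here.

```python
import numpy as np

def gd_step(w, grad, step):
    """Plain gradient descent: the step length scales with the gradient norm."""
    return w - step * grad

def ngd_step(w, grad, step, tiny=1e-12):
    """Normalized gradient descent: fixed step length along -grad.

    tiny guards against division by zero at a stationary point.
    """
    norm = np.linalg.norm(grad)
    return w - step * grad / max(norm, tiny)

w = np.array([1.0, 2.0])
small_grad = np.array([3e-6, 4e-6])  # nearly flat region, e.g. near a saddle
w_gd = gd_step(w, small_grad, 0.5)   # barely moves
w_ngd = ngd_step(w, small_grad, 0.5) # still moves a distance of 0.5
```

The fixed step length is what helps NGD leave flat regions where plain GD stalls.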
The regularizer aims to ensure stably reaching the destination without further movement, i.e., the terminal zero-state convergence . With this, becomes activated, i.e., , for penalizing the loss function , when the state change direction (the sign of ) under the current control is the same as the current state direction (the sign of ), i.e., . Otherwise, the current control is capable of stabilizing the state, and the regularizer is thus inactivated, i.e., . The regularizer activations during UAV travels will be discussed in Sec. V.
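The activation condition described above can be sketched as a sign comparison: the regularizer is active when the state and its rate of change share the same sign, meaning that the state is moving away from zero. Evaluating the condition per state component, as done below, is an assumption made for illustration.

```python
import numpy as np

def regularizer_active(state, state_rate):
    """True where a state component is moving away from zero.

    A component is flagged when it and its time derivative have the same
    nonzero sign, i.e., when their product is positive.
    """
    state = np.asarray(state, dtype=float)
    state_rate = np.asarray(state_rate, dtype=float)
    return (state * state_rate) > 0

# Component 0 grows away from zero; components 1 and 2 move toward zero;
# component 3 is already at zero.
flags = regularizer_active([2.0, -1.0, 3.0, 0.0], [1.0, 1.0, -2.0, 5.0])
n_active = int(flags.sum())
```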
In the loss function , the expression $\hat{\mathsf{H}}\big(\hat{\psi}_i(t); s_{N_i}(t)\big)$ is derived by applying $\hat{\psi}_i(t)$ to (7) with the same procedure as described in Algorithm 1, except for the following detail. The cost function includes ; namely, within in (13) as well as that contains
[TABLE]
According to (5), this term introduces that is computationally intractable. Instead, following [15], we apply the nominal state dynamics without random wind perturbations when calculating .
IV-B MFG Learning Control
In a similar vein to Algorithm 3, MFG learning control exploits ML to approximate the solutions of the HJB $\mathsf{H}\big(\psi_i^*(t); s_i(t), m(t)\big) = 0$ and the FPK $\mathsf{F}\big(m(t); s_i(t), \psi_i^*(t)\big) = 0$ induced by MFG control in Sec. III-B as the minima of $\mathsf{H}\big(\hat{\psi}_i(t); s_i(t), \hat{m}(t)\big)$ and $\mathsf{F}\big(\hat{m}(t); s_i(t), \hat{\psi}_i(t)\big)$. To this end, each UAV constructs two separate ML models: the HJB model used in Algorithm 3 and an FPK model, minimizing $\mathsf{H}\big(\hat{\psi}_i(t); s_i(t), \hat{m}(t)\big)$ (see (8) with and ) and $\mathsf{F}\big(\hat{m}(t); s_i(t), \hat{\psi}_i(t)\big)$ (see (10) with and ), respectively. The FPK model has the same structure with hidden nodes, and produces the approximated MF distribution $\hat{m}(t)$ by adjusting its weight vector , i.e.,
[TABLE]
Per each observation , the FPK model is trained by adjusting so as to minimize the cost function :
[TABLE]
The HJB model’s cost function is the same as (13), except for replacing its $\mathsf{H}\big(\hat{\psi}_i(t); s_{N_i}(t)\big)$ with $\mathsf{H}\big(\hat{\psi}_i(t); s_i(t), \hat{m}(t)\big)$. At time , UAV ’s MFG learning control is described in Algorithm 4.
Algorithm 4. MFG Learning Control
For :
1) Update the weight as:
2) Calculate the value .
3) Update the weight as:
4) Obtain the MF distribution
5) Iterate 1)-4) until .
6) Take the optimal action .
Initial MF distribution at :
• If , , computed by collecting the states from $N$ UAVs.
• Otherwise, .
V Numerical Results
In this section, we numerically compare the performances of HJB and MFG learning controls, in terms of travel time, energy consumption, and collision avoidance. For each travel, UAVs are dispatched to the origin from the source that is a square centered at in meters. At the source, each UAV is separated m away from each other (see Fig. 3-a), and its velocity is solely determined by the wind dynamics with and in m/s. Under MFG learning control, hereafter denoted as MFG, all UAVs are assumed to exchange their states at the source. Under HJB learning control, before every control, each UAV exchanges its state with the UAVs within the communication range meter, henceforth referred to as , without incurring interference via frequency division multiple access (FDMA).
For an HJB or FPK model, following [15], a single hidden layer model is constructed, wherein each hidden node’s activation function corresponds to each non-scalar term in a polynomial expansion. The polynomial is heuristically chosen as: for and for , where and . Compared to sigmoidal activations, polynomial activations enable smaller model sizes (i.e., , ), yet the models are known to be less robust against unseen state observations. Optimizing the model architecture is an interesting topic for future research. Other simulation parameters are summarized as follows: , , mW, dB, , , , , , , , and .
Figure 3 visualizes the trajectories of UAVs under , , and MFG. During the entire travel, UAVs under hardly communicate with each other. This makes their trajectories almost identical, causing frequent collisions, where a collision is counted for an inter-UAV distance less than m. Focusing on , and MFG, at the beginning, all UAVs tend to follow the average wind direction to save motion energy, and then turn towards the destination. At this north-eastern turning point, fails to avoid collision due to its less trained HJB model. By contrast, MFG incurs no collision thanks to the locally iterated training operations between the HJB and FPK models (see iterations in Algorithm 4), yielding its more trained HJB model (i.e., less variance in weight parameters), as observed in the rightmost subplot of Fig. 3-c. After the turning point, there is a long-distance flight of a UAV fleet. MFG shows the highest flight velocity owing to its better flocking, which partly compensates for the longer travel distance required to guarantee collision avoidance. Finally, at the last part of the travel, UAVs tend to hover around the destination in order to stop their movement while reaching the destination (i.e., ), which is detailed next.
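The collision criterion used above, an inter-UAV distance below a threshold, can be counted per snapshot as sketched below. The threshold value and the sample positions are placeholders, since the actual threshold is a simulation parameter of the paper.

```python
import numpy as np

def count_collisions(positions, min_dist):
    """Number of UAV pairs whose distance is strictly below min_dist."""
    p = np.asarray(positions, dtype=float)
    diff = p[:, None, :] - p[None, :, :]   # pairwise displacement vectors
    dist = np.linalg.norm(diff, axis=-1)   # pairwise distances
    iu = np.triu_indices(len(p), k=1)      # each unordered pair counted once
    return int(np.sum(dist[iu] < min_dist))

# Hypothetical snapshot: the first two UAVs are 0.05 m apart.
snapshot = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)]
n_hits = count_collisions(snapshot, 0.1)
```

Summing this count over all time steps of a trajectory gives an accumulated collision figure of the kind compared across the control schemes.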
Fig. 4 illustrates the accumulated number of regularizer activations (see the details in Sec. IV-A) in the HJB models of , , and MFG as time elapses. For all controls, is more frequently activated near the destination (i.e., s) so as to reduce the velocity, thereby avoiding excessive hovering around and/or passing by the destination. Note that a better flocking behavior (i.e., lower inter-UAV velocities without collision) enables a more stable control without the regularization. For this reason, MFG achieving the best flocking behavior shows the least number of activations. With more UAVs, MFG yields less frequent activations. This is because the MF approximation (see Sec. III-B) becomes more accurate as the number of UAVs increases, providing its better flocking behavior earlier.
Lastly, Fig. 5 compares the communication, computation, and motion energy consumption of and MFG during the entire travel. Each energy is averaged over UAVs, and is normalized by the energy of . We consider that communication, computation, and motion energy consumption are proportional to the number of state exchanges, the number of gradient calculations, and , respectively. Focusing on communication energy, MFG exchanges UAV states only once at the source, whereas does so for every observation. Therefore, MFG consumes significantly less energy, irrespective of the number of UAVs, as opposed to whose energy increases with the number of communicating UAVs. Next, motion energy is proportional to the travel distance. As MFG yields a longer travel distance for avoiding collision, it consumes more motion energy. Computation energy is also proportional to the travel distance under online learning. Besides, in contrast to having only an HJB model, MFG performs gradient calculations for both HJB and FPK models, which makes MFG consume more computation energy.
VI Conclusion
To control massive autonomous UAVs, in this work we proposed an MFG learning control algorithm that enables each UAV’s real-time acceleration control in a distributed manner, by training and exploiting HJB and FPK ML models in an online way. Our simulations validated that MFG learning control guarantees collision avoidance with low communication energy, at the cost of a slight increase in computation and motion energy, compared to a baseline scheme, HJB learning control. The effectiveness of MFG learning control hinges on the level of the HJB and FPK model training. Collaborative HJB and FPK model training across UAVs via federated learning frameworks [17] could thus be an interesting topic for future work.
References
[1] H. Kim, J. Park, M. Bennis, and S.-L. Kim, “Massive UAV-to-ground communication and its stable movement control: A mean-field approach,” in Proc. IEEE SPAWC, Kalamata, Greece, Jun. 2018.
[2] E. Ackerman and E. Strickland, “Medical delivery drones take flight in East Africa,” IEEE Spectrum, vol. 55, no. 1, pp. 34–35, Jan. 2018.
[3] J. Tisdale, Z. Kim, and J. K. Hedrick, “Autonomous UAV path planning and estimation,” IEEE Robot. Autom. Mag., vol. 16, no. 2, pp. 35–42, Jun. 2009.
[4] M. Huang, P. E. Caines, and R. P. Malhamé, “Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behavior and decentralized ε-Nash equilibria,” IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1560–1571, Sep. 2007.
[5] J.-M. Lasry and P.-L. Lions, “Mean field games,” Japan. J. Math., vol. 2, no. 1, pp. 229–260, Mar. 2007.
[6] R. Courant, K. Friedrichs, and H. Lewy, “On the partial difference equations of mathematical physics,” IBM J. Res. Dev., vol. 11, no. 2, pp. 215–234, Mar. 1967.
[7] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Efficient deployment of multiple unmanned aerial vehicles for optimal wireless coverage,” IEEE Commun. Lett., vol. 20, no. 8, pp. 1647–1650, Aug. 2016.
[8] I. K. Nikolos, K. P. Valavanis, N. C. Tsourveloudis, and A. N. Kostaras, “Evolutionary algorithm based offline/online path planner for UAV navigation,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 33, no. 6, pp. 898–912, Dec. 2003.
