A Level Set Approach to Online Sensing and Trajectory Optimization with Time Delays

Matthew R. Kirchner

arXiv:1901.11139·eess.SY·December 20, 2024


TL;DR

This paper introduces a level set method using the generalized Hopf formula for real-time trajectory optimization in robotics, effectively handling time delays in sensing and control, demonstrated through communication channel estimation.

Contribution

It presents a novel application of the generalized Hopf formula for online Hamilton-Jacobi equation solutions with time delays, including a new non-parametric approach for robotic communication channel estimation.

Findings

01. Efficient computation of Hamilton-Jacobi equations with time delays.

02. Successful online trajectory optimization with delay compensation.

03. Improved robotic communication channel estimation from incremental measurements.

Abstract

Presented is a method to compute certain classes of Hamilton-Jacobi equations that result from optimal control and trajectory generation problems with time delays. Many robotic control and trajectory problems have limited information of the operating environment a priori and must continually perform online trajectory optimization in real time after collecting measurements. The sensing and optimization can induce a significant time delay, and must be accounted for when computing the trajectory. This paper utilizes the generalized Hopf formula, which avoids the exponential dimensional scaling typical of other numerical methods for computing solutions to the Hamilton-Jacobi equation. We present as an example a robot that incrementally predicts a communication channel from measurements as it travels. As part of this example, we introduce a seemingly new generalization of a non-parametric…

Equations

$$\frac{d}{ds}x(s) = Ax(s) + B\alpha(s),\quad s\in[0,t],$$

$$\frac{d}{ds}\gamma(s;x,\alpha(\cdot)) = A\gamma(s;x,\alpha(\cdot)) + B\alpha(s),\qquad \gamma(0;x,\alpha(\cdot)) = x,$$

$$R(t,x,\alpha(\cdot)) = \int_{0}^{t} C(s,x,\alpha(s))\,ds + J(\gamma(t;x,\alpha(\cdot))),$$

$$v(x,t) = \inf_{\alpha(\cdot)\in\mathcal{A}} R(t,x,\alpha(\cdot)).$$

$$\begin{cases}\frac{\partial\varphi}{\partial s}(x,s) + H(s,x,\nabla_{x}\varphi(x,s)) = 0,\\ \varphi(x,0) = J(x),\end{cases}$$

$$H(s,x,p) = \sup_{\alpha\in\mathbb{R}^{m}}\left\{\langle -f(s,x,\alpha),p\rangle - C(s,x,\alpha)\right\}.$$

$$\frac{d}{ds}\lambda(s;x,\alpha(\cdot))$$

$$\lambda(t;x,\alpha(\cdot))$$

$$\frac{d}{ds}x(s) = f(\alpha(s)).$$

$$\begin{cases}\frac{\partial\varphi}{\partial s}(x,s) + H(\nabla_{x}\varphi(x,s)) = 0,\\ \varphi(x,0) = J(x).\end{cases}$$

$$\varphi(x,t) = -\min_{p\in\mathbb{R}^{n}}\left\{J^{\star}(p) + tH(p) - \langle x,p\rangle\right\},$$

$$J^{\star}(p) = \sup_{x\in\mathbb{R}^{n}}\left\{\langle p,x\rangle - J(x)\right\}.$$

$$z(s) = e^{-sA}x(s),$$

$$\frac{d}{ds}z(s) = f(s,\alpha(s)) = e^{-sA}B\alpha(s).$$

$$\varphi(z,0) = J(e^{tA}z).$$

$$\varphi(x,t) = -\min_{p\in\mathbb{R}^{n}}\Bigg\{J^{\star}\left(e^{-tA^{\top}}p\right) + \int_{0}^{t}\widehat{H}(s,p)\,ds - \langle x,p\rangle\Bigg\},$$

$$\widehat{H}(s,p) = \sup_{\alpha\in\mathbb{R}^{m}}\left\{\langle -e^{-sA}B\alpha,p\rangle - C(s,\alpha)\right\}.$$

$$\widehat{H}(s,p) = \sup_{\alpha\in\mathbb{R}^{m}}\left\{\langle -e^{-sA}B\alpha,p\rangle - \mathcal{I}_{\mathcal{A}}(\alpha)\right\},$$

$$\mathcal{I}_{\mathcal{A}}(\alpha) = \begin{cases}0 & \text{if } \alpha\in\mathcal{A},\\ +\infty & \text{otherwise.}\end{cases}$$

$$\widehat{H}(s,p) = \left\|-B^{\top}e^{-sA^{\top}}p\right\|_{\mathcal{A}^{*}}.$$

$$\begin{cases}J(x) < 0 & \text{for any } x\in\text{int}\,\Omega,\\ J(x) > 0 & \text{for any } x\in(\mathbb{R}^{n}\setminus\Omega),\\ J(x) = 0 & \text{for any } x\in(\Omega\setminus\text{int}\,\Omega),\end{cases}$$

$$t_{i+1} = t_{i} - \frac{\varphi(x,t_{i})}{\frac{\partial\varphi}{\partial t}(x,t_{i})},$$

$$\frac{\partial\varphi}{\partial t}(x,t_{i}) = -H(\nabla_{x}\varphi(x,t_{i}),x),$$

$$H(p^{*},x) = -x^{\top}A^{\top}p^{*} + \left\|-B^{\top}p^{*}\right\|_{*}.$$

$$\frac{d}{ds}\gamma^{*}(s;x,\alpha^{*}(\cdot)) = A\gamma^{*}(s;x,\alpha^{*}(\cdot)) + B\nabla_{p}\left\|-B^{\top}\lambda^{*}(s)\right\|_{*},$$

$$\lambda^{*}(s) = e^{-sA^{\top}}p^{*}.$$

$$\alpha^{*}(s) = \nabla_{p}\left\|-B^{\top}e^{-sA^{\top}}p^{*}\right\|_{*},$$

$$x^{k+1} = \gamma^{*}(\delta^{k};x^{k},\alpha^{k}(\cdot)).$$

$$\frac{d}{ds}x(s) = Ax(s) + Bu^{k}(s-\tau^{k}),$$


Full text

A Level Set Approach to Online Sensing and Trajectory Optimization with Time Delays

Matthew R. Kirchner

Image and Signal Processing Branch, Research Directorate, Naval Air Warfare Center Weapons Division, China Lake, CA 93555, USA (e-mail: [email protected])

Electrical and Computer Engineering Department, University of California, Santa Barbara, CA 93106, USA (e-mail: [email protected])

Abstract

Presented is a method to compute certain classes of Hamilton–Jacobi equations that result from optimal control and trajectory generation problems with time delays. Many robotic control and trajectory problems have limited information of the operating environment a priori and must continually perform online trajectory optimization in real time after collecting measurements. The sensing and optimization can induce a significant time delay, and must be accounted for when computing the trajectory. This paper utilizes the generalized Hopf formula, which avoids the exponential dimensional scaling typical of other numerical methods for computing solutions to the Hamilton–Jacobi equation. We present as an example a robot that incrementally predicts a communication channel from measurements as it travels. As part of this example, we introduce a seemingly new generalization of a non-parametric formulation of robotic communication channel estimation. New communication measurements are used to improve the channel estimate and online trajectory optimization with time-delay compensation is performed.

Keywords: Time Delay Systems, Hamilton–Jacobi Equation, Generalized Hopf Formula, Viscosity Solution, Optimal Control, Communication Seeking Robotics

This research was supported by the Office of Naval Research under Grant N00014-18-WX01382.

1 Introduction

Time delays are a common presence in real-world instantiations of dynamic systems. They cause issues with stability and robustness when designing real-time control for these systems, and the challenges of design and analysis of systems with time delays have been well studied; see Richard (2003). Historically, there have been many attempts to compensate for time delays in control theory, such as the well-known Padé approximation in classical linear control theory (Franklin et al., 2006, Sec. 5.7.3), which approximates a pure delay as a rational transfer function, or state transformations such as those presented in Kwon and Pearson (1980). In a modern setting, real-time optimal control (RTOC) [Ross et al. (2006)] and model predictive control (MPC) [Camacho and Alba (2013)] have gained large-scale acceptance in control and trajectory optimization problems. These seek to find a control sequence and the resulting trajectory that minimize a pre-defined cost functional. RTOC optimizes the cost functional directly, while MPC computes on a finite time horizon and is frequently referred to as receding horizon control. Because of this, MPC is sub-optimal and necessitates frequent online re-computation from the current state, while RTOC only needs to be re-computed online if the system is perturbed from the computed optimal trajectory.

Fast online computation is critical for these approaches, and delay from computation becomes more apparent as the dimensionality and model complexity of the optimization increase. In addition to time delays induced by computation, many robotics problems have limited information of the operating environment a priori. The robot must perform sensing in real time, which necessitates online re-computation of the optimal trajectory with this new information. As methods for sensing in robotics have become more sophisticated, the time delay induced becomes larger. An illustrative example appeared in Usman et al. (2016), where a complicated non-parametric model was used to estimate the quality of a communication signal from measurements of a transmitter with a known location. In that work, an RTOC scheme was developed to minimize combined motion and communication energy of the signal. The RTOC was re-computed every $10$ seconds and it was noted that the combined sensing and trajectory optimization took around $2$ seconds, or $20\%$ of the entire compute interval. A delay this large can have drastic consequences if not accounted for. This was not addressed in Usman et al. (2016).

As noted in Lu (2008), attempts to directly account for time delays in MPC formulations are limited, with the most notable proposed in Kwon et al. (2004) for linear systems. However, this work is restricted to a specific LQR cost functional and does not account for control saturation. We present in this paper a more general approach based on Hamilton–Jacobi theory, which provides a natural way to deal with pure time delay in the control.

Generally, solutions to optimal control and trajectory problems can be found by solving a Hamilton–Jacobi (HJ) partial differential equation (PDE) as it establishes a sufficient condition for optimality [Osmolovskii (1998)]. Traditionally, numerical solutions to HJ equations require a dense, discrete grid of the solution space as in Osher and Fedkiw (2006); Mitchell (2008). Computing the elements of this grid scales poorly with dimension and has limited use for problems with dimension greater than four. The exponential dimensional scaling in optimization is sometimes referred to as the “curse of dimensionality” [Bellman (1957)].

Darbon and Osher (2016) introduced numerical solutions based on the Hopf formula [Hopf (1965)] that do not require a grid and can be used to efficiently compute solutions to a certain class of Hamilton–Jacobi PDEs. However, that result applies only to systems with time-independent Hamiltonians, arising from dynamics of the form $\dot{x}=f\left(u\left(t\right)\right)$, and has limited use for general linear control problems. Recently, the class of systems was expanded, and generalizations of the Hopf formula were used to solve optimal linear control problems in high dimensions in Kirchner et al. (2018a) and differential games as applied to a multi-vehicle, collaborative pursuit-evasion problem in Kirchner et al. (2018b).

The HJ formulation of the trajectory problem allows a simple and direct treatment of these computationally-induced time delays and there is no need to resort to approximation schemes such as Padé. The main contribution of this paper is to generalize the Hopf formula to directly account for time delays induced by online computation of the optimal control and trajectory. Motivated by robotic vehicle path planning problems where the communication is sensed and estimated online, we present as an additional contribution a seemingly new non-parametric model to estimate a communication channel where the location of the transmitter is unknown a priori.

The rest of the paper is organized as follows. Section 2 reviews HJ theory as it relates to linear optimal control and presents a level set method, based on the generalized Hopf formula, for fast computation of optimal trajectories. Section 3 applies the method to online trajectory optimization with known time delay. Section 4 presents an online trajectory problem where a robot incrementally predicts a wireless communication channel from measurements as it travels. As part of this example, we derive a non-parametric model for channel estimation. Finally, we present results from simulations of the method in Section 5.

2 Solutions to the Hamilton–Jacobi Equation with the Hopf Formula

Before proceeding we introduce some notation and assumptions. We consider linear dynamics

$$\frac{d}{ds}x(s) = Ax(s) + B\alpha(s),\quad s\in[0,t],$$

where $x\in\mathbb{R}^{n}$ is the system state and $\alpha\left(s\right)\in\mathcal{A}\subset\mathbb{R}^{m}$ is the control input, constrained to the convex admissible control set $\mathcal{A}$ . We let $\gamma\left(s;x,\alpha\left(\cdot\right)\right)\in\mathbb{R}^{n}$ denote a state trajectory that evolves in time, $s\in\left[0,t\right]$ , with control input sequence $\alpha\left(\cdot\right)\in\mathcal{A}$ according to $\left(\ref{eq:general dynamics}\right)$ starting from initial state $x$ at $s=0$ . The trajectory $\gamma$ is a solution of $\left(\ref{eq:general dynamics}\right)$ in that it satisfies $\left(\ref{eq:general dynamics}\right)$ almost everywhere:

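$$\frac{d}{ds}\gamma(s;x,\alpha(\cdot)) = A\gamma(s;x,\alpha(\cdot)) + B\alpha(s),\qquad \gamma(0;x,\alpha(\cdot)) = x.$$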

We construct a cost functional for $\gamma\left(s;x,\alpha\left(\cdot\right)\right)$ , given terminal time $t$ as

$$R(t,x,\alpha(\cdot)) = \int_{0}^{t} C(s,x,\alpha(s))\,ds + J(\gamma(t;x,\alpha(\cdot))),$$

where the function $C:\left(0,+\infty\right)\times\mathbb{R}^{n}\times\mathbb{R}^{m}\rightarrow\mathbb{R}\cup\left\{+\infty\right\}$ is the running cost and represents the rate that cost is accrued over time. The value function $v:\mathbb{R}^{n}\times\left(0,+\infty\right)\rightarrow\mathbb{R}$ is defined as the minimum cost, $R$ , among all admissible controls for a given state $x$ with

$$v(x,t) = \inf_{\alpha(\cdot)\in\mathcal{A}} R(t,x,\alpha(\cdot)).$$

The value function in $\left(\ref{eq: Value function}\right)$ satisfies the dynamic programming principle [Bryson and Ho (1975); Evans (2010)] and also satisfies the following initial value Hamilton–Jacobi (HJ) equation by defining the function $\varphi:\mathbb{R}^{n}\times\mathbb{R}\rightarrow\mathbb{R}$ as $\varphi\left(x,s\right)=v\left(x,t-s\right)$ , with $\varphi$ being the viscosity solution of

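$$\begin{cases}\frac{\partial\varphi}{\partial s}(x,s) + H(s,x,\nabla_{x}\varphi(x,s)) = 0,\\ \varphi(x,0) = J(x),\end{cases}$$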

where the Hamiltonian $H:\left(0,+\infty\right)\times\mathbb{R}^{n}\times\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\left\{+\infty\right\}$ is defined by

$$H(s,x,p) = \sup_{\alpha\in\mathbb{R}^{m}}\left\{\langle -f(s,x,\alpha),p\rangle - C(s,x,\alpha)\right\}.$$

The variable $p$ in $\left(\ref{eq: Basic Hamiltonian definition}\right)$ denotes the costate, which in the HJ equation $\left(\ref{eq:Initial value HJ PDE}\right)$ is associated with the gradient of the value function. We denote by $\lambda\left(s;x,\alpha\left(\cdot\right)\right)$ the costate trajectory that satisfies almost everywhere:

[TABLE]

for all $s\in\left[0,t\right]$, with initial costate denoted by $\lambda\left(0;x,\alpha\left(\cdot\right)\right)=p$. With a slight abuse of notation, we will hereafter use $\lambda\left(s\right)$ to denote $\lambda\left(s;x,\alpha\left(\cdot\right)\right)$, since the initial state and control sequence can be inferred through context with the corresponding state trajectory, $\gamma\left(s;x,\alpha\left(\cdot\right)\right)$.

2.1 Viscosity Solutions with the Hopf Formula

Consider simplified system dynamics represented as

$$\frac{d}{ds}x(s) = f(\alpha(s)).$$

The associated HJ equation no longer depends on state and is given as

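$$\begin{cases}\frac{\partial\varphi}{\partial s}(x,s) + H(\nabla_{x}\varphi(x,s)) = 0,\\ \varphi(x,0) = J(x).\end{cases}$$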

When $J\left(x\right)$ is convex and continuous in $x$ , and $H\left(p\right)$ is continuous in $p$ , it was shown in Evans (2010) that an exact, point-wise viscosity solution to $\left(\ref{eq:Non-state dependent HJ}\right)$ can be found using the Hopf formula [Hopf (1965)]

$$\varphi(x,t) = -\min_{p\in\mathbb{R}^{n}}\left\{J^{\star}(p) + tH(p) - \langle x,p\rangle\right\},$$

with the Fenchel–Legendre transform, denoted $J^{\star}:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\left\{+\infty\right\}$ , defined for a convex, proper, lower semicontinuous function $J:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\left\{+\infty\right\}$ [Hiriart-Urruty and Lemaréchal (2012)] as

$$J^{\star}(p) = \sup_{x\in\mathbb{R}^{n}}\left\{\langle p,x\rangle - J(x)\right\}.$$

The transform defined in $\left(\ref{eq: Fenchel transform}\right)$ is also referred to in literature as the convex conjugate.
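As a concrete illustration of the Hopf formula, the following minimal sketch evaluates it numerically in one dimension. The choices $J(x)=\tfrac{1}{2}(x^{2}-1)$ and $H(p)=|p|$, the helper names `golden_min`, `hopf_phi` and `closed_form`, and the search bracket are assumptions made for illustration and are not taken from the paper. For this choice $J^{\star}(p)=\tfrac{1}{2}(p^{2}+1)$, and the result can be compared against the closed form $\varphi(x,t)=\tfrac{1}{2}\max(|x|-t,0)^{2}-\tfrac{1}{2}$.

```python
import math

def golden_min(g, lo, hi, tol=1e-10):
    """Return an approximate minimizer of a unimodal function g on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    while b - a > tol:
        if g(c) < g(d):
            # Minimum lies in [a, d]: shrink from the right.
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c = c, d
            d = a + ratio * (b - a)
    return 0.5 * (a + b)

def hopf_phi(x, t):
    """Evaluate phi(x, t) = -min_p { J*(p) + t H(p) - x p } for the assumed J and H."""
    def objective(p):
        j_star = 0.5 * (p * p + 1.0)   # conjugate of J(x) = (x^2 - 1) / 2
        hamiltonian = abs(p)           # H(p) = |p| (assumed for illustration)
        return j_star + t * hamiltonian - x * p
    bound = abs(x) + 1.0               # the minimizer satisfies |p| <= |x|
    p_star = golden_min(objective, -bound, bound)
    return -objective(p_star), p_star

def closed_form(x, t):
    """Closed-form solution for this particular J and H, used as a check."""
    return 0.5 * max(abs(x) - t, 0.0) ** 2 - 0.5

phi, p_star = hopf_phi(3.0, 1.0)
print(phi, closed_form(3.0, 1.0), p_star)
```

The objective is convex in $p$, so a one-dimensional bracketing search suffices here; no spatial grid over $x$ is constructed, and the solution is obtained point-wise at the queried $(x,t)$.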

Proceeding similarly to Kirchner et al. (2018b), we can generalize the Hopf formula to $\left(\ref{eq:general dynamics}\right)$ by making a change of variables

$$z(s) = e^{-sA}x(s),$$

which results in the following system

$$\frac{d}{ds}z(s) = f(s,\alpha(s)) = e^{-sA}B\alpha(s).$$

The terminal cost function is now defined in $z$ with

$$\varphi(z,0) = J(e^{tA}z).$$

For clarity in the sections to follow, we use the notation $\widehat{H}$ to refer to the Hamiltonian for systems defined by $\left(\ref{eq:z transformed system}\right)$ and $H$ for systems defined by $\left(\ref{eq:general linear system}\right)$ . Notice that the system $\left(\ref{eq:z transformed system}\right)$ does not depend on state but is now time-varying. It was shown in (Kurzhanski and Varaiya, 2014, Section 5.3.2, p. 215) that the Hopf formula can be generalized for a time-dependent Hamiltonian of the system in $\left(\ref{eq:z transformed system}\right)$ with

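$$\varphi(x,t) = -\min_{p\in\mathbb{R}^{n}}\Bigg\{J^{\star}\left(e^{-tA^{\top}}p\right) + \int_{0}^{t}\widehat{H}(s,p)\,ds - \langle x,p\rangle\Bigg\},$$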

with Hamiltonian defined by

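$$\widehat{H}(s,p) = \sup_{\alpha\in\mathbb{R}^{m}}\left\{\langle -e^{-sA}B\alpha,p\rangle - C(s,\alpha)\right\}.$$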
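One step that is easy to miss is how the terminal cost enters the generalized formula after the change of variables. The following worked computation is a sketch under the definitions of this section (it is not quoted from the paper): the Fenchel–Legendre transform of the transformed terminal cost $z\mapsto J\left(e^{tA}z\right)$ can be written in terms of $J^{\star}$ by substituting $y=e^{tA}z$.

```latex
\begin{aligned}
\left(J\circ e^{tA}\right)^{\star}(p)
 &= \sup_{z\in\mathbb{R}^{n}}\left\{\langle p,z\rangle - J\left(e^{tA}z\right)\right\} \\
 &= \sup_{y\in\mathbb{R}^{n}}\left\{\langle p,e^{-tA}y\rangle - J(y)\right\}
    \qquad \left(y=e^{tA}z\right) \\
 &= \sup_{y\in\mathbb{R}^{n}}\left\{\langle e^{-tA^{\top}}p,y\rangle - J(y)\right\}
  = J^{\star}\left(e^{-tA^{\top}}p\right).
\end{aligned}
```

Since $z(0)=x$, the inner product term $\langle x,p\rangle$ is unaffected by the change of variables, so only the conjugate of the terminal cost and the time integral of $\widehat{H}$ differ from the time-independent Hopf formula.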

For the remainder of the paper, we consider a time-optimal formulation in which $\widehat{H}$ is defined as

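$$\widehat{H}(s,p) = \sup_{\alpha\in\mathbb{R}^{m}}\left\{\langle -e^{-sA}B\alpha,p\rangle - \mathcal{I}_{\mathcal{A}}(\alpha)\right\},$$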

where $\mathcal{I}_{\mathcal{A}}:\mathbb{R}^{m}\rightarrow\mathbb{R}\cup\left\{+\infty\right\}$ is the indicator function for the set $\mathcal{A}$ and is defined by

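$$\mathcal{I}_{\mathcal{A}}(\alpha) = \begin{cases}0 & \text{if } \alpha\in\mathcal{A},\\ +\infty & \text{otherwise.}\end{cases}$$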

Suppose $\mathcal{A}$ is a closed convex set such that $0\in\text{int}\,\mathcal{A}$ , where $\text{int}\,\mathcal{A}$ denotes the interior of the set $\mathcal{A}$ . Then $\left(\mathcal{I}_{\mathcal{A}}\right)^{\star}$ defines a norm, which we denote with $\left\|\left(\cdot\right)\right\|_{\mathcal{A}^{*}}$ , which is the dual norm [Hiriart-Urruty and Lemaréchal (2012)] to $\left\|\left(\cdot\right)\right\|_{\mathcal{A}}$ . From this we can write $\left(\ref{eq:transformed hamiltonian with indicator cost}\right)$ in general as

$$\widehat{H}(s,p) = \left\|-B^{\top}e^{-sA^{\top}}p\right\|_{\mathcal{A}^{*}}.$$
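The dual-norm form of the Hamiltonian follows in a few lines from the definition of the indicator function. As a worked sketch in the notation of this section:

```latex
\begin{aligned}
\widehat{H}(s,p)
 &= \sup_{\alpha\in\mathbb{R}^{m}}\left\{\langle -e^{-sA}B\alpha,p\rangle
    - \mathcal{I}_{\mathcal{A}}(\alpha)\right\} \\
 &= \sup_{\alpha\in\mathcal{A}}\,\langle \alpha,-B^{\top}e^{-sA^{\top}}p\rangle \\
 &= \left(\mathcal{I}_{\mathcal{A}}\right)^{\star}\left(-B^{\top}e^{-sA^{\top}}p\right)
  = \left\|-B^{\top}e^{-sA^{\top}}p\right\|_{\mathcal{A}^{*}}.
\end{aligned}
```

For instance, if $\mathcal{A}$ is the unit ball of the 2-norm, the supremum of $\langle\alpha,y\rangle$ over $\mathcal{A}$ is attained at $\alpha=y/\left\|y\right\|_{2}$ by the Cauchy–Schwarz inequality, so the Hamiltonian reduces to $\left\|B^{\top}e^{-sA^{\top}}p\right\|_{2}$.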

2.2 Time-Optimal Control to a Goal Set

Consider a goal set $\Omega\subset\mathbb{R}^{n}$ and a task to determine the control that drives the system into $\Omega$ in minimal time. We represent the set $\Omega$ as an implicit surface with cost function $J:\mathbb{R}^{n}\rightarrow\mathbb{R}$ such that

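$$\begin{cases}J(x) < 0 & \text{for any } x\in\text{int}\,\Omega,\\ J(x) > 0 & \text{for any } x\in(\mathbb{R}^{n}\setminus\Omega),\\ J(x) = 0 & \text{for any } x\in(\Omega\setminus\text{int}\,\Omega),\end{cases}$$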

where $\text{int}\,\Omega$ denotes the interior of $\Omega$. Note that if the goal is a point in $\mathbb{R}^{n}$, then we can represent this by choosing $\Omega$ as a ball with arbitrarily small radius. As noted in Kirchner et al. (2018a), we solve for the minimum time to reach the set $\Omega$ by constructing a Newton iteration, starting from an initial guess, $t_{0}$, with

$$t_{i+1} = t_{i} - \frac{\varphi(x,t_{i})}{\frac{\partial\varphi}{\partial t}(x,t_{i})},$$

where $\varphi\left(x,t_{i}\right)$ is the solution to $\left(\ref{eq:generalized hopf formula}\right)$ at time $t_{i}$. Notice the value function must satisfy the HJ equation

$$\frac{\partial\varphi}{\partial t}(x,t_{i}) = -H(\nabla_{x}\varphi(x,t_{i}),x),$$

where $\nabla_{x}\varphi\left(x,t_{i}\right)$ is the argument of the minimizer in $\left(\ref{eq:generalized hopf formula}\right)$, which we will denote as $p^{*}$. No change of variable is needed for the Newton update and we have

$$H(p^{*},x) = -x^{\top}A^{\top}p^{*} + \left\|-B^{\top}p^{*}\right\|_{*}.$$

We iterate $\left(\ref{eq:Newton iterate}\right)$ until convergence at the optimal time to reach, which we denote as $t^{*}$. The optimal control can be found directly from the necessary conditions of optimality established by Pontryagin’s principle [Pontryagin (2018)] by noting the optimal trajectory, denoted as $\gamma^{*}$, must satisfy

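$$\frac{d}{ds}\gamma^{*}(s;x,\alpha^{*}(\cdot)) = A\gamma^{*}(s;x,\alpha^{*}(\cdot)) + B\nabla_{p}\left\|-B^{\top}\lambda^{*}(s)\right\|_{*},$$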

where $\lambda^{*}$ is the optimal costate trajectory and is given by

$$\lambda^{*}(s) = e^{-sA^{\top}}p^{*}.$$

This implies our optimal control is

$$\alpha^{*}(s) = \nabla_{p}\left\|-B^{\top}e^{-sA^{\top}}p^{*}\right\|_{*},$$

for all time $s\in\left[0,t^{*}\right]$ , provided the gradient exists.
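To make the control law concrete, the sketch below evaluates it for a planar double integrator with the control constrained to the unit 2-norm ball. The matrices $A=\begin{bmatrix}0 & I\\ 0 & 0\end{bmatrix}$ and $B=\begin{bmatrix}0\\ I\end{bmatrix}$, the function name `optimal_control`, and the reading of the gradient as the gradient of the norm evaluated at its argument are assumptions for illustration. Because $A^{\top}$ is nilpotent here, $e^{-sA^{\top}}=I-sA^{\top}$, which gives $-B^{\top}e^{-sA^{\top}}p=s\,p_{q}-p_{v}$ with $p=(p_{q},p_{v})$.

```python
import math

def optimal_control(s, p_star):
    """Time-optimal control at time s for an assumed planar double integrator.

    p_star = (p_q1, p_q2, p_v1, p_v2) is the optimal initial costate.
    With A = [[0, I], [0, 0]] and B = [0, I]^T (assumed), A^T is nilpotent, so
    -B^T exp(-s A^T) p = s * p_q - p_v.
    """
    p_q, p_v = p_star[:2], p_star[2:]
    y = [s * p_q[0] - p_v[0], s * p_q[1] - p_v[1]]
    norm = math.hypot(y[0], y[1])
    if norm == 0.0:
        return None  # the gradient of the norm does not exist at the origin
    # The gradient of the 2-norm at y is the unit vector y / ||y||.
    return [y[0] / norm, y[1] / norm]

print(optimal_control(0.0, (1.0, 0.0, 0.0, 1.0)))
print(optimal_control(2.0, (1.0, 0.0, 0.0, 0.0)))
```

The returned vector always has unit length, so under these assumptions the control saturates the constraint for every $s$ at which the gradient exists.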

Note that in the above formulation, we compute a viscosity solution to $\left(\ref{eq:Non-state dependent HJ}\right)$ without constructing a discrete grid and this method can provide a numerical solution that is efficient to compute even when the state space is high-dimensional. Additionally, no derivative approximations are needed with Hopf formula-based methods, and this eliminates the numeric dissipation introduced with the Lax–Friedrichs scheme that is necessary to maintain stability in grid-based methods.
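The minimum-time Newton iteration of this section can be sketched on a problem small enough to check by hand. The example below assumes a scalar single integrator $\dot{x}=\alpha$ with $|\alpha|\leq 1$ (that is, $A=0$, $B=1$) and a goal interval of radius $r$ represented by $J(x)=\tfrac{1}{2}(x^{2}-r^{2})$; these choices, the function names, and the closed-form minimizer are illustrative assumptions rather than the paper's implementation. For this problem the exact answer is $t^{*}=|x|-r$.

```python
def hopf_value(x, t, r):
    """Return phi(x, t) and the minimizer p* for x' = alpha, |alpha| <= 1.

    With J(x) = (x^2 - r^2) / 2 we have J*(p) = (p^2 + r^2) / 2 and H(p) = |p|,
    so the minimizer of the Hopf objective is a soft-thresholding of x
    (a closed form that holds for this special case only).
    """
    magnitude = max(abs(x) - t, 0.0)
    p_star = magnitude if x >= 0 else -magnitude
    objective = 0.5 * (p_star ** 2 + r ** 2) + t * abs(p_star) - x * p_star
    return -objective, p_star

def min_time_to_reach(x, r, t0=0.0, tol=1e-3, max_iter=50):
    """Newton iteration t <- t - phi / (d phi / dt), with d phi / dt = -H(p*)."""
    t = t0
    for _ in range(max_iter):
        phi, p_star = hopf_value(x, t, r)
        if phi <= tol:              # stop once the zero level set is reached
            break
        dphi_dt = -abs(p_star)      # HJ equation: phi_t = -H(grad phi), A = 0
        t -= phi / dphi_dt
    return t

print(min_time_to_reach(5.0, 1.0))
```

Starting from $t_{0}=0$, the iterates increase monotonically toward $t^{*}$ in this example because $\varphi(x,\cdot)$ is convex and decreasing before the goal set is reached.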

3 Online Trajectory Optimization With Delay

We consider an RTOC framework for the rest of this work with a variable time horizon $s\in\left[0,t^{k}\right]$, where each $t^{k}$ is the optimal time-to-go as computed in Section 2.2. This can easily be applied to MPC problems by using a fixed, finite time horizon $s\in\left[0,t\right]$. The superscript $k\in\mathbb{N}$ is used to denote the computational update of each relevant quantity, with $k=0$ being the first optimization. With this notation, $x^{k}$ denotes our initial state when we initiate the computation, and likewise we denote by $\gamma^{*}\left(s;x^{k},\alpha^{k}\left(\cdot\right)\right)$ the $k$-th computed optimal trajectory. We re-compute the control online after $\delta^{k}$ seconds of traveling on the trajectory $\gamma^{*}\left(s;x^{k},\alpha^{k}\left(\cdot\right)\right)$, which gives the new initial state for the next optimization as

$$x^{k+1} = \gamma^{*}(\delta^{k};x^{k},\alpha^{k}(\cdot)).$$

Using $x^{k+1}$ as the new initial condition, we proceed to use the methods of Section 2.2 to compute $\alpha^{k+1}\left(s\right)$. Note that the time between updates, $\delta^{k}$, need not be uniform. Now consider the following linear state space model with time delay

$$\frac{d}{ds}x(s) = Ax(s) + Bu^{k}(s-\tau^{k}),$$

with delayed control input $u^{k}\in\mathcal{A}\subset\mathbb{R}^{m}$, where $\tau^{k}$ is the step-specific time delay. We can represent the delay dynamics in the same form as $\left(\ref{eq:general dynamics}\right)$ by defining $\alpha^{k}\left(\cdot\right)$, for $k\neq 0$, as

[TABLE]

We only consider causal systems and therefore assume each $\tau^{k}\geq 0$ and

[TABLE]

The Hamiltonian becomes

[TABLE]

for the Hopf formula in $\left(\ref{eq:generalized hopf formula}\right)$, with $p^{k}$ denoting the optimal initial costate for the update. Likewise, we solve for the optimal time-to-reach of the delayed system using the methods of Section 2.2. The Hamiltonian for the Newton iteration becomes

[TABLE]

This implies our control becomes

[TABLE]
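A minimal sketch of how a known delay might be handled in simulation is given below. It assumes, purely for illustration, that the previously commanded control keeps acting during the first $\tau^{k}$ seconds of each update cycle and is constant over that window, so the state at which the newly computed control takes effect can be predicted in closed form for a one-dimensional double integrator. The function name `predict_through_delay` and the constant-control simplification are hypothetical and are not taken from the paper.

```python
def predict_through_delay(q, v, a_prev, tau):
    """Propagate a 1-D double integrator (q' = v, v' = a) over a delay tau.

    a_prev is the previously commanded acceleration, assumed constant on
    [0, tau]; the newly computed control only acts after the delay.
    """
    if tau < 0:
        raise ValueError("delay must be non-negative (causal system)")
    q_next = q + v * tau + 0.5 * a_prev * tau ** 2
    v_next = v + a_prev * tau
    return q_next, v_next

# A 2 s delay inside a 10 s update cycle, matching the values used in Section 5.
print(predict_through_delay(q=0.0, v=1.0, a_prev=1.0, tau=2.0))
```

Planning the new trajectory from the predicted state, rather than from the state at which the measurement was taken, is what keeps the computed control consistent with where the vehicle will be when that control is first applied.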

4 An Example of Online Sensing and Trajectory Optimization

We present as an example a robotic vehicle that uses a radio transmitter to communicate data to a remote base station. The goal is to plan a trajectory to deliver the robot to a location with the best possible communication performance, in minimum time. The location of the base station is unknown a priori and hence the communication link performance needs to be spatially estimated. Initially, the robot will have only a sparse number of samples of the communication link, or perhaps none at all, and needs to determine the channel-to-noise ratio (CNR). As the robot moves, more samples are collected and the estimate is improved. Each time the estimate is improved, a new trajectory needs to be generated, since the trajectory that was optimal under the previous estimate is no longer optimal.

This is similar to the problem addressed in Usman et al. (2016), but that work attempted to minimize total power, was not time-optimal, and did not account for time delays. We choose this example to demonstrate the method presented since communication link performance needs to be sensed online and the computation time for the channel estimation is non-trivial. In the case of Usman et al. (2016), 2 seconds of computation was required for every 10-second computation cycle.

4.1 Channel Estimation with Unknown Emitter Location

It was proposed in Malmirchegini and Mostofi (2012) to model the CNR (in dB) at a particular spatial location $q\in\mathbb{R}^{2}$ from $\ell$ observed measurements as

[TABLE]

where $\Gamma\left(q;q_{b}\right)$ is a parametric model of path loss, relating CNR to spatial distance to the (known) transmitter location, $q_{b}$ and is given as

[TABLE]

where $c_{PL}$ and $n_{PL}$ are constant parameters associated with the transmitter. The quantity $\Delta\left(q\right)$ represents the deviation of the true CNR from $\Gamma$ and is modeled nonparametrically as a Gaussian process [Rasmussen (2004)] with

[TABLE]

for each $q_{i}$ , where $k_{\Delta}$ is referred to as the covariance function and for modeling communication channels was suggested in Malmirchegini and Mostofi (2012) to be

[TABLE]

for any $i,j\in\left\{1,\ldots,\ell\right\}$ channel measurements. The idea of constructing a model as a composition of a known parametric model and a non-parametric deviation is not uncommon and is generally formulated as Gaussian process regression with explicit basis functions (Rasmussen, 2004, Chapter 2.7). We present the model while omitting a detailed derivation as this is outside the scope of this work. For more information, the reader is encouraged to review Rasmussen (2004) for a thorough and complete review of Gaussian process regression and Malmirchegini and Mostofi (2012) for the derivation of the proposed covariance functions for communication models.

The model $\left(\ref{eq:basic CNR model}\right)$ assumes a priori knowledge of the location of the transmitter, which for many robotic path planning problems may not be available at run-time. We propose to generalize $\left(\ref{eq:basic CNR model}\right)$ by considering the case where the transmitting location is an unknown random variable distributed with probability density function $g\left(q_{b}\right)$ and model $\Upsilon$ with a single Gaussian process with

[TABLE]

where

[TABLE]

with $k_{\Gamma}\left(q_{i},q_{j}\right)$ denoting the covariance function associated with the path loss of the signal. The last equation follows from the fact that the sum of two kernels, i.e. $\left(\ref{eq:basic CNR model}\right)$, results in the sum of the respective covariance functions. To determine $k_{\Gamma}\left(q_{i},q_{j}\right)$, we follow Neal (1996) to get

[TABLE]

Letting $K_{q}=\left[k\left(q,q_{1}\right),k\left(q,q_{2}\right),\ldots,k\left(q,q_{\ell}\right)\right]^{\top}$ and $K$ a matrix with entries defined by $K_{ij}=k\left(q_{i},q_{j}\right)$ , the posterior estimate of the channel at an unknown location $q$ can be found with mean

[TABLE]

with ${\bf y}=\left[y_{1},y_{2},\ldots,y_{\ell}\right]^{\top}$ a vector of $\ell$ measurements, and $I$ the identity matrix of the appropriate size. The variance of the channel estimate is given by

[TABLE]

An example is shown in Figure 1, where a vehicle collects measurements and uses the above method to estimate the unknown communication channel.
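The posterior computation described above can be sketched with the standard zero-mean Gaussian process regression equations from Rasmussen (2004), $\mu(q)=K_{q}^{\top}\left(K+\sigma^{2}I\right)^{-1}{\bf y}$ and $\Sigma(q)=k(q,q)-K_{q}^{\top}\left(K+\sigma^{2}I\right)^{-1}K_{q}$. The exponential kernel form, the use of $\sigma_{\rho}$ as the measurement noise level, the default parameter values (taken from Section 5), and all function names below are assumptions for illustration; in particular, the path-loss covariance $k_{\Gamma}$ is omitted, so this is not the paper's full model.

```python
import math

def kernel(qi, qj, xi=3.20, eta=3.09):
    """Exponential covariance xi^2 * exp(-||qi - qj|| / eta) (assumed form)."""
    return xi ** 2 * math.exp(-math.dist(qi, qj) / eta)

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def gp_posterior(q, samples, y, noise_std=1.64):
    """Posterior mean and variance at q given measurements y at the sample points."""
    n = len(samples)
    # K + sigma^2 I, built entry by entry.
    K = [[kernel(samples[i], samples[j]) + (noise_std ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    K_q = [kernel(q, s) for s in samples]
    weights = solve(K, y)        # (K + sigma^2 I)^{-1} y
    v = solve(K, K_q)            # (K + sigma^2 I)^{-1} K_q
    mean = sum(a * b for a, b in zip(K_q, weights))
    var = kernel(q, q) - sum(a * b for a, b in zip(K_q, v))
    return mean, var

print(gp_posterior((1.0, 0.0), [(0.0, 0.0), (2.0, 0.0)], [5.0, 4.0]))
```

Each new measurement adds one row and column to $K$, so the cost of the linear solves grows with the number of samples; this is one reason the sensing step contributes a non-negligible delay.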

5 Results

The methods presented above were implemented in MATLAB R2018b on a laptop equipped with an Intel Core i7-7500 CPU running at 2.70 GHz. For the communication channel parameters we used the values presented in Usman et al. (2016): $\sigma_{\rho}=1.64$, $\xi=3.20$, $c_{PL}=-41.34$, $\eta=3.09$, and $n_{PL}=3.86$. While the time to compute could vary as more measurements are gathered, we force a fixed value of time delay for consistency. We also follow Usman et al. (2016) for these values with $\tau^{k}=2$ seconds and $\delta^{k}=10$ seconds for all $k$. The Newton update of $\left(\ref{eq:Newton iterate}\right)$ and $\left(\ref{eq:Hamiltonian for Newton and delay}\right)$ is stopped when $\varphi(x,t_{i})\leq 10^{-3}$.

5.1 Planar Motion Example

We choose for dynamics $\left(\ref{eq:general linear system}\right)$ with state $x=\left[q,\dot{q}\right]^{\top}$, where $q\in\mathbb{R}^{2}$ is the spatial position of a robot and $\dot{q}\in\mathbb{R}^{2}$ is the velocity, and

[TABLE]

The control $u\in\mathbb{R}^{2}$ is constrained to lie in the set $\left\|u\right\|_{2}\leq 1$ . Since the 2-norm is self-dual, the Hamiltonian $\left(\ref{eq:Hamiltonian of 'z'}\right)$ for this example is

[TABLE]

and the Hamiltonian $\left(\ref{eq:Hamiltonian for Newton and delay}\right)$ for the Newton update becomes

[TABLE]

with

[TABLE]

The control is found as

[TABLE]
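The generalized Hopf formula requires the time integral of the Hamiltonian, which can be approximated by quadrature. The sketch below assumes the planar double integrator $A=\begin{bmatrix}0 & I\\ 0 & 0\end{bmatrix}$, $B=\begin{bmatrix}0\\ I\end{bmatrix}$ with no delay, for which the 2-norm Hamiltonian reduces to $\left\|s\,p_{q}-p_{v}\right\|_{2}$ with $p=(p_{q},p_{v})$. The matrices, the function names, and the choice of composite Simpson's rule are illustrative assumptions, not the paper's implementation.

```python
import math

def hamiltonian(s, p):
    """H_hat(s, p) = || s * p_q - p_v ||_2 for the assumed planar double integrator."""
    return math.hypot(s * p[0] - p[2], s * p[1] - p[3])

def integrate_hamiltonian(p, t, n=200):
    """Composite Simpson approximation of the integral of H_hat(s, p) over [0, t]."""
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of intervals
    h = t / n
    total = hamiltonian(0.0, p) + hamiltonian(t, p)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * hamiltonian(i * h, p)
    return total * h / 3.0

print(integrate_hamiltonian((1.0, 0.0, 0.0, 0.0), 2.0))
```

Because the integrand depends only on $p$ and $s$, this integral is evaluated inside the minimization over $p$ without any discretization of the state space.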

The initial position of the robot is $q_{0}=\left[45,30\right]^{\top}$ with initial velocity $\dot{q}_{0}=\left[-10,0\right]^{\top}$, and the transmitter is located at $q_{b}=\left[25,-25\right]^{\top}$. We assume no prior channel measurements before moving, unlike in Usman et al. (2016), and use for the prior of the transmitter location a uniform distribution over the operating area, with $q_{b}\sim\mathcal{U}\left(\left[-50,50\right]\times\left[-50,50\right]\right)$. The vehicle collects an additional sample each time it has displaced $\eta$ meters, the scale length of the $k_{\Delta}$ kernel function. The vehicle guides to a goal set that is an ellipsoidal neighborhood around the peak of the estimated CNR by setting

[TABLE]

where $\tilde{x}^{k}=\left(\tilde{q}_{b}^{k},0,0\right)^{\top}$ and $\tilde{q}_{b}^{k}$ is the peak of the estimated CNR at the $k$-th iteration. The matrix $W$ is positive definite and defines the shape of the neighborhood. For this example, we use

[TABLE]

with $V_{\text{max}}$ being the maximal allowable velocity at the goal. We consider a vehicle that comes close to rest at the goal, so we set $V_{\text{max}}=0.1$. This implies the following initial value function

[TABLE]

Figure 2 shows the estimation and associated optimal trajectory for 3 optimization cycles. The vehicle starts with only a single sample of the channel at the vehicle’s initial location. With the transmitter’s location unknown and assumed equally likely at any spot in the operating area, the estimate is biased towards the center of the area. The estimate and optimal trajectory to the peak in the estimated channel are shown in Fig. 2a. After traveling along this path for 10 seconds and acquiring more channel samples, a new estimate is shown in Fig. 2b. The path was in an area of low received signal strength and the new estimate reduces the estimated channel in this region, as well as shifting the estimate of the peak closer to the ground-truth location. A time delay of 2 seconds was induced by the channel estimation and is compensated in the online update to the optimal trajectory, shown in green. Figure 2c shows another cycle of more measurements and an improved estimation of the channel with the corresponding optimal path.

Acknowledgements

The author would like to thank Arjun Muralidharan at the University of California, Santa Barbara (now a software engineer in the Network Infrastructure Group at Google, Inc.) for the many helpful and enlightening discussions on communication models used for robotic platforms.

Bibliography

Bellman, R.E. (1957). Dynamic Programming, volume 1. Princeton University Press.
Bryson, A.R. and Ho, Y.C. (1975). Applied Optimal Control: Optimization, Estimation and Control. CRC Press.
Camacho, E.F. and Alba, C.B. (2013). Model Predictive Control. Springer.
Darbon, J. and Osher, S. (2016). Algorithms for overcoming the curse of dimensionality for certain Hamilton-Jacobi equations arising in control theory and elsewhere. Research in the Mathematical Sciences, 3(1), 19.
Evans, L.C. (2010). Partial Differential Equations. American Mathematical Society, Providence, R.I.
Franklin, G.F., Powell, J.D., and Emami-Naeini, A. (2006). Feedback Control of Dynamic Systems. Prentice Hall, 5th edition.
Hiriart-Urruty, J.B. and Lemaréchal, C. (2012). Fundamentals of Convex Analysis. Springer Science & Business Media.
Hopf, E. (1965). Generalized solutions of non-linear equations of first order. Journal of Mathematics and Mechanics, 14, 951–973.