Computing Probabilistic Controlled Invariant Sets

Yulong Gao, Karl H. Johansson, and Lihua Xie

arXiv:1905.04117·cs.SY·July 6, 2021


TL;DR

This paper introduces probabilistic controlled invariant sets (PCISs) for stochastic control systems, providing algorithms for their computation in discrete and continuous spaces, with applications demonstrated through motion planning simulations.

Contribution

It proposes finite- and infinite-horizon PCISs, explores their relation to robust invariant sets, and develops computational algorithms for practical control system applications.

Findings

01. Algorithms for PCIS computation are computationally tractable.

02. Finite-horizon PCISs converge with space discretization.

03. Infinite-horizon PCISs relate to stochastic backward reachable sets.

Abstract

This paper investigates stochastic invariance for control systems through probabilistic controlled invariant sets (PCISs). As a natural complement to robust controlled invariant sets (RCISs), we propose finite- and infinite-horizon PCISs, and explore their relation to RCISs. We design iterative algorithms to compute the PCIS within a given set. For systems with discrete spaces, the computations of the finite- and infinite-horizon PCISs at each iteration are based on linear programming and mixed integer linear programming, respectively. The algorithms are computationally tractable and terminate in a finite number of steps. For systems with continuous spaces, we show how to discretize the spaces and prove the convergence of the approximation when computing the finite-horizon PCISs. In addition, it is shown that an infinite-horizon PCIS can be computed by the stochastic backward reachable…

Tables

Table I: Comparisons between this paper and other work

Work	System	Invariant Set	Control	Horizon	Computation
This paper	Markov controlled process	PCIS	Yes	Finite and infinite horizons	Iteration based on stochastic backward reachable set
[15]	Nonlinear stochastic system	PCIS	Yes	Finite and infinite horizons	No
[16]	Linear stochastic system	PCIS	Yes	One step	Ellipsoidal approximation
[17]	Linear stochastic system	Probabilistic invariant set	No	Infinite horizon	Polyhedral approximation based on Chebyshev’s inequality

Equations

$$\mu_{k}:\mathbb{X}\rightarrow\mathbb{U},\quad\forall k\in\mathbb{N}_{[0,N-1]}.$$

$$p_{N,\mathbb{Q}}^{\bm{\mu}}(x_{0})=\mathrm{Pr}\{\forall k\in\mathbb{N}_{[0,N]},\ x_{k}\in\mathbb{Q}\}.$$

$$V_{k,\mathbb{Q}}^{*}(x)=\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}V_{k+1,\mathbb{Q}}^{*}(y)\,T(dy|x,u),\quad x\in\mathbb{X},$$

$$\mathbb{U}_{k}(x,\lambda)=\Big\{u\in\mathbb{U}\ \Big|\ \int_{\mathbb{X}}V_{k+1,\mathbb{Q}}^{*}(y)\,T(dy|x,u)\geq\lambda\Big\}$$

$$\mu_{k,\mathbb{Q}}^{*}(x)=\arg\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}V_{k+1,\mathbb{Q}}^{*}(y)\,T(dy|x,u),\quad x\in\mathbb{Q},\ k\in\mathbb{N}_{[0,N-1]}.$$

$$p_{\infty,\mathbb{Q}}^{\bm{\mu}}(x_{0})=\mathrm{Pr}\{\forall k\in\mathbb{N},\ x_{k}\in\mathbb{Q}\}.$$

$$G_{k+1,\mathbb{Q}}^{*}(x)=\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}G_{k,\mathbb{Q}}^{*}(y)\,T(dy|x,u),\quad x\in\mathbb{X},$$

$$\mathbb{U}_{k}(x,\lambda)=\Big\{u\in\mathbb{U}\ \Big|\ \int_{\mathbb{X}}G_{k,\mathbb{Q}}^{*}(y)\,T(dy|x,u)\geq\lambda\Big\}$$

$$G_{\infty,\mathbb{Q}}^{*}(x)=\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}G_{\infty,\mathbb{Q}}^{*}(y)\,T(dy|x,u),$$

$$\bar{\mu}_{\mathbb{Q}}^{*}(x)=\arg\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}G_{\infty,\mathbb{Q}}^{*}(y)\,T(dy|x,u),\quad x\in\mathbb{Q}.$$

$$\begin{aligned}\mathbb{S}_{\epsilon,N}^{*}(\mathbb{Q})&=\{x\in\mathbb{Q}\mid\exists\bm{\mu}\in\mathcal{M},\ p_{N,\mathbb{Q}}^{\bm{\mu}}(x)\geq\epsilon\}\\&=\{x\in\mathbb{Q}\mid\sup_{\bm{\mu}\in\mathcal{M}}p_{N,\mathbb{Q}}^{\bm{\mu}}(x)\geq\epsilon\}\\&=\{x\in\mathbb{Q}\mid V_{0,\mathbb{Q}}^{*}(x)\geq\epsilon\}.\end{aligned}$$

$$\liminf_{i\rightarrow\infty}\mathbb{P}_{i}=\bigcup_{i\geq 1}\bigcap_{j\geq i}\mathbb{P}_{j}=\bigcap_{j\geq 1}\mathbb{P}_{j}=\bigcap_{i\geq 1}\bigcup_{j\geq i}\mathbb{P}_{j}=\limsup_{i\rightarrow\infty}\mathbb{P}_{i},$$

$$\begin{aligned}\min\ &\sum_{k=0}^{N}\sum_{x\in\mathbb{P}_{i}}v_{k}(x)\\ \text{subject to}\ &\forall x\in\mathbb{P}_{i}:\\ &v_{k}(x)\geq\sum_{y\in\mathbb{P}_{i}}v_{k+1}(y)\,T(y|x,u),\quad\forall u\in\mathbb{U}_{x},\ \forall k\in\mathbb{N}_{[0,N-1]},\\ &v_{N}(x)\geq 1,\end{aligned}$$

$$v_{k}^{*}(x)=\sum_{y\in\mathbb{P}_{i}}v_{k+1}^{*}(y)\,T(y|x,u).$$

$$|t(y|x,u)-t(y^{\prime}|x^{\prime},u^{\prime})|\leq L\,(\|y-y^{\prime}\|+\|x-x^{\prime}\|+\|u-u^{\prime}\|).$$

$$\hat{\mathbb{U}}_{x}=\{\hat{u}\in\hat{\mathbb{U}}\mid\|u-\hat{u}\|\leq\delta\ \text{for some}\ u\in\mathbb{U}_{s_{x}}\},$$

$$\hat{t}(y|x,\hat{u})=\begin{cases}\dfrac{t(s_{y}|s_{x},\hat{u})}{\int_{\mathbb{Q}}t(s_{z}|s_{x},\hat{u})\,dz},&\text{if}\ \int_{\mathbb{Q}}t(s_{z}|s_{x},\hat{u})\,dz\geq 1,\\ t(s_{y}|s_{x},\hat{u}),&\text{otherwise}.\end{cases}$$

$$\begin{cases}\hat{V}^{*}_{N,\mathbb{Q}}(q_{i})=1,\\ \hat{V}^{*}_{k,\mathbb{Q}}(q_{i})=\max\limits_{\hat{u}\in\hat{\mathbb{U}}}\Big(\sum\limits_{j=1}^{m_{x}}\hat{V}^{*}_{k+1,\mathbb{Q}}(q_{j})\,\hat{T}(q_{j}|q_{i},\hat{u})\Big),\quad\forall k\in\mathbb{N}_{[0,N-1]}.\end{cases}$$

$$\begin{aligned}\hat{\mu}_{k,\mathbb{Q}}^{*}(q_{i})&=\arg\max_{\hat{u}\in\hat{\mathbb{U}}}\int_{\mathbb{Q}}\hat{V}_{k+1,\mathbb{Q}}^{*}(y)\,\hat{t}(y|q_{i},\hat{u})\,dy,\\&=\arg\max_{\hat{u}\in\hat{\mathbb{U}}}\Big(\sum_{j=1}^{m_{x}}\hat{V}^{*}_{k+1,\mathbb{Q}}(q_{j})\,\hat{T}(q_{j}|q_{i},\hat{u})\Big).\end{aligned}$$

$$|V_{k,\mathbb{Q}}^{*}(x)-\hat{V}_{k,\mathbb{Q}}^{*}(x)|\leq\tau_{k}(\mathbb{Q})\,\delta,$$

$$\begin{cases}\tau_{N}(\mathbb{Q})=0,\\ \tau_{k}(\mathbb{Q})=4\phi(\mathbb{Q})L+\tau_{k+1}(\mathbb{Q}),\quad\forall k\in\mathbb{N}_{[0,N-1]}.\end{cases}$$

$$V_{0,\mathbb{Q}}^{*}(x)\geq\hat{V}_{0,\mathbb{Q}}^{*}(x)-\tau_{0}(\mathbb{Q})\,\delta\geq\hat{\epsilon}-\tau_{0}(\mathbb{Q})\,\delta,\quad\forall x\in\mathbb{Q}.$$

$$\begin{aligned}\mathbb{S}_{\epsilon,\infty}^{*}(\mathbb{Q})&=\{x\in\mathbb{Q}\mid\exists\bm{\mu}\in\mathcal{M},\ p_{\infty,\mathbb{Q}}^{\bm{\mu}}(x)\geq\epsilon\}\\&=\{x\in\mathbb{Q}\mid\sup_{\bm{\mu}\in\mathcal{M}}p_{\infty,\mathbb{Q}}^{\bm{\mu}}(x)\geq\epsilon\}\\&=\{x\in\mathbb{Q}\mid G_{\infty,\mathbb{Q}}^{*}(x)\geq\epsilon\}.\end{aligned}$$

$$T(\mathbb{Q}_{f}|x,u_{x})+\int_{\mathbb{Q}\setminus\mathbb{Q}_{f}}T(\mathbb{Q}_{f}|y,u_{y})\,T(dy|x,u_{x})+\frac{\rho^{2}}{1-\rho}\geq\epsilon,$$

$$T(\mathbb{Q}_{f}|x,u_{x})+\int_{\mathbb{Q}\setminus\mathbb{Q}_{f}}T(\mathbb{Q}_{f}|y,u_{y})\,T(dy|x,u_{x})\geq\epsilon.$$


Taxonomy

Topics: Advanced Control Systems Optimization · Control Systems and Identification · Fault Detection and Control Systems

Full text

Computing Probabilistic Controlled Invariant Sets

Yulong Gao, Karl H. Johansson, Fellow, IEEE, and Lihua Xie, Fellow, IEEE

The work of Y. Gao and K. H. Johansson is supported by the Knut and Alice Wallenberg Foundation, the Swedish Strategic Research Foundation, and the Swedish Research Council. Y. Gao and K. H. Johansson are with the Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm 10044, Sweden ([email protected], [email protected]). Y. Gao and L. Xie are with the School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore ([email protected], [email protected]).

Abstract

This paper investigates stochastic invariance for control systems through probabilistic controlled invariant sets (PCISs). As a natural complement to robust controlled invariant sets (RCISs), we propose finite- and infinite-horizon PCISs, and explore their relation to RCISs. We design iterative algorithms to compute the PCIS within a given set. For systems with discrete spaces, the computations of the finite- and infinite-horizon PCISs at each iteration are based on linear programming and mixed integer linear programming, respectively. The algorithms are computationally tractable and terminate in a finite number of steps. For systems with continuous spaces, we show how to discretize the spaces and prove the convergence of the approximation when computing the finite-horizon PCISs. In addition, it is shown that an infinite-horizon PCIS can be computed by the stochastic backward reachable set from the RCIS contained in it. These PCIS algorithms are applicable to practical control systems. Simulations are given to illustrate the effectiveness of the theoretical results for motion planning.

Index Terms:

stochastic control systems, reachability analysis, probabilistic controlled invariant set (PCIS)

I Introduction

I-A Motivation and Related Work

Invariance is a fundamental concept in systems and control [1, 2, 3]. A controlled invariant set captures the region where the states can be maintained by some admissible control inputs. Robust controlled invariant sets (RCISs) are defined for control systems with bounded external disturbances and address the invariance despite any realization of the disturbances. In the past decades, there have been many research results on RCISs and their computation [4, 5, 6]. This paper studies probabilistic controlled invariant sets (PCISs), which are a natural complement to RCISs and are suitable in many applications. A PCIS is a set within which the controller is able to keep the system state with a certain probability. Such sets not only alleviate the inherent conservatism of RCISs by allowing probabilistic violations but also broaden the applicability of RCISs by being able to address unbounded disturbances. The study of PCISs is motivated by safety-critical control [7], stochastic model predictive control (MPC) [8, 9], reliable control [10, 11], and relevant applications, e.g., air traffic management systems [12, 13] and motion planning [14].

A question at the heart of this paper is

Given a set $\mathbb{Q}$ and a parameter $0\leq\epsilon\leq 1$ , how to compute a set $\tilde{\mathbb{Q}}\subseteq\mathbb{Q}$ that is invariant with probability $\epsilon$ ?

To the best of our knowledge, this question has not been explored up to now. One essential component in iterative approaches on computing RCISs is to compute the robust backward reachable set, in which each state can be steered to the current set by an admissible input for all possible uncertainties [4, 5, 6]. The PCIS computation in this paper follows the same idea, but the robust backward reachable set is replaced with the stochastic backward reachable sets which require different mathematical tools. Some challenges related to such an approach should be highlighted: (i) how to make it tractable to compute the stochastic backward reachable set, in particular for systems with continuous spaces; (ii) how to mitigate the conservatism when characterizing the stochastic backward reachable set subject to the prescribed probability; (iii) how to guarantee convergence of the iterations.

Controlled invariant sets have recently been extended to stochastic systems. In [18], a target set, which is similar to the PCIS of this paper, is used to define stabilization in probability. In [10], a reliable control set, another similar notion to a PCIS, is used to guarantee the reliability of Markov-jump linear systems. The reliability is further studied for such systems with bounded disturbances in [11]. A definition of PCIS for nonlinear systems is provided in [15] by using reachability analysis. It is later applied to portfolio optimization [19]. Another definition of probabilistic invariance originates from stochastic MPC [16] and captures one-step invariance. In [16], an ellipsoidal approximation is given for linear systems with specific uncertainty structure. Similar invariant sets are used in [20] to construct a convex lifting function for linear stochastic control systems. A definition of a probabilistic invariant set is proposed in [17, 21] for linear stochastic systems without control inputs. This definition captures the probabilistic inclusion of the state at each time instant. A recent work [22] explores the correspondence between probabilistic and robust invariant sets for linear systems. In [17, 21], polyhedral probabilistic invariant sets are approximated by using Chebyshev’s inequality for linear systems with Gaussian noise. Recursive satisfaction is usually computationally intractable for general stochastic control systems.

The results of this paper build on the above work but make significant additions and improvements. Table I summarizes the comparison between our work and the most relevant literature. (i) All the above references focus on some specific stochastic systems (e.g., linear or one-dimensional affine nonlinear systems) or on some specific class of stochastic disturbances (e.g., Gaussian or state-independent noise). In our model, we consider general Markov controlled processes, which include general system dynamics and stochastic disturbances. (ii) Different from [17, 21], our invariant sets are defined based on trajectory inclusion as in [15] and, particularly, incorporate control inputs constrained by a compact set. An accompanying question is how to find an admissible control input when verifying or computing a PCIS. (iii) The PCISs in this paper are different from the maximal probabilistic safe sets in [23]. Every trajectory in a PCIS is required by our definition to admit the same probability level, which does not hold for the maximal probabilistic safe set. (iv) The stochastic reachability analysis studied in [23] provides an important tool for maximizing the probability of staying in a set. Based on this, we compute a PCIS within a set with a prescribed probability level. This extends the results of [15, 23, 24].

I-B Main Contributions and Organization

The objective of this paper is to provide a novel tool to analyze invariance in stochastic control systems. The contributions are summarized as follows.

As the first contribution, we propose two novel definitions of PCIS: $N$ -step $\epsilon$ -PCIS and infinite-horizon $\epsilon$ -PCIS (Definitions 3 and 4). An $N$ -step $\epsilon$ -PCIS is a set within which the state can stay for $N$ steps with probability $\epsilon$ under some admissible controller while an infinite-horizon $\epsilon$ -PCIS is a set within which the state can stay forever with probability $\epsilon$ under some admissible controller. These invariant sets are different from the ones proposed in [16, 17], which address probabilistic set invariance at each time step. Our definitions are applicable for general discrete-time stochastic control systems. We provide fundamental properties of PCISs and explore their relation to RCISs. Furthermore, we propose conditions for the existence of infinite-horizon $\epsilon$ -PCIS (Theorem 3).

The second contribution is that we design iterative algorithms to compute the largest finite- and infinite-horizon PCIS within a given set for systems with discrete and continuous spaces. The PCIS computation is based on the stochastic backward reachable set. For discrete state and control spaces, it is shown that at each iteration, the stochastic backward reachable set computation of an $N$ -step $\epsilon$ -PCIS can be reformulated as a linear program (LP) (Theorem 1 and Corollary 1) and that of an infinite-horizon $\epsilon$ -PCIS as a computationally tractable mixed-integer linear program (MILP) (Theorem 4). Furthermore, we prove that these algorithms terminate in a finite number of steps. For continuous state and control spaces, we present a discretization procedure. Under weaker assumptions than [25], we prove the convergence of such approximations for $N$ -step $\epsilon$ -PCISs (Theorem 2). The approximations generalize the case in [23], which only discretizes the state space for a given discrete control space. Furthermore, in order to compute an infinite-horizon $\epsilon$ -PCIS, we propose an algorithm based on the fact that an infinite-horizon PCIS always contains an RCIS.

The remainder of the paper is organized as follows. Section II provides the system model and some preliminaries. Section III presents the definition, properties, and computation algorithms of finite-horizon PCISs. Section IV extends the results to the infinite-horizon case. Examples in Section V illustrate the effectiveness of our approach. Section VI concludes this paper.

Notation. Let $\mathbb{N}$ denote the set of nonnegative integers and $\mathbb{R}$ the set of real numbers. For some $q,s\in\mathbb{N}$ and $q<s$ , let $\mathbb{N}_{\geq q}$ and $\mathbb{N}_{[q,s]}$ denote the sets $\{r\in\mathbb{N}\mid r\geq q\}$ and $\{r\in\mathbb{N}\mid q\leq r\leq s\}$ , respectively. For two sets $\mathbb{X}$ and $\mathbb{Y}$ , $\mathbb{X}\setminus\mathbb{Y}=\{x\mid x\in\mathbb{X},x\notin\mathbb{Y}\}$ and $\mathbb{X}\bigtriangleup\mathbb{Y}=(\mathbb{X}\setminus\mathbb{Y})\cup(\mathbb{Y}\setminus\mathbb{X})$ . When $\leq$ , $\geq$ , $<$ , and $>$ are applied to vectors, they are interpreted element-wise. $\rm{Pr}$ denotes the probability. For a set $\mathbb{X}$ , $\mathcal{B}(\mathbb{X})$ and $\mathcal{P}(\mathbb{X})$ denote the Borel $\sigma$ -algebra generated by $\mathbb{X}$ and the space of probability distributions on $\mathbb{X}$ , respectively. The indicator function of a set $\mathbb{X}$ is denoted by $\mathbbm{1}_{\mathbb{X}}(x)$ , that is, $\mathbbm{1}_{\mathbb{X}}(x)=1$ if $x\in\mathbb{X}$ and $\mathbbm{1}_{\mathbb{X}}(x)=0$ otherwise.

II System Description and Preliminaries

Consider a stochastic control system described by a Markov controlled process $\mathcal{S}=(\mathbb{X},\mathbb{U},T)$ , where

• $\mathbb{X}$ is a state space endowed with a Borel $\sigma$ -algebra $\mathcal{B}(\mathbb{X})$ ;

• $\mathbb{U}$ is a compact control space endowed with a Borel $\sigma$ -algebra $\mathcal{B}(\mathbb{U})$ ;

• $T:\mathcal{B}(\mathbb{X})\times\mathbb{X}\times\mathbb{U}\rightarrow\mathbb{R}$ is a Borel-measurable stochastic kernel given $\mathbb{X}\times\mathbb{U}$ , which assigns to each $x\in\mathbb{X}$ and $u\in\mathbb{U}$ a probability measure on the Borel space $(\mathbb{X},\mathcal{B}(\mathbb{X}))$ : $T(\cdot|x,u)$ .

Let us denote by $\mathbb{U}_{x}$ the set of the admissible control actions for each $x\in\mathbb{X}$ . Assume that $\mathbb{U}_{x}$ is nonempty for each $x\in\mathbb{X}$ .

Consider a finite horizon $N\in\mathbb{N}$ . A policy is said to be a Markov policy if the control inputs are only dependent on the current state, i.e., $u_{k}=\mu_{k}(x_{k})$ .

Definition 1

(Markov Policy) A Markov policy $\bm{\mu}$ for system $\mathcal{S}$ is a sequence $\bm{\mu}=(\mu_{0},\mu_{1},\ldots,\mu_{N-1})$ of universally measurable maps

$$\mu_{k}:\mathbb{X}\rightarrow\mathbb{U},\quad\forall k\in\mathbb{N}_{[0,N-1]}.$$

Remark 1

Given a space $\mathbb{Y}$ , a subset $\mathbb{A}$ in this space is universally measurable if it is measurable with respect to every complete probability measure on $\mathbb{Y}$ that measures all Borel sets in $\mathcal{B}(\mathbb{Y})$ . A function $\mu:\mathbb{Y}\rightarrow\mathbb{W}$ is universally measurable if $\mu^{-1}(\mathbb{A})$ is universally measurable in $\mathbb{Y}$ for every $\mathbb{A}\in\mathcal{B}(\mathbb{W})$ . As stated in [23, 26], the condition of universal measurability is weaker than the condition of Borel measurability for showing the existence of a solution to a stochastic optimal problem. Roughly speaking, this is because the projections of measurable sets are analytic sets and analytic sets are universally measurable but not always Borel measurable [26, 27].

Remark 2

For a large class of stochastic optimal control problems, Markov policies are sufficient to characterize the optimal policy [26]. Furthermore, since a randomized Markov policy does not increase the largest probability that the states remain in a set, we focus on deterministic Markov policies in the following.

We denote the set of Markov policies as $\mathcal{M}$ . Consider a set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ . Given an initial state $x_{0}\in\mathbb{X}$ and a Markov policy $\bm{\mu}\in\mathcal{M}$ , an execution is a sequence of states $(x_{0},x_{1},\ldots,x_{N})$ . Introduce the probability with which the state $x_{k}$ will remain within $\mathbb{Q}$ for all $k\in\mathbb{N}_{[0,N]}$ :

$$p_{N,\mathbb{Q}}^{\bm{\mu}}(x_{0})=\mathrm{Pr}\{\forall k\in\mathbb{N}_{[0,N]},\ x_{k}\in\mathbb{Q}\}.$$

Let $p^{*}_{N,\mathbb{Q}}(x)=\sup_{\bm{\mu}\in\mathcal{M}}p_{N,\mathbb{Q}}^{\bm{\mu}}(x)$ , $\forall x\in\mathbb{Q}$ . We call $p^{*}_{N,\mathbb{Q}}(x)$ the $N$ -step invariance probability at $x$ in the set $\mathbb{Q}$ . Following the dynamic program (DP) in [23], define the value function $V^{*}_{k,\mathbb{Q}}:\mathbb{X}\rightarrow[0,1],k=0,1,\ldots,N$ , by the backward recursion:

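$$V_{k,\mathbb{Q}}^{*}(x)=\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}V_{k+1,\mathbb{Q}}^{*}(y)\,T(dy|x,u),\quad x\in\mathbb{X},\tag{1}$$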

with initialization $V_{N,\mathbb{Q}}^{*}(x)=1,x\in\mathbb{Q}$ .
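For intuition, when the state and control spaces are finite the integral in the recursion reduces to a sum over successor states, and the backward recursion can be sketched in a few lines of Python. The three-state process `T`, its action labels, its transition numbers, and the function name `invariance_values` are illustrative assumptions, not taken from the paper.

```python
# Hypothetical finite Markov controlled process (illustrative numbers only).
# T[x][u][y] is the probability of moving from state x to state y under control u.
T = {
    0: {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.5, 2: 0.5}},
    1: {"a": {0: 0.2, 1: 0.6, 2: 0.2}, "b": {1: 0.5, 2: 0.5}},
    2: {"a": {2: 1.0}, "b": {2: 1.0}},
}

def invariance_values(T, Q, N):
    """Return [V_0, ..., V_N]; each V_k maps x in Q to V_k(x), and V_k is 0 outside Q."""
    V = [None] * (N + 1)
    V[N] = {x: 1.0 for x in Q}  # initialization V_N(x) = 1 on Q
    for k in range(N - 1, -1, -1):
        V[k] = {
            # maximize over controls the expected next-step value restricted to Q
            x: max(
                sum(p * V[k + 1].get(y, 0.0) for y, p in T[x][u].items())
                for u in T[x]
            )
            for x in Q
        }
    return V

V = invariance_values(T, Q={0, 1}, N=2)
print(V[0])  # N-step invariance probabilities at each state of Q
```

In this toy instance the value at state 0 is close to 0.98 and the value at state 1 is close to 0.68, because state 1 leaks probability mass to the outside state 2 under either control.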

Assumption 1

The set

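$$\mathbb{U}_{k}(x,\lambda)=\Big\{u\in\mathbb{U}\ \Big|\ \int_{\mathbb{X}}V_{k+1,\mathbb{Q}}^{*}(y)\,T(dy|x,u)\geq\lambda\Big\}$$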

is compact for all $x\in\mathbb{Q}$ , $\lambda\in\mathbb{R}$ , and $k\in\mathbb{N}_{[0,N-1]}$ .

Lemma 1

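[23] For all $x\in\mathbb{Q}$ , $p^{*}_{N,\mathbb{Q}}(x)=V_{0,\mathbb{Q}}^{*}(x)$ . If Assumption 1 holds, the optimal Markov policy $\bm{\mu}_{\mathbb{Q}}^{*}=(\mu_{0,\mathbb{Q}}^{*},\mu^{*}_{1,\mathbb{Q}},\ldots,\mu^{*}_{N-1,\mathbb{Q}})$ exists and is given by

$$\mu_{k,\mathbb{Q}}^{*}(x)=\arg\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}V_{k+1,\mathbb{Q}}^{*}(y)\,T(dy|x,u),\quad x\in\mathbb{Q},\ k\in\mathbb{N}_{[0,N-1]}.$$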

Extending the finite horizon to infinite horizon, we need to introduce stationary Markov policies.

Definition 2

(Stationary Markov Policy) A Markov policy $\bm{\mu}\in\mathcal{M}$ is said to be stationary if $\bm{\mu}=(\bar{\mu},\bar{\mu},\ldots)$ with $\bar{\mu}:\mathbb{X}\rightarrow\mathbb{U}$ universally measurable.

Given an initial state $x_{0}\in\mathbb{X}$ and a stationary Markov policy $\bm{\mu}\in\mathcal{M}$ , an execution is denoted by a sequence of states $(x_{0},x_{1},\ldots)$ . We introduce the probability with which the state $x_{k}$ will remain within $\mathbb{Q}$ for all $k\in\mathbb{N}_{\geq 0}$ :

$$p_{\infty,\mathbb{Q}}^{\bm{\mu}}(x_{0})=\mathrm{Pr}\{\forall k\in\mathbb{N},\ x_{k}\in\mathbb{Q}\}.$$

Denote $p^{*}_{\infty,\mathbb{Q}}(x_{0})=\sup_{\bm{\mu}\in\mathcal{M}}p_{\infty,\mathbb{Q}}^{\bm{\mu}}(x_{0})$ . We call $p^{*}_{\infty,\mathbb{Q}}(x)$ the infinite-horizon invariance probability at $x$ in the set $\mathbb{Q}$ . Define the value function $G^{*}_{k,\mathbb{Q}}:\mathbb{X}\rightarrow[0,1],k\in\mathbb{N}_{\geq 0}$ , through the forward recursion:

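$$G_{k+1,\mathbb{Q}}^{*}(x)=\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}G_{k,\mathbb{Q}}^{*}(y)\,T(dy|x,u),\quad x\in\mathbb{X},$$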

initialized with $G^{*}_{0,\mathbb{Q}}(x)=1,x\in\mathbb{Q}$ .
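On finite spaces the forward recursion can be iterated numerically until successive iterates stop changing, which serves as a practical stand-in for the limit as $k\rightarrow\infty$. The sketch below assumes a hypothetical three-state process in which state 0 can be held forever by one control; the process `T`, the tolerance `tol`, and the function name `infinite_horizon_values` are assumptions made for illustration.

```python
# Hypothetical process: state 0 can be held with certainty by control "a";
# state 1 either jumps toward 0 ("a") or lingers ("b"); state 2 lies outside Q.
T = {
    0: {"a": {0: 1.0}, "b": {0: 0.5, 1: 0.5}},
    1: {"a": {0: 0.7, 2: 0.3}, "b": {1: 0.9, 2: 0.1}},
    2: {"a": {2: 1.0}},
}

def infinite_horizon_values(T, Q, tol=1e-12, max_iter=10_000):
    """Iterate G_{k+1}(x) = max_u sum_{y in Q} G_k(y) T(y|x,u), starting from G_0 = 1 on Q."""
    G = {x: 1.0 for x in Q}
    for _ in range(max_iter):
        G_next = {
            x: max(
                sum(p * G.get(y, 0.0) for y, p in T[x][u].items())
                for u in T[x]
            )
            for x in Q
        }
        # stop once the largest change between iterates is negligible
        if max((abs(G_next[x] - G[x]) for x in Q), default=0.0) <= tol:
            return G_next
        G = G_next
    return G  # not converged within max_iter; returned as a best effort

G = infinite_horizon_values(T, Q={0, 1})
print(G)
```

Here the iterates at state 1 decrease as 0.9, 0.81, 0.729 and then settle at 0.7, the probability of reaching the safely holdable state 0 in one step.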

Assumption 2

There exists a $\bar{k}\geq 0$ such that the set

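$$\mathbb{U}_{k}(x,\lambda)=\Big\{u\in\mathbb{U}\ \Big|\ \int_{\mathbb{X}}G_{k,\mathbb{Q}}^{*}(y)\,T(dy|x,u)\geq\lambda\Big\}$$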

is compact for all $x\in\mathbb{Q}$ , $\lambda\in\mathbb{R}$ , and $k\in\mathbb{N}_{\geq\bar{k}}$ .

Lemma 2

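[23] Suppose that Assumption 2 holds. Then, for all $x\in\mathbb{Q}$ , the limit $G^{*}_{\infty,\mathbb{Q}}(x)$ exists and satisfies

$$G_{\infty,\mathbb{Q}}^{*}(x)=\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}G_{\infty,\mathbb{Q}}^{*}(y)\,T(dy|x,u),$$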

and $p^{*}_{\infty,\mathbb{Q}}(x)=G^{*}_{\infty,\mathbb{Q}}(x)$ . Furthermore, an optimal stationary Markov policy $\bm{\mu}_{\mathbb{Q}}^{*}=(\bar{\mu}_{\mathbb{Q}}^{*},\bar{\mu}_{\mathbb{Q}}^{*},\ldots)$ exists and is given by

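$$\bar{\mu}_{\mathbb{Q}}^{*}(x)=\arg\sup_{u\in\mathbb{U}}\mathbbm{1}_{\mathbb{Q}}(x)\int_{\mathbb{Q}}G_{\infty,\mathbb{Q}}^{*}(y)\,T(dy|x,u),\quad x\in\mathbb{Q}.$$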

In the following two sections, we explore finite- and infinite-horizon PCISs and how to compute them.

III Finite-Horizon $\epsilon$ -PCIS

In this section, we first define finite-horizon $\epsilon$ -PCIS for the system $\mathcal{S}$ and provide the properties of this set. Then, we explore how to compute the finite-horizon $\epsilon$ -PCIS within a given set.

Definition 3

( $N$ -step $\epsilon$ -PCIS) Consider a stochastic control system $\mathcal{S}=(\mathbb{X},\mathbb{U},T)$ . Given a confidence level $0\leq\epsilon\leq 1$ , a set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ is an $N$ -step $\epsilon$ -PCIS for $\mathcal{S}$ if for any $x\in\mathbb{Q}$ , there exists at least one Markov policy $\bm{\mu}\in\mathcal{M}$ such that $p_{N,\mathbb{Q}}^{\bm{\mu}}(x)\geq\epsilon$ .

We define the stochastic backward reachable set $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ by collecting all the states $x\in\mathbb{Q}$ at which the $N$ -step invariance probability $p^{*}_{N,\mathbb{Q}}(x)\geq\epsilon$ , i.e.,

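$$\begin{aligned}\mathbb{S}_{\epsilon,N}^{*}(\mathbb{Q})&=\{x\in\mathbb{Q}\mid\exists\bm{\mu}\in\mathcal{M},\ p_{N,\mathbb{Q}}^{\bm{\mu}}(x)\geq\epsilon\}\\&=\{x\in\mathbb{Q}\mid\sup_{\bm{\mu}\in\mathcal{M}}p_{N,\mathbb{Q}}^{\bm{\mu}}(x)\geq\epsilon\}\\&=\{x\in\mathbb{Q}\mid V_{0,\mathbb{Q}}^{*}(x)\geq\epsilon\}.\end{aligned}$$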

If $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})=\mathbb{Q}$ , it follows from $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ that $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ is also Borel-measurable. If $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})\subset\mathbb{Q}$ , the following lemma addresses the measurability of the set $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ .

Lemma 3

For any $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ , the set $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})\subseteq\mathbb{Q}$ is universally measurable.

Proof:

See Appendix A. ∎

Let us denote by $\mathcal{P}(\mathbb{X})$ the set of all probability measures on $\mathbb{X}$ . The following proposition shows that, although $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ is only universally measurable, for any probability measure on $\mathbb{X}$ one can find a Borel-measurable set $\tilde{\mathbb{S}}^{*}_{\epsilon,N}(\mathbb{Q})$ whose symmetric difference with $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ has measure zero.

Proposition 1

For any $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ and any $p\in\mathcal{P}(\mathbb{X})$ , there exists a set $\tilde{\mathbb{S}}^{*}_{\epsilon,N}(\mathbb{Q})\in\mathcal{B}(\mathbb{X})$ with $\tilde{\mathbb{S}}^{*}_{\epsilon,N}(\mathbb{Q})\subseteq\mathbb{Q}$ such that $p(\tilde{\mathbb{S}}^{*}_{\epsilon,N}(\mathbb{Q})\bigtriangleup\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q}))=0$ .

Proof:

It follows from the universal measurability of $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ as shown in Lemma 3, the Borel measurability of $\mathbb{Q}$ , $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})\subseteq\mathbb{Q}$ , and Lemma 7.26 in [26]. ∎

From Lemma 1 and the definition of $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ , we can verify whether a set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ is an $N$ -step $\epsilon$ -PCIS or not by checking if either $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})=\mathbb{Q}$ , or $V_{0,\mathbb{Q}}^{*}(x)\geq\epsilon$ , $\forall x\in\mathbb{Q}$ , where $V_{0,\mathbb{Q}}^{*}(x)$ is defined in (1).
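On finite spaces this verification amounts to computing $V^{*}_{0,\mathbb{Q}}$ and comparing it with $\epsilon$ state by state. The following sketch does so for a hypothetical three-state process; the process `T` and the helper names `v0`, `backward_reachable_set`, and `is_pcis` are assumptions introduced for illustration only.

```python
# Hypothetical finite process; T[x][u][y] is the transition probability x -> y under u.
T = {
    0: {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.5, 2: 0.5}},
    1: {"a": {0: 0.2, 1: 0.6, 2: 0.2}, "b": {1: 0.5, 2: 0.5}},
    2: {"a": {2: 1.0}, "b": {2: 1.0}},
}

def v0(T, Q, N):
    """N-step invariance probability V_0(x) for every x in Q (finite spaces)."""
    V = {x: 1.0 for x in Q}
    for _ in range(N):
        V = {
            x: max(sum(p * V.get(y, 0.0) for y, p in T[x][u].items()) for u in T[x])
            for x in Q
        }
    return V

def backward_reachable_set(T, Q, N, eps):
    """States of Q whose N-step invariance probability is at least eps."""
    values = v0(T, Q, N)
    return {x for x in Q if values[x] >= eps}

def is_pcis(T, Q, N, eps):
    """Q is an N-step eps-PCIS exactly when the backward reachable set equals Q."""
    return backward_reachable_set(T, Q, N, eps) == set(Q)

print(is_pcis(T, {0, 1}, 2, 0.6))                 # every state meets the level 0.6
print(backward_reachable_set(T, {0, 1}, 2, 0.9))  # only state 0 meets the level 0.9
print(is_pcis(T, {0}, 2, 0.9))                    # but {0} alone is not a 0.9-PCIS
```

The last call illustrates why a single backward reachable set computation is generally not enough: shrinking the set changes the invariance probabilities, so the check has to be repeated on the smaller set.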

Remark 3

The stochastic backward reachable set $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ is called the maximal probabilistic safe set in [23]. The $N$ -step $\epsilon$ -PCIS $\mathbb{Q}$ in Definition 3 refines the maximal probabilistic safe set by requiring that for any initial state $x_{0}\in\mathbb{Q}$ , the $N$ -step invariance probability $p_{N,\mathbb{Q}}^{*}(x_{0})$ is no less than $\epsilon$ .

In the following, we show that finite-horizon PCISs are closed under union.

Proposition 2

Consider a collection of sets $\mathbb{Q}_{i}\in\mathcal{B}(\mathbb{X})$ , $i=1,\ldots,r$ . If each $\mathbb{Q}_{i}$ is an $N_{i}$ -step $\epsilon_{i}$ -PCIS for the same system $\mathcal{S}$ , then the union $\bigcup_{i=1}^{r}\mathbb{Q}_{i}$ is an $N$ -step $\epsilon$ -PCIS, where $N=\min_{i}N_{i}$ and $\epsilon=\min_{i}\epsilon_{i}$ .

Proof:

The result follows from the following two facts:

(i) for any $\mathbb{Q},\mathbb{P}\in\mathcal{B}(\mathbb{X})$ with $\mathbb{Q}\subseteq\mathbb{P}$ , $\sup_{\bm{\mu}\in\mathcal{M}}p_{N,\mathbb{Q}}^{\bm{\mu}}(x)\leq\sup_{\bm{\mu}\in\mathcal{M}}p_{N,\mathbb{P}}^{\bm{\mu}}(x)$ , $\forall N\in\mathbb{N}$ and $\forall x\in\mathbb{Q}$ ;

(ii) for any $N,N^{\prime}\in\mathbb{N}$ with $N\leq N^{\prime}$ , $\sup_{\bm{\mu}\in\mathcal{M}}p_{N^{\prime},\mathbb{Q}}^{\bm{\mu}}(x)\leq\sup_{\bm{\mu}\in\mathcal{M}}p_{N,\mathbb{Q}}^{\bm{\mu}}(x)$ , $\forall\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ and $\forall x\in\mathbb{Q}$ . ∎
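One way to see how the two monotonicity properties combine is to follow a single policy through them. The chain below is a sketch of that reasoning rather than the authors' wording; the symbols $j$ and $\bm{\mu}^{j}$ are introduced here for illustration. Take any $x\in\bigcup_{i}\mathbb{Q}_{i}$ , pick an index $j$ with $x\in\mathbb{Q}_{j}$ , and let $\bm{\mu}^{j}$ be a Markov policy with $p^{\bm{\mu}^{j}}_{N_{j},\mathbb{Q}_{j}}(x)\geq\epsilon_{j}$ . Applying the first $N$ maps of $\bm{\mu}^{j}$ gives

```latex
% Event inclusions behind each inequality (N <= N_j and Q_j is a subset of the union):
%   {x_k in Q_j for all k <= N_j}  subset of  {x_k in Q_j for all k <= N}
%                                  subset of  {x_k in union of Q_i for all k <= N}
\[
p^{\bm{\mu}^{j}}_{N,\,\bigcup_{i}\mathbb{Q}_{i}}(x)
\;\geq\; p^{\bm{\mu}^{j}}_{N,\,\mathbb{Q}_{j}}(x)
\;\geq\; p^{\bm{\mu}^{j}}_{N_{j},\,\mathbb{Q}_{j}}(x)
\;\geq\; \epsilon_{j}
\;\geq\; \epsilon .
\]
```

The first inequality enlarges the set, the second shortens the horizon, and the last two use the PCIS property of $\mathbb{Q}_{j}$ and $\epsilon=\min_{i}\epsilon_{i}$ .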

III-A Finite-horizon $\epsilon$ -PCIS computation

This subsection will address the following problem.

Problem 1

Given a set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ and a prescribed probability $0\leq\epsilon\leq 1$ , compute an $N$ -step $\epsilon$ -PCIS $\tilde{\mathbb{Q}}\subseteq\mathbb{Q}$ .

To handle this problem, our basic idea is to iteratively compute stochastic backward reachable sets until convergence. A general procedure is presented in the following algorithm.

In Algorithm 1, we first compute the stochastic backward reachable set $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{P}_{i})$ within $\mathbb{P}_{i}$ and then update $\mathbb{P}_{i+1}$ to be the corresponding Borel-measurable set $\tilde{\mathbb{S}}^{*}_{\epsilon,N}(\mathbb{P}_{i})$ , which is obtained by picking a $p\in\mathcal{P}(\mathbb{X})$ such that $p(\tilde{\mathbb{S}}^{*}_{\epsilon,N}(\mathbb{P}_{i})\bigtriangleup\mathbb{S}^{*}_{\epsilon,N}(\mathbb{P}_{i}))=0$ (see Proposition 1). The following theorem shows convergence of $\mathbb{P}_{i}$ . The termination condition guarantees that the set returned by this algorithm is an $N$ -step $\epsilon$ -PCIS $\tilde{\mathbb{Q}}\subseteq\mathbb{Q}$ .

Theorem 1

Let Assumption 1 hold. For any $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ , Algorithm $1$ converges, i.e., $\lim_{i\rightarrow\infty}\mathbb{P}_{i}$ exists. If $\lim_{i\rightarrow\infty}\mathbb{P}_{i}\neq\emptyset$ , it is the largest $N$ -step $\epsilon$ -PCIS within $\mathbb{Q}$ .

Proof:

From Algorithm $1$ and Lemma 1, we have that if the termination condition does not hold, $\mathbb{P}_{i+1}\subset\mathbb{P}_{i}$ . It follows that the sequence $\{\mathbb{P}_{i}\}_{i\in\mathbb{N}}$ is nonincreasing. Then,

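$$\liminf_{i\rightarrow\infty}\mathbb{P}_{i}=\bigcup_{i\geq 1}\bigcap_{j\geq i}\mathbb{P}_{j}=\bigcap_{j\geq 1}\mathbb{P}_{j}=\bigcap_{i\geq 1}\bigcup_{j\geq i}\mathbb{P}_{j}=\limsup_{i\rightarrow\infty}\mathbb{P}_{i},$$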

which implies the existence of $\lim_{i\rightarrow\infty}\mathbb{P}_{i}$ . Furthermore, if $\lim_{i\rightarrow\infty}\mathbb{P}_{i}$ is nonempty, we conclude that it is the largest $N$ -step $\epsilon$ -PCIS within $\mathbb{Q}$ based on fixed-point theory. ∎
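The iteration described for Algorithm 1 can be sketched for finite spaces, where every subset is Borel-measurable and the update reduces to $\mathbb{P}_{i+1}=\mathbb{S}^{*}_{\epsilon,N}(\mathbb{P}_{i})$ . The process `T` and the function name `largest_pcis` are illustrative assumptions, and the sketch is not the paper's algorithm listing.

```python
# Hypothetical finite process; T[x][u][y] is the transition probability x -> y under u.
T = {
    0: {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.5, 2: 0.5}},
    1: {"a": {0: 0.2, 1: 0.6, 2: 0.2}, "b": {1: 0.5, 2: 0.5}},
    2: {"a": {2: 1.0}, "b": {2: 1.0}},
}

def largest_pcis(T, Q, N, eps):
    """Shrink P until the stochastic backward reachable set of P equals P itself."""
    P = set(Q)
    while True:
        # N-step invariance probabilities within the current candidate set P
        V = {x: 1.0 for x in P}
        for _ in range(N):
            V = {
                x: max(sum(p * V.get(y, 0.0) for y, p in T[x][u].items()) for u in T[x])
                for x in P
            }
        S = {x for x in P if V[x] >= eps}  # backward reachable set within P
        if S == P:                         # termination: P is a fixed point
            return P
        P = S                              # strictly smaller, so the loop is finite

print(largest_pcis(T, {0, 1}, 2, 0.6))  # the whole set already qualifies
print(largest_pcis(T, {0, 1}, 2, 0.8))  # shrinks to {0}
print(largest_pcis(T, {0, 1}, 2, 0.9))  # shrinks to the empty set
```

Each pass either terminates or removes at least one state, so the loop runs at most as many times as there are states in the initial set.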

To facilitate the practical implementation of Algorithm 1, we need to address two important properties: the computational tractability of $V^{*}_{0,\mathbb{P}_{i}}(x)$ , $\forall x\in\mathbb{P}_{i}$ , and the finite-step convergence of Algorithm 1. In the following, we will derive these two properties for discrete and continuous spaces, respectively. It is shown that if the spaces are discrete, the properties are guaranteed and in particular at each iteration we only need to solve an LP to compute the exact value of $V^{*}_{0,\mathbb{P}_{i}}$ . If the spaces are continuous, we will design a discretization algorithm with convergence guarantee, which enables us to preserve the above two properties.

III-A1 Discrete state and control spaces

If the state and control spaces are discrete, i.e., they are finite sets, the stochastic kernel $T(y|x,u)$ denotes the transition probability from state $x\in\mathbb{X}$ to state $y\in\mathbb{X}$ under control action $u\in\mathbb{U}_{x}$ , which satisfies that $\sum_{y\in\mathbb{X}}T(y|x,u)=1$ , $\forall x\in\mathbb{X}$ and $u\in\mathbb{U}_{x}$ .

In this case, according to Theorem 1 of [28], we can compute $V_{0,\mathbb{P}_{i}}^{*}(x)$ exactly via an LP. Moreover, the existence of an optimal Markov policy is always guaranteed.

Lemma 4

Given any set $\mathbb{P}_{i}\subset\mathbb{X}$ , the value functions $V_{k,\mathbb{P}_{i}}^{*}$ in (1) can be obtained by solving an LP:

[TABLE]

which gives $V^{*}_{k,\mathbb{P}_{i}}(x)=v^{*}_{k}(x)$ , $\forall x\in\mathbb{P}_{i}$ and $\forall k\in\mathbb{N}_{[0,N]}$ , where $v^{*}_{k}$ is the optimal solution of (4). The optimal Markov policy $\bm{\mu}_{\mathbb{P}_{i}}^{*}=(\mu_{0,\mathbb{P}_{i}}^{*},\mu^{*}_{1,\mathbb{P}_{i}},\ldots,\mu^{*}_{N-1,\mathbb{P}_{i}})$ is given by $\mu^{*}_{k,\mathbb{P}_{i}}(x)=u$ where $u\in\mathbb{U}_{x}$ is such that

[TABLE]

Proof:

See Theorem 1 in [28] for the proof. ∎

Corollary 1

For discrete state and control spaces, Algorithm 1 converges in a finite number of iterations. Furthermore, at each iteration, the $N$ -step invariance probability $V^{*}_{0,\mathbb{P}_{i}}(x)$ , $\forall x\in\mathbb{P}_{i}$ , can be computed via the LP (4) and the corresponding optimal policy is determined by (5).

Proof:

The finite-step convergence of Algorithm 1 follows from Theorem 1 and the finite cardinality of $\mathbb{Q}$ . The remaining part follows from Lemma 4. ∎

Remark 4

When Algorithm 1 is applied to a system with discrete spaces, the maximal number of iterations is $|\mathbb{Q}|$ . At each iteration, an LP is solved to compute the value of $V^{*}_{0,\mathbb{P}_{i}}(x)$ , $\forall x\in\mathbb{P}_{i}$ . The number of decision variables in the LP is at most $|\mathbb{Q}|(N+1)$ and the number of constraints is at most $|\mathbb{Q}|(N|\mathbb{U}|+1)$ . It follows from [29] that Algorithm 1 can be implemented in $O(|\mathbb{Q}|^{2}(N|\mathbb{U}|+1))$ time.
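As a rough illustration of the iteration in Algorithm 1 for finite spaces, the following Python sketch shrinks a candidate set until every remaining state has $N$ -step invariance probability at least $\epsilon$ . It evaluates $V^{*}_{0,\mathbb{P}_{i}}$ by backward recursion of the DP (1) rather than by the LP (4); for finite spaces both yield the same values. The three-state model, the dictionary layout `T[(x, u)]`, and the function names are hypothetical choices made for this example only.

```python
def n_step_values(actions, T, P, N):
    """Backward recursion of the DP on the finite set P.

    T[(x, u)] maps next states y to probabilities T(y | x, u).
    Returns V_0 restricted to P as a dict {x: probability}.
    """
    V = {x: 1.0 for x in P}  # terminal condition V_N(x) = 1 on P
    for _ in range(N):
        V = {
            x: max(
                # mass leaving P contributes zero
                sum(p * V.get(y, 0.0) for y, p in T[(x, u)].items())
                for u in actions
            )
            for x in P
        }
    return V


def largest_n_step_pcis(actions, T, Q, N, eps):
    """Iterate P_{i+1} = {x in P_i : V_0(x) >= eps} until P stops shrinking."""
    P = set(Q)
    while P:
        V0 = n_step_values(actions, T, P, N)
        P_next = {x for x in P if V0[x] >= eps}
        if P_next == P:  # termination condition
            break
        P = P_next
    return P


# Hypothetical toy model: states 0, 1, 2 and an unsafe sink "out".
ACTIONS = ("a", "b")
T = {
    (0, "a"): {0: 1.0},
    (0, "b"): {1: 1.0},
    (1, "a"): {0: 0.9, "out": 0.1},
    (1, "b"): {1: 0.5, "out": 0.5},
    (2, "a"): {2: 0.7, "out": 0.3},
    (2, "b"): {1: 0.6, "out": 0.4},
}
pcis = largest_n_step_pcis(ACTIONS, T, Q={0, 1, 2}, N=3, eps=0.8)
```

In this toy model, state 2 is removed in the first pass because its 3-step invariance probability is 0.54, and the second pass leaves the set unchanged.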

III-A2 Continuous state and control action spaces

To preserve the computational tractability of $V^{*}_{0,\mathbb{P}_{i}}$ and the finite-step convergence of Algorithm 1 when the state and control spaces are both continuous, we first discretize the spaces with a convergence guarantee. Then, we adapt Algorithm 1 to compute an approximate $N$ -step $\epsilon$ -PCIS within a given set.

Assume that $\mathbb{X}\subseteq\mathbb{R}^{n_{x}}$ and $\mathbb{U}\subseteq\mathbb{R}^{n_{u}}$ for some $n_{x},n_{u}\in\mathbb{N}$ . For simplicity, we use Euclidean metric for the spaces $\mathbb{X}$ and $\mathbb{U}$ . For any $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ , we define $\phi(\mathbb{Q})=Leb(\mathbb{Q})$ where $Leb(\cdot)$ denotes the Lebesgue measure of sets. We suppose that the stochastic kernel $T(\cdot|x,u)$ admits a density $t(y|x,u)$ , which represents the probability density of $y$ given the current state $x$ and the control action $u$ .

Now we consider Problem 1, where we assume that the given set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ is compact, which implies that $\phi(\mathbb{Q})$ is bounded. We further suppose that the density function satisfies the following assumption.

Assumption 3

There exists a constant $L$ such that for any $x,x^{\prime},y,y^{\prime}\in\mathbb{Q}$ , and $u,u^{\prime}\in\mathbb{U}$ ,

[TABLE]

Discretization

We discretize the compact set $\mathbb{Q}\subset\mathbb{X}$ into $m_{x}$ pair-wise disjoint nonempty Borel sets $\mathbb{Q}_{i}$ , $i\in\mathbb{N}_{[1,m_{x}]}$ , i.e., $\mathbb{Q}=\cup_{i=1}^{m_{x}}\mathbb{Q}_{i}$ . We pick a representative state from each set $\mathbb{Q}_{i}$ , denoted by $q_{i}$ . Let $\hat{\mathbb{Q}}=\{q_{i},i\in\mathbb{N}_{[1,m_{x}]}\}$ , $d_{i}=\sup_{x,y\in\mathbb{Q}_{i}}\|x-y\|$ , and $D_{x}=\max_{i\in\mathbb{N}_{[1,m_{x}]}}d_{i}$ .

Similarly, the compact control space $\mathbb{U}$ is divided into $m_{u}$ pair-wise disjoint nonempty Borel sets $\mathbb{C}_{i}$ , $i\in\mathbb{N}_{[1,m_{u}]}$ , i.e., $\mathbb{U}=\cup_{i=1}^{m_{u}}\mathbb{C}_{i}$ . We pick a representative element from the set $\mathbb{C}_{i}$ , denoted by $\hat{u}_{i}$ . Let $\hat{\mathbb{U}}=\{\hat{u}_{i},i\in\mathbb{N}_{[1,m_{u}]}\}$ , $l_{i}=\sup_{x,y\in\mathbb{C}_{i}}\|x-y\|$ , and $D_{u}=\max_{i\in\mathbb{N}_{[1,m_{u}]}}l_{i}$ .
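One simple way to realize such a partition, assuming the set is an axis-aligned box, is a uniform grid with cell centers as representative points. The sketch below is only an illustration under that assumption; the helper name `uniform_grid` and its arguments are hypothetical. The same helper can be applied to a box-shaped control space to obtain $\hat{\mathbb{U}}$ and $D_{u}$ .

```python
import itertools
import math


def uniform_grid(lower, upper, cells_per_axis):
    """Partition the box [lower, upper] into equal cells.

    Returns (representatives, diameter): the cell centers q_i and the
    common cell diameter, which equals D_x for this partition.
    """
    widths = [(hi - lo) / cells_per_axis for lo, hi in zip(lower, upper)]
    axes = [
        [lo + (k + 0.5) * w for k in range(cells_per_axis)]
        for lo, w in zip(lower, widths)
    ]
    representatives = list(itertools.product(*axes))
    # Euclidean diameter of a cell: sup of ||x - y|| over the cell
    diameter = math.sqrt(sum(w * w for w in widths))
    return representatives, diameter


# Box {x : ||x||_inf <= 0.5} with 10 cells per axis.
reps, D_x = uniform_grid([-0.5, -0.5], [0.5, 0.5], cells_per_axis=10)
```

Treating the cells as half-open keeps them pairwise disjoint without changing their diameter, so any grid size $\delta$ no smaller than the returned diameters is admissible.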

Let the grid size be a constant $\delta\geq\max\{D_{x},D_{u}\}$ . For each $x\in\mathbb{Q}$ , define the set of admissible discrete control actions as

[TABLE]

where $s_{x}$ is the representative state of $\mathbb{Q}_{i}$ to which $x$ belongs, i.e., $s_{x}=q_{i}$ if $x\in\mathbb{Q}_{i}$ . As in [25], the next lemma shows that each $x\in\mathbb{Q}$ has a nonempty admissible discretized control set.

Lemma 5

For each $q_{i}\in\hat{\mathbb{Q}}$ , the set $\hat{\mathbb{U}}_{q_{i}}$ is nonempty and $\hat{\mathbb{U}}_{x}=\hat{\mathbb{U}}_{q_{i}}$ , $\forall x\in\mathbb{Q}_{i}$ .

Proof:

Since the admissible control set $\mathbb{U}_{s_{x}}$ is nonempty, $\forall x\in\mathbb{Q}$ , there exists $\hat{u}\in\hat{\mathbb{U}}$ such that $\|u-\hat{u}\|\leq\delta$ , $\forall u\in\mathbb{U}_{s_{x}}$ . Hence, by the definition of $s_{x}$ , we have that the set $\hat{\mathbb{U}}_{q_{i}}$ is nonempty for each $q_{i}\in\hat{\mathbb{Q}}$ . Furthermore, from (6), it is easy to obtain that $\hat{\mathbb{U}}_{x}=\hat{\mathbb{U}}_{q_{i}}$ , $\forall x\in\mathbb{Q}_{i}$ . ∎

As in [25], let us define the function $\hat{t}:\mathbb{Q}\times\mathbb{Q}\times\hat{\mathbb{U}}\rightarrow\mathbb{R}$

[TABLE]

From (7), we observe that all states $y\in\mathbb{Q}_{i}$ enjoy the same stochastic kernel. An approximate stochastic control system is given by a triple $\hat{\mathcal{S}}_{\mathbb{Q}}=(\hat{\mathbb{Q}},\hat{\mathbb{U}},\hat{T})$ . Here the transition probability $\hat{T}(q_{j}|q_{i},\hat{u})$ is defined by $\hat{T}(q_{j}|q_{i},\hat{u})=\int_{\mathbb{Q}_{j}}\hat{t}(y|q_{i},\hat{u})dy$ , where $q_{i},q_{j}\in\hat{\mathbb{Q}}$ with $q_{i}\in\mathbb{Q}_{i}$ and $q_{j}\in\mathbb{Q}_{j}$ , and $\hat{u}\in\hat{\mathbb{U}}$ .
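For intuition, the transition probabilities of the approximate system can be tabulated numerically. The sketch below assumes a one-dimensional state, a uniform partition with cell width `width`, and a piecewise-constant approximate density obtained by evaluating $t$ at representative points and dividing by $\max\{1,\int_{\mathbb{Q}}t(s_{z}|s_{x},\hat{u})dz\}$ ; this normalized form is an assumption inferred from the case distinction in the proof of Lemma 9. The scalar dynamics and all names are hypothetical.

```python
import math


def gaussian_density(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def approximate_kernel(reps, width, controls, density):
    """Return T_hat[(i, u)][j], an approximation of T_hat(q_j | q_i, u).

    density(y, x, u) is the one-step density t(y | x, u); the integral
    over each cell Q_j is width * t(q_j | q_i, u) because the approximate
    density is constant on cells.
    """
    T_hat = {}
    for i, q_i in enumerate(reps):
        for u in controls:
            mass = [width * density(q_j, q_i, u) for q_j in reps]
            norm = max(1.0, sum(mass))  # keeps each row sum at most 1
            T_hat[(i, u)] = [m / norm for m in mass]
    return T_hat


# Hypothetical scalar system x+ = 0.5 x + u + w, w ~ N(0, 0.1^2), Q = [-1, 1].
width = 2.0 / 40
reps = [-1.0 + (k + 0.5) * width for k in range(40)]
controls = [-0.1, 0.0, 0.1]
T_hat = approximate_kernel(
    reps, width, controls,
    lambda y, x, u: gaussian_density(y, 0.5 * x + u, 0.1),
)
```

Rows need not sum to one: the missing mass corresponds to leaving $\mathbb{Q}$ , which is what the invariance probability penalizes.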

Approximation of PCISs

For the approximate system $\hat{\mathcal{S}}_{\mathbb{Q}}$ , the discretized version of the DP (1) is given by

[TABLE]

For each $x\in\mathbb{Q}_{i}$ , $\hat{V}^{*}_{k,\mathbb{Q}}(x)=\hat{V}_{k,\mathbb{Q}}^{*}(q_{i}),\forall k\in\mathbb{N}_{[0,N]}$ . We define the discretized optimal Markov policy $\hat{\bm{\mu}}_{\mathbb{Q}}^{*}=(\hat{\mu}^{*}_{0,\mathbb{Q}},\ldots,\hat{\mu}^{*}_{N-1,\mathbb{Q}})$ as

[TABLE]

For each $x\in\mathbb{Q}_{i}$ , $\hat{\mu}^{*}_{k,\mathbb{Q}}(x)=\hat{\mu}^{*}_{k,\mathbb{Q}}(q_{i}),\ \forall k\in\mathbb{N}_{[0,N-1]}$ .

Remark 5

Since the state and control action spaces of the approximate system $\hat{\mathcal{S}}_{\mathbb{Q}}$ are finite, the value of $\hat{V}^{*}_{k,\mathbb{Q}}$ can be computed via the LP (4) and the corresponding optimal policy can be determined by (5). In addition, all the states in each $\mathbb{Q}_{i}$ share the same approximate $N$ -step invariance probability and optimal policy as the representative state $q_{i}\in\mathbb{Q}_{i}$ .

Lemma 6

Under Assumptions 1 and 3, the functions $V^{*}_{k,\mathbb{Q}}(x)$ and $\hat{V}^{*}_{k,\mathbb{Q}}(x)$ satisfy that $\forall x\in\mathbb{Q}$ ,

[TABLE]

where

[TABLE]

Proof:

See Appendix B. ∎

Remark 6

Lemma 6 guarantees convergence as the grid size tends to zero and generalizes the case considered in [23], which only discretizes the state space for a given finite control space. To prove Lemma 6, we need to show that (i) the value functions in (1) are Lipschitz continuous (Lemma 8), which is similar to Theorem 8 in [23], and (ii) the difference between the approximate density function and the original density function is bounded (Lemma 9), which is different from that in [23].

Theorem 2

Let Assumptions 1 and 3 hold. Consider a compact set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ and a corresponding discretized set $\hat{\mathbb{Q}}$ of $\mathbb{Q}$ . If $\hat{\mathbb{Q}}$ is an $N$ -step $\hat{\epsilon}$ -PCIS for the approximate system $\hat{\mathcal{S}}_{\mathbb{Q}}=(\hat{\mathbb{Q}},\hat{\mathbb{U}},\hat{T})$ , and $\hat{\epsilon}\geq\tau_{0}(\mathbb{Q})\delta$ , the set $\mathbb{Q}$ is an $N$ -step $\epsilon$ -PCIS for the system $\mathcal{S}$ , where $\epsilon=\hat{\epsilon}-\tau_{0}(\mathbb{Q})\delta$ .

Proof:

According to the construction of the discretized system $\hat{\mathcal{S}}_{\mathbb{Q}}$ , we have that $\forall k\in\mathbb{N}_{[0,N]}$ , $\forall i\in\mathbb{N}_{[1,m_{x}]}$ and $\forall x\in\mathbb{Q}_{i}$ , $\hat{V}^{*}_{k,\mathbb{Q}}(x)=\hat{V}^{*}_{k,\mathbb{Q}}(q_{i})$ . Since $\hat{\mathbb{Q}}$ is an $N$ -step $\hat{\epsilon}$ -PCIS, it follows that $\forall x\in\mathbb{Q}$ , $\hat{V}^{*}_{0,\mathbb{Q}}(x)\geq\hat{\epsilon}$ . By Lemma 6 and the triangle inequality, we have

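$$V^{*}_{0,\mathbb{Q}}(x)\geq\hat{V}^{*}_{0,\mathbb{Q}}(x)-\tau_{0}(\mathbb{Q})\delta\geq\hat{\epsilon}-\tau_{0}(\mathbb{Q})\delta,\quad\forall x\in\mathbb{Q}.$$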

Then, when $\hat{\epsilon}\geq\tau_{0}(\mathbb{Q})\delta$ , we conclude that the set $\mathbb{Q}$ is an $N$ -step $\epsilon$ -PCIS where $0\leq\epsilon=\hat{\epsilon}-\tau_{0}(\mathbb{Q})\delta$ . ∎

Remark 7

From Theorem 2, if $0\leq\epsilon<1$ , by choosing a suitable grid size $0<\delta\leq\frac{1-\epsilon}{\tau_{0}(\mathbb{Q})}$ , the problem of computing an $N$ -step $\epsilon$ -PCIS within $\mathbb{Q}$ for $\mathcal{S}$ can be transformed into that of computing an approximate $N$ -step $\hat{\epsilon}$ -PCIS with probability $\hat{\epsilon}\geq\epsilon+\tau_{0}(\mathbb{Q})\delta$ for $\hat{\mathcal{S}}_{\mathbb{Q}}$ .
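The bookkeeping in Remark 7 can be sketched as a small helper that checks the grid size against the bound $\frac{1-\epsilon}{\tau_{0}(\mathbb{Q})}$ and returns the tightened level $\hat{\epsilon}=\epsilon+\tau_{0}(\mathbb{Q})\delta$ . The function name and the numerical value of $\tau_{0}$ used below are hypothetical; in practice $\tau_{0}(\mathbb{Q})$ comes from (9).

```python
def tightened_level(eps, tau0, delta):
    """Return eps_hat = eps + tau0 * delta, or raise if delta is too coarse."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    if tau0 <= 0.0:
        raise ValueError("tau0 must be positive")
    delta_max = (1.0 - eps) / tau0  # largest grid size keeping eps_hat <= 1
    if not 0.0 < delta <= delta_max:
        raise ValueError(f"delta must lie in (0, {delta_max}]")
    return eps + tau0 * delta


# Hypothetical numbers: target level 0.8, error constant 4, grid size 0.025.
eps_hat = tightened_level(eps=0.8, tau0=4.0, delta=0.025)
```

With these numbers the approximate system must be certified at a level of about 0.9 in order to guarantee 0.8 for the original system; a finer grid narrows this gap.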

Computation algorithm

Assume that a probability level $0\leq\epsilon<1$ is given. After discretizing the set $\mathbb{Q}$ and the control space $\mathbb{U}$ , we modify Algorithm 1 to compute an $N$ -step $\epsilon$ -PCIS $\tilde{\mathbb{Q}}\subseteq\mathbb{Q}$ , as shown in the following.

In Algorithm 2, we first construct an approximate system $\hat{\mathcal{S}}_{\mathbb{Q}}=(\hat{\mathbb{Q}},\hat{\mathbb{U}},\hat{T})$ with grid size $0<\delta<\frac{1-\epsilon}{\tau_{0}(\mathbb{Q})}$ . Then, following similar steps as in Algorithm 1, we compute the stochastic backward reachable set iteratively for the system $\hat{\mathcal{S}}_{\mathbb{Q}}$ . At each iteration, an LP is solved to obtain the $N$ -step invariance probability. One difference is that the stochastic backward reachable set is computed with respect to $\hat{\epsilon}=\epsilon+\tau_{0}(\mathbb{P}_{i})\delta$ and the updated set for the system $\mathcal{S}$ is the union of the subsets of $\mathbb{Q}$ corresponding to the stochastic backward reachable set. By Theorem 2, the resulting set by Algorithm 2 is an $N$ -step $\epsilon$ -PCIS.

Corollary 2

Let Assumptions 1 and 3 hold. For continuous state and control spaces, Algorithm 2 converges in a finite number of iterations and generates an $N$ -step $\epsilon$ -PCIS. Furthermore, at each iteration, the $N$ -step invariance probability $\hat{V}^{*}_{0,\mathbb{P}_{i}}(q_{j})$ , $\forall q_{j}\in\hat{\mathbb{P}}_{i}$ , can be computed via the LP (4) and the corresponding optimal policy is determined by (5).

Proof:

By Theorem 2 and the Borel measurability of the subsets $\mathbb{Q}_{i},\forall i\in\mathbb{N}_{[1,m_{x}]}$ , it follows that the set generated by Algorithm 2 is an $N$ -step $\epsilon$ -PCIS. The remaining part is similar to the proof of Corollary 1. ∎

Remark 8

When Algorithm 2 is applied to a system with continuous spaces, it follows from [29] that it can be implemented in $O(m^{2}_{x}(Nm_{u}+1))$ time, cf. Remark 4.

IV Extension to Infinite-horizon $\epsilon$ -PCIS

Now let us extend finite-horizon $\epsilon$ -PCISs to infinite-horizon $\epsilon$ -PCISs. In this section, we define the infinite-horizon $\epsilon$ -PCIS and explore the conditions of its existence. Furthermore, we provide algorithms to compute an infinite-horizon $\epsilon$ -PCIS within a given set.

Definition 4

(Infinite-horizon PCIS) Consider a stochastic control system $\mathcal{S}=(\mathbb{X},\mathbb{U},T)$ . Given a confidence level $0\leq\epsilon\leq 1$ , a set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ is an infinite-horizon $\epsilon$ -PCIS for $\mathcal{S}$ if for any $x\in\mathbb{Q}$ , there exists at least one stationary Markov policy $\bm{\mu}\in\mathcal{M}$ such that $p_{\infty,\mathbb{Q}}^{\bm{\mu}}(x)\geq\epsilon$ .

We define the stochastic backward reachable set $\mathbb{S}^{*}_{\epsilon,\infty}(\mathbb{Q})$ by collecting all the states $x\in\mathbb{Q}$ at which the infinite-horizon invariance probability $p^{*}_{\infty,\mathbb{Q}}(x)\geq\epsilon$ , i.e.,

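$$\mathbb{S}^{*}_{\epsilon,\infty}(\mathbb{Q})=\{x\in\mathbb{Q}\mid p^{*}_{\infty,\mathbb{Q}}(x)\geq\epsilon\}.$$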

For the infinite-horizon case, Lemma 3 and Proposition 1 still hold. That is, the set $\mathbb{S}^{*}_{\epsilon,\infty}(\mathbb{Q})$ is universally measurable and for any $p\in\mathcal{P}(\mathbb{S})$ , there exists another Borel-measurable set $\tilde{\mathbb{S}}^{*}_{\epsilon,\infty}(\mathbb{Q})\subseteq\mathbb{Q}$ such that $p(\tilde{\mathbb{S}}^{*}_{\epsilon,\infty}(\mathbb{Q})\bigtriangleup\mathbb{S}^{*}_{\epsilon,\infty}(\mathbb{Q}))=0$ .

Under Assumption 2, by Lemma 2 and the definition of $\mathbb{S}^{*}_{\epsilon,\infty}(\mathbb{Q})$ , we can verify whether a set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ is an infinite-horizon $\epsilon$ -PCIS or not by checking if either $\mathbb{S}^{*}_{\epsilon,\infty}(\mathbb{Q})=\mathbb{Q}$ , or $G_{\infty,\mathbb{Q}}^{*}(x)\geq\epsilon$ , $\forall x\in\mathbb{Q}$ , where $G_{\infty,\mathbb{Q}}^{*}(x)$ is defined by (2)–(3).

Definition 5

Consider a stochastic control system $\mathcal{S}=(\mathbb{X},\mathbb{U},T)$ . An RCIS $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ for $\mathcal{S}$ is an $N$ -step $\epsilon$ -PCIS with $N=1$ and $\epsilon=1$ .

Remark 9

Another interpretation of the RCIS in Definition 5 is that a set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ is an RCIS if for any $x\in\mathbb{Q}$ , there exists at least one control input $u\in\mathbb{U}$ such that $T(\mathbb{Q}|x,u)=1$ . It is easy to verify that an RCIS is also an infinite-horizon $\epsilon$ -PCIS with $\epsilon=1$ . Such a set is called an absorbing set in [30], where there is no control input. In the following, we show that the RCIS plays an important role in the existence of infinite-horizon PCISs and show how to design an algorithm that computes such a PCIS based on an RCIS.
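For a finite model, the characterization $T(\mathbb{Q}|x,u)=1$ suggests a simple fixed-point iteration for the largest RCIS inside a given set: repeatedly discard states from which no action keeps the state in the current candidate set with probability one. The sketch below is a standard iteration of this kind, not a method taken from the paper; the toy model and all names are hypothetical.

```python
def largest_rcis(actions, T, Q, tol=1e-12):
    """Largest subset R of Q such that every x in R has an action u
    with T(R | x, u) = 1 (up to the numerical tolerance tol)."""
    R = set(Q)
    while True:
        R_next = {
            x for x in R
            if any(
                sum(p for y, p in T[(x, u)].items() if y in R) >= 1.0 - tol
                for u in actions
            )
        }
        if R_next == R:  # fixed point reached
            return R
        R = R_next


# Hypothetical toy model: states 0, 1, 2 and an unsafe sink "out".
ACTIONS = ("a", "b")
T = {
    (0, "a"): {0: 1.0},
    (0, "b"): {1: 1.0},
    (1, "a"): {0: 0.9, "out": 0.1},
    (1, "b"): {1: 0.5, "out": 0.5},
    (2, "a"): {2: 0.7, "out": 0.3},
    (2, "b"): {1: 0.6, "out": 0.4},
}
rcis = largest_rcis(ACTIONS, T, Q={0, 1, 2})
```

Here only state 0 survives, since every action from states 1 and 2 leaves the set with positive probability.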

Remark 10

Note that infinite-horizon $\epsilon$ -PCISs are also closed under union, as shown in Proposition 2 when $N$ is replaced by $\infty$ .

IV-A Existence of infinite-horizon PCIS

Intuitively, the monotone decrease of $G^{*}_{\infty,\mathbb{Q}}(x)$ may imply that the value of $G^{*}_{\infty,\mathbb{Q}}(x)$ is one or zero. However, it is possible to get $0<G^{*}_{\infty,\mathbb{Q}}(x)<1$ in some cases (see Examples 1 and 2 in Section V). The following theorem provides necessary conditions and sufficient conditions for the existence of infinite-horizon $\epsilon$ -PCIS with $\epsilon>0$ .

Theorem 3

Suppose that Assumption 2 holds and let $0<\epsilon\leq 1$ be fixed. Given a nonempty set $\mathbb{Q}$ , let $u_{x}$ be the control input such that (3) holds for each $x\in\mathbb{Q}$ . The set $\mathbb{Q}$ is an infinite-horizon $\epsilon$ -PCIS

(i)

only if there exists an RCIS $\mathbb{Q}_{f}\subseteq\mathbb{Q}$ such that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ ,

[TABLE]

where $\rho=\sup_{x\in\mathbb{Q}\setminus\mathbb{Q}_{f}}\int_{\mathbb{Q}\setminus\mathbb{Q}_{f}}T(dy|x,u_{x})$ ;

(ii)

if there exists an RCIS $\mathbb{Q}_{f}\subseteq\mathbb{Q}$ such that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ ,

[TABLE]

Proof:

See Appendix C. ∎

Remark 11

The value of $\rho$ is the largest probability that the next state $y$ remains outside the RCIS $\mathbb{Q}_{f}$ from any $x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ under the optimal stationary Markov policy in Lemma 2. Note that $\frac{\rho^{2}}{1-\rho}$ is the gap between the necessary condition and the sufficient condition. In addition, the second term in (10)–(11) denotes the probability that the state is steered into the RCIS $\mathbb{Q}_{f}$ in two transitions from $x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ with an intermediate state $y$ outside $\mathbb{Q}_{f}$ .

Corollary 3

Suppose that Assumption 2 holds and let $0<\epsilon\leq 1$ be fixed. A nonempty set $\mathbb{Q}$ is an infinite-horizon $\epsilon$ -PCIS

(i)

only if there exists an RCIS $\mathbb{Q}_{f}\subseteq\mathbb{Q}$ such that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ , $T(\mathbb{Q}|x,u)\geq\epsilon$ for some $u\in\mathbb{U}$ ;

(ii)

if there exists an RCIS $\mathbb{Q}_{f}\subseteq\mathbb{Q}$ such that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ , $T(\mathbb{Q}_{f}|x,u)+\epsilon T(\mathbb{Q}\setminus\mathbb{Q}_{f}|x,u)\geq\epsilon$ for some $u\in\mathbb{U}$ .

Proof:

See Appendix D. ∎

Remark 12

A nonempty set $\mathbb{Q}$ is an infinite-horizon $\epsilon$ -PCIS if there exists an RCIS $\mathbb{Q}_{f}\subseteq\mathbb{Q}$ such that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ , $T(\mathbb{Q}_{f}|x,u)\geq\epsilon$ for some $u\in\mathbb{U}$ . This implication will facilitate the design of an algorithm for an infinite-horizon $\epsilon$ -PCIS, see Algorithm 4.
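On a finite model, the sufficient condition of Remark 12 translates into a one-pass construction: keep the RCIS and add every state of $\mathbb{Q}$ from which some action reaches the RCIS in one step with probability at least $\epsilon$ . The sketch below illustrates this on a hypothetical toy model; the function name and data layout are assumptions made for the example.

```python
def pcis_from_rcis(actions, T, Q, Q_f, eps):
    """Return Q_f together with the states of Q that can reach Q_f
    in one step with probability at least eps under some action."""
    reach = {
        x for x in Q
        if x not in Q_f and any(
            sum(p for y, p in T[(x, u)].items() if y in Q_f) >= eps
            for u in actions
        )
    }
    return set(Q_f) | reach


# Hypothetical toy model: states 0, 1, 2 and an unsafe sink "out".
ACTIONS = ("a", "b")
T = {
    (0, "a"): {0: 1.0},
    (0, "b"): {1: 1.0},
    (1, "a"): {0: 0.9, "out": 0.1},
    (1, "b"): {1: 0.5, "out": 0.5},
    (2, "a"): {2: 0.7, "out": 0.3},
    (2, "b"): {1: 0.6, "out": 0.4},
}
# {0} is an RCIS because action "a" keeps state 0 in place with probability 1.
pcis = pcis_from_rcis(ACTIONS, T, Q={0, 1, 2}, Q_f={0}, eps=0.8)
```

State 1 is added because action "a" reaches the RCIS with probability 0.9, whereas state 2 has no action that enters the RCIS directly and is therefore left out, even though the condition is only sufficient.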

Remark 13

Considering the similarity between the reliability defined in [11] and the infinite-horizon invariance probability in this paper, the results on infinite-horizon PCISs, including the existence conditions above and the computational algorithms below, can be used to extend the reliable control set in [10] to general stochastic systems.

IV-B Infinite-horizon $\epsilon$ -PCIS computation

This subsection will address the following problem.

Problem 2

Given a set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ and a prescribed probability $0\leq\epsilon\leq 1$ , compute an infinite-horizon $\epsilon$ -PCIS $\tilde{\mathbb{Q}}\subseteq\mathbb{Q}$ .

To handle this problem, the key point is to compute the infinite-horizon invariance probability $G^{*}_{\infty,\mathbb{Q}}$ . For discrete spaces, we show that a computationally tractable MILP can be used to compute the exact value of $G^{*}_{\infty,\mathbb{Q}}$ . In this case, we can compute the largest infinite-horizon $\epsilon$ -PCIS by iteratively computing the stochastic backward reachable sets until convergence. For continuous spaces, it is in general computationally intractable to compute $G^{*}_{\infty,\mathbb{Q}}$ , and the discretization method fails because the approximation error in (8) increases with the horizon. In this case, we design another computational algorithm based on the sufficient condition in Remark 12.

IV-B1 Discrete state and control spaces

If the state and control spaces are discrete, we adopt the same assumptions as in Section III-A1. We will first show how to compute the exact value of $G^{*}_{\infty,\mathbb{Q}}$ in (2)–(3) through an MILP. Then, we will adapt Algorithm 1 to compute the largest infinite-horizon $\epsilon$ -PCIS within a given set.

MILP reformulation

Since $G^{*}_{\infty,\mathbb{Q}}(x)=0$ , $\forall x\in\mathbb{Q}$ , is a trivial solution of (3), we cannot directly reformulate (2)–(3) as an LP, which is the traditional way to deal with infinite-horizon stochastic optimal control problems [31].

The following lemma provides a computationally tractable MILP reformulation when computing $G^{*}_{\infty,\mathbb{Q}}$ .

Lemma 7

Given any set $\mathbb{Q}\subseteq\mathbb{X}$ , the value of $G^{*}_{\infty,\mathbb{Q}}$ in (3) can be obtained by solving the MILP:

[TABLE]

where $\Delta$ is a constant greater than one. That is, $G^{*}_{\infty,\mathbb{Q}}(x)=g^{*}(x)$ , $\forall x\in\mathbb{Q}$ , where $g^{*}$ is the optimal solution of the MILP (12). The optimal stationary Markov policy is $\bar{\mu}^{*}_{\mathbb{Q}}(x)=u$ where $u\in\mathbb{U}_{x}$ such that $\kappa^{*}(x,u)=1$ and $\kappa^{*}$ is the optimal solution of the MILP (12).

Proof:

From the monotone decrease of the sequence $(G^{*}_{0,\mathbb{Q}},G^{*}_{1,\mathbb{Q}},\ldots)$ and Lemma 2, $G^{*}_{\infty,\mathbb{Q}}$ is the maximum fixed point satisfying (3). Hence, the equivalent form of $G^{*}_{\infty,\mathbb{Q}}$ can be written as MILP (12), where the constraints (12b)–(12d) guarantee that there exists $u\in\mathbb{U}_{x}$ such that the equality in (3) holds. ∎
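The monotone sequence used in this proof also gives a simple numerical approximation of $G^{*}_{\infty,\mathbb{Q}}$ on a finite model: start from the constant function one and apply the recursion until successive iterates agree to a tolerance. This is plain value iteration, not the MILP (12), and the stopping rule only indicates numerical convergence; the recursion form, the toy model, and all names below are assumptions made for illustration.

```python
def infinite_horizon_values(actions, T, Q, tol=1e-10, max_iter=100_000):
    """Approximate the maximal fixed point of
    G(x) = max_u sum_{y in Q} T(y | x, u) G(y) by iterating from G = 1."""
    G = {x: 1.0 for x in Q}
    for _ in range(max_iter):
        G_next = {
            x: max(
                sum(p * G.get(y, 0.0) for y, p in T[(x, u)].items())
                for u in actions
            )
            for x in Q
        }
        if max(abs(G_next[x] - G[x]) for x in Q) < tol:
            return G_next
        G = G_next
    return G


# Hypothetical toy model: states 0, 1, 2 and an unsafe sink "out".
ACTIONS = ("a", "b")
T = {
    (0, "a"): {0: 1.0},
    (0, "b"): {1: 1.0},
    (1, "a"): {0: 0.9, "out": 0.1},
    (1, "b"): {1: 0.5, "out": 0.5},
    (2, "a"): {2: 0.7, "out": 0.3},
    (2, "b"): {1: 0.6, "out": 0.4},
}
G = infinite_horizon_values(ACTIONS, T, Q={0, 1, 2})
```

In this toy model the iterates settle at 1, 0.9 and 0.54 for states 0, 1 and 2, so the infinite-horizon invariance probability can lie strictly between zero and one when an absorbing state is reachable.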

Computational algorithm

As an adaptation of Algorithm 1, the following algorithm provides a way to compute the largest infinite-horizon $\epsilon$ -PCIS within $\mathbb{Q}$ .

The difference between Algorithms $1$ and $3$ is that the value of $G^{*}_{\infty,\mathbb{P}_{i}}(x)$ , instead of $V^{*}_{0,\mathbb{P}_{i}}(x)$ , $\forall x\in\mathbb{P}_{i}$ , is computed by (12) (replacing $\mathbb{Q}$ with $\mathbb{P}_{i}$ ). Furthermore, the updated set is $\mathbb{P}_{i+1}=\mathbb{S}^{*}_{\epsilon,\infty}(\mathbb{P}_{i})$ , the stochastic backward reachable set within $\mathbb{P}_{i}$ with respect to the infinite horizon and the probability level $\epsilon$ . The following theorem establishes the convergence of $\mathbb{P}_{i}$ and shows that the set $\tilde{\mathbb{Q}}$ returned by this algorithm is an infinite-horizon $\epsilon$ -PCIS.

Theorem 4

For discrete state and control spaces, Algorithm 3 converges in a finite number of iterations and generates the largest infinite-horizon $\epsilon$ -PCIS within $\mathbb{Q}$ . Furthermore, at each iteration, the infinite-horizon invariance probability $G^{*}_{\infty,\mathbb{P}_{i}}(x)$ , $\forall x\in\mathbb{P}_{i}$ , can be computed via the MILP (12).

Proof:

The finite-step convergence of Algorithm 3 follows from the finite cardinality of the set $\mathbb{Q}$ . As in Theorem 1, the generated infinite-horizon $\epsilon$ -PCIS is the largest one within $\mathbb{Q}$ . The MILP reformulation follows from Lemma 7. ∎

Remark 14

When Algorithm 3 is applied to a system with discrete spaces, the maximal number of iterations is $|\mathbb{Q}|$ . An MILP is used to compute the value of $G^{*}_{\infty,\mathbb{P}_{i}}(x)$ , $\forall x\in\mathbb{P}_{i}$ , at each iteration. The number of real-valued decision variables is at most $|\mathbb{Q}|$ , the number of binary decision variables is at most $|\mathbb{Q}||\mathbb{U}|$ , and the number of constraints is at most $|\mathbb{Q}|(2|\mathbb{U}|+3)$ . In general, MILPs are NP-hard and can be solved by cutting-plane or branch-and-bound algorithms [32]. Advanced software packages have been developed to solve large MILPs efficiently [33, 34].

IV-B2 Continuous state and control spaces

If the state and control spaces are continuous, it is computationally intractable to compute the exact value of the infinite-horizon invariance probability $G^{*}_{\infty,\mathbb{Q}}(x)$ . Based on Remark 12, this subsection provides another way to compute an infinite-horizon $\epsilon$ -PCIS within a given set $\mathbb{Q}$ .

Different from Algorithm 3, which computes iteratively the stochastic backward reachable sets, the following algorithm generates an infinite-horizon $\epsilon$ -PCIS by computing a backward stochastic reachable set from the RCIS $\mathbb{Q}_{f}$ contained in $\mathbb{Q}$ .

The first step in Algorithm 4 is the computation of RCIS within a given set, which is a well-studied topic in the literature [4, 5, 6]. Then, based on RCIS $\mathbb{Q}_{f}$ within $\mathbb{Q}$ , the stochastic backward reachable set

[TABLE]

is an infinite-horizon $\epsilon$ -PCIS within $\mathbb{Q}$ . In comparison with Algorithms 1–3, Algorithm 4 avoids iteration and needs only two steps.

Remark 15

Note that the resulting set from Algorithm 4 is in general not the largest infinite-horizon $\epsilon$ -PCIS within the given set $\mathbb{Q}$ . It is possible to obtain a larger infinite-horizon $\epsilon$ -PCIS if we can reformulate the existence conditions in Theorem 3 and Corollary 3 in a recursive form and thereby modify Algorithm 4 to be a recursive algorithm.

Remark 16

The complexity of Algorithm 4 depends on the computation of the RCIS [3, 4, 5, 6] and the computation of the backward stochastic reachable set. The latter can be reformulated as a chance-constrained problem and then solved approximately. Some results on the computation of the backward stochastic reachable set have been reported in [35]. The first example in Section V shows how to compute the backward stochastic reachable set.

V Examples

In this section, two examples are provided to illustrate the effectiveness of the proposed theoretical results. The first one is concerned with comparison between PCIS and RCIS. Then we consider an application to motion planning of a mobile robot in a partitioned space with obstacles.

V-A Example 1: Comparison between PCIS and RCIS

Consider the following example from [36]:

[TABLE]

where $A=\left[\begin{array}{cc}1.6&1.1\\ -0.7&1.2\end{array}\right]$ and $B=\left[\begin{array}{c}1\\ 1\end{array}\right]$ . The control input is constrained by $|u_{k}|\leq 0.25$ . We consider $w_{k}$ to be either non-stochastic or stochastic when computing RCIS and PCIS, respectively. The region of interest is $\mathbb{Q}=\{x\in\mathbb{R}^{2}\mid\|x\|_{\infty}\leq 0.5\}$ . We will compare the largest RCIS and PCIS within $\mathbb{Q}$ .

To derive an RCIS for this system, we assume the disturbance belongs to the compact set $\mathbb{W}=\{w\in\mathbb{R}^{2}\mid\|w\|_{\infty}\leq 0.05\}$ . By using the methods in [1, 6], we obtain the largest RCIS, which is the blue region shown in Fig. 1. The gray region is an infinite-horizon $\epsilon$ -PCIS described in the end of this example.

When computing a finite-horizon PCIS, assume that the elements of $w_{k}$ are i.i.d. Gaussian random variables with zero mean and variance $\sigma^{2}=1/30^{2}$ . This system can be represented as a triple $\mathcal{S}=(\mathbb{X},\mathbb{U},T)$ :

[TABLE]

where $\psi(\cdot)$ is the density function of the standard normal distribution and $\Lambda=\mathrm{diag}\{\sigma,\sigma\}$ . In this case, since the Lipschitz constant $L$ in Assumption 3 is small, we ignore the approximation error $\tau_{0}$ in (9). We discretize the continuous spaces and implement Algorithm $2$ to compute the $N$ -step $\epsilon$ -PCIS $\tilde{\mathbb{Q}}$ . First consider $N=5$ and $\epsilon=0.80$ . Fig. 2(a) shows the evolution of the set $\mathbb{P}_{i}$ in Algorithm 2. The color indicates the corresponding $N$ -step invariance probability $p^{*}_{N,\mathbb{P}_{i}}(x)$ and the $z$ -axis shows the iteration index $i$ . The algorithm converges in $8$ steps. Fig. 2(b) shows $\mathbb{P}_{8}$ , which corresponds to the $N$ -step $\epsilon$ -PCIS $\tilde{\mathbb{Q}}$ for $N=5$ and $\epsilon=0.80$ .
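To get a feel for the kernel in this example, the one-step probability of remaining in $\mathbb{Q}$ can be evaluated in closed form. The sketch below assumes the additive dynamics $x_{k+1}=Ax_{k}+Bu_{k}+w_{k}$ with independent Gaussian noise components, so that $T(\mathbb{Q}|x,u)$ factorizes into a product of one-dimensional normal probabilities over the box $\mathbb{Q}$ . The function names are hypothetical, and this is only the one-step quantity, not the $N$ -step invariance probability computed by Algorithm 2.

```python
import math

A = [[1.6, 1.1], [-0.7, 1.2]]
B = [1.0, 1.0]
SIGMA = 1.0 / 30.0   # standard deviation of each noise component
HALF_WIDTH = 0.5     # Q = {x : ||x||_inf <= 0.5}


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_step_stay_probability(x, u):
    """P(A x + B u + w in Q) for w with i.i.d. N(0, SIGMA^2) components."""
    prob = 1.0
    for row, b in zip(A, B):
        mean = row[0] * x[0] + row[1] * x[1] + b * u
        upper = std_normal_cdf((HALF_WIDTH - mean) / SIGMA)
        lower = std_normal_cdf((-HALF_WIDTH - mean) / SIGMA)
        prob *= upper - lower  # independent components multiply
    return prob


p_origin = one_step_stay_probability((0.0, 0.0), 0.0)
p_edge = one_step_stay_probability((0.5, 0.0), 0.0)
p_steered = one_step_stay_probability((0.3, 0.2), -0.25)
```

From the origin the state stays in $\mathbb{Q}$ almost surely, from $(0.5,0)$ with zero input it almost surely leaves because the first component of the mean is $0.8$ , and from $(0.3,0.2)$ the input $u=-0.25$ yields a probability of roughly 0.93.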

When computing an infinite-horizon PCIS, we choose the same bound on the disturbance as for the RCIS. The elements of $w_{k}$ are truncated i.i.d. Gaussian random variables with zero mean and variance $\sigma^{2}=1/30^{2}$ . Denote the largest RCIS computed above by $\mathbb{Q}_{f}=\{x\in\mathbb{R}^{2}\mid Hx\leq h\}$ , where the matrix $H$ and the vector $h$ have appropriate dimensions. As stated in Algorithm $4$ , the one-step stochastic backward reachable set from the RCIS associated with probability $0.80$ is an infinite-horizon $\epsilon$ -PCIS with $\epsilon=0.80$ , i.e.,

[TABLE]

This set can be represented as

[TABLE]

where $h^{\prime}$ is the optimal solution of the chance constrained program

[TABLE]

This program can be numerically solved by using the methods in [37, 38]. The resulting infinite-horizon $\epsilon$ -PCIS with $\epsilon=0.80$ is the gray region shown in Fig. 1. This region is obviously a superset of the RCIS in blue.

V-B Example 2: Motion planning

The motion planning example in [39] is adapted to seek an infinite-horizon PCIS within the workspace for a mobile robot. The state of the robot is abstracted by its cell coordinate, i.e., $(p_{x},p_{y})\in\{1,2,3,4\}^{2}$ , and its four possible orientations $\{\mathcal{E},\mathcal{W},\mathcal{S},\mathcal{N}\}$ . Due to the actuation noise and drifting, the robot motion is stochastic. Here, we restrict the action space to be $\{\rm{FR},\rm{BK},\rm{TRFR},\rm{TLFR}\}$ , under which the possible transitions are shown in Fig. 3. Specifically, action “ $\rm{FR}$ ” means driving forward for $1$ unit. As illustrated in the figure, the probability for that is $0.80$ . The probability of drifting forward to the left or the right by $1$ unit is $0.10$ . Action “ $\rm{BK}$ ” can be similarly defined. Action “ $\rm{TRFR}$ ” means turning right $\pi/2$ and driving forward for $1$ unit, of which the probability is $0.95$ . The probability of driving forward for $1$ unit without turning right is $0.025$ and the probability of turning right for $\pi$ and driving forward for $1$ unit is $0.025$ . Similarly, we can define the action “ $\rm{TLFR}$ ”.

Consider the partitioned workspace shown in Fig. 4, where the shadowed cells are occupied by obstacles and the red cell is an absorbing region, i.e., once the robot enters this region, it stays there forever. We construct an MDP with $64$ states and $4$ actions. The transition relation and probabilities are defined based on the above description. We compute the largest infinite-horizon $\epsilon$ -PCIS with $\epsilon=0.90$ within the safe state space, i.e., the remainder of the state space after excluding the states associated with the obstacles.

By implementing Algorithm $3$ , the computed sets $\mathbb{P}_{i}$ and the corresponding infinite-horizon invariance probability $p^{*}_{\infty,\mathbb{P}_{i}}(x)$ are shown in Fig. 5, of which each subfigure corresponds to one orientation in $\{\mathcal{E},\mathcal{W},\mathcal{S},\mathcal{N}\}$ . The first row of Fig. 5 shows the results after the first iteration, where we can see that the infinite-horizon invariance probability $p^{*}_{\infty,\mathbb{P}_{i}}(x)$ at $x=(4,2,\mathcal{E})$ and $x=(4,2,\mathcal{W})$ is less than $\epsilon=0.90$ . Algorithm $3$ converges in $2$ steps and generates the largest infinite-horizon $\epsilon$ -PCIS $\tilde{\mathbb{Q}}$ with $\epsilon=0.90$ shown in Fig. 5(e)–5(h). This invariant set provides a region where the admissible action can drive the robot without colliding with the obstacles with probability $0.90$ . By implementing the optimal policy obtained in Lemma 7, we run a state trajectory starting from $(3,1,\mathcal{N})$ as shown in Fig. 4. We can see that this trajectory is collision-free and finally ends at the absorbing region $(3,3,\mathcal{S})$ .

VI Conclusion

We investigated the extension of set invariance in a stochastic sense for control systems. We proposed finite- and infinite-horizon $\epsilon$ -PCISs, and provided some fundamental properties. We designed iterative algorithms to compute the PCIS within a given set. For systems with discrete state and control spaces, finite- and infinite-horizon $\epsilon$ -PCISs can be computed by solving an LP and an MILP at each iteration, respectively. We proved that the iterative algorithms are computationally tractable and terminate in a finite number of steps. For systems with continuous state and control spaces, we established the approximation of stochastic control systems and proved its convergence when computing finite-horizon $\epsilon$ -PCISs. In addition, thanks to the sufficient conditions for the existence of infinite-horizon $\epsilon$ -PCISs, we can compute an infinite-horizon $\epsilon$ -PCIS by the stochastic backward reachable set from the RCIS contained in it. Numerical examples were given to illustrate the theoretical results.

One future direction is to apply the PCISs to safety-critical control and stochastic predictive control. In particular, how to characterize stability using PCISs is an important problem to consider. Another interesting future extension of PCISs is to study reliability and mean-time-to-failure for general stochastic systems.

Acknowledgment

The authors are grateful to Prof. Alessandro Abate for helpful discussions and feedback and to anonymous reviewers for their constructive comments.

Appendix A. Proof of Lemma 3

Define the functions $J^{*}_{k,\mathbb{Q}}:\mathbb{X}\rightarrow\mathbb{R}$ , $k\in\mathbb{N}_{[0,N]}$ , as

[TABLE]

As shown in [23], the function $J^{*}_{N,\mathbb{Q}}$ is lower-semianalytic for any $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ . From Definitions 7.20 and 7.21 in [26], we have that the function $J^{*}_{N,\mathbb{Q}}$ is also analytically measurable and thus is universally measurable for any $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ . According to the definition of universal measurability, the set $J^{*,-1}_{N,\mathbb{Q}}(\mathbb{B})=\{x\in\mathbb{X}\mid J^{*}_{N,\mathbb{Q}}(x)\in\mathbb{B}\}$ for $\mathbb{B}\in\mathcal{B}(\mathbb{R})$ is universally measurable.

Recalling the definition of the stochastic backward reachable set $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ , we have that

[TABLE]

where $\mathbb{B}=[-1,-\epsilon]\in\mathcal{B}(\mathbb{R})$ . Thus, the set $\mathbb{S}^{*}_{\epsilon,N}(\mathbb{Q})$ is universally measurable for any $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ .

Appendix B. Proof of Lemma 6

Before proving Lemma 6, we need two auxiliary lemmas. Lemma 8 shows that the value functions in (1) are Lipschitz continuous. It is adapted from Theorem 8 in [23]. Lemma 9 shows that the difference between the approximate density function and the original density function is bounded.

Lemma 8

Under Assumptions 1 and 3, for any $x,x^{\prime}\in\mathbb{Q}$ , the value functions $V^{*}_{k,\mathbb{Q}}$ in (1) satisfy

[TABLE]

Proof:

Similar to Theorem 8 in [23]. ∎

Lemma 9

Under Assumption 3, for all $y\in\mathbb{Q}$ and $q_{i}\in\hat{\mathbb{Q}}$,

[TABLE]

Proof:

If $\int_{\mathbb{Q}}t(s_{z}|s_{x},\hat{u})dz<1$ , it follows from Assumption 3 that

[TABLE]

And if $\int_{\mathbb{Q}}t(s_{z}|s_{x},\hat{u})dz\geq 1$ , we first have

[TABLE]

Furthermore, we have

[TABLE]

This completes the proof. ∎

Proof of Lemma 6: First of all, let us prove the inequality (8). It is easy to check it for $k=N$ since $V^{*}_{N,\mathbb{Q}}(x)=\hat{V}^{*}_{N,\mathbb{Q}}(x)=1,\forall x\in\mathbb{Q}$. By induction, we assume that $|V^{*}_{k+1,\mathbb{Q}}(x)-\hat{V}^{*}_{k+1,\mathbb{Q}}(x)|\leq\tau_{k+1}(\mathbb{Q})\delta$, $x\in\mathbb{Q}$. For any $q_{i}\in\mathbb{Q}_{i}$, $i\in\mathbb{N}_{[1,m_{x}]}$, we define $\mu^{*}_{k}=\arg\sup_{u\in\mathbb{U}}\int_{\mathbb{Q}}V^{*}_{k+1,\mathbb{Q}}(y)t(y|q_{i},u)dy$ and $\hat{\mu}^{*}_{k}=\arg\max_{\hat{u}\in\hat{\mathbb{U}}}\int_{\mathbb{Q}}\hat{V}^{*}_{k+1,\mathbb{Q}}(y)\hat{t}(y|q_{i},\hat{u})dy$. According to the discretization procedure of the control space, we can choose some $\hat{\nu}_{k}\in\hat{\mathbb{U}}$ such that $\|\mu^{*}_{k}-\hat{\nu}_{k}\|\leq\delta$. Then, we have that

[TABLE]

and

[TABLE]

Thus, we have

[TABLE]

For any $x\in\mathbb{Q}_{i}$ , $i\in\mathbb{N}_{[1,m_{x}]}$ , it follows that

[TABLE]

which completes the proof of the inequality (8).

Appendix C. Proof of Theorem 3

Let $u_{x}$ be the control input such that (3) holds for any $x\in\mathbb{Q}$ .

Only-if-part: Under Assumption 2, the fact that the set $\mathbb{Q}\in\mathcal{B}(\mathbb{X})$ is an infinite-horizon $\epsilon$-PCIS is equivalent to $G^{*}_{\infty,\mathbb{Q}}(x)\geq\epsilon,\forall x\in\mathbb{Q}$. Let $\theta=\sup_{x\in\mathbb{Q}}G^{*}_{\infty,\mathbb{Q}}(x)$. Under Assumption 2, $G^{*}_{\infty,\mathbb{Q}}(x)$ exists for all $x\in\mathbb{Q}$. The set $\tilde{\mathbb{Q}}_{f}=\{x\in\mathbb{Q}\mid G^{*}_{\infty,\mathbb{Q}}(x)=\theta\}$ collects all the states for which the value of $G^{*}_{\infty,\mathbb{Q}}$ is maximal over the set $\mathbb{Q}$. Extending Lemma 3 to the infinite-horizon case, we have that the set $\tilde{\mathbb{Q}}_{f}$ is universally measurable. By Lemma 7.16 in [26], we have that for any $p\in\mathcal{P}(\mathbb{X})$, there exists a Borel-measurable set $\mathbb{Q}_{f}\subseteq\mathbb{Q}$ such that $p(\mathbb{Q}_{f}\bigtriangleup\tilde{\mathbb{Q}}_{f})=0$.

Next we will show that the set $\mathbb{Q}_{f}$ is an RCIS. It follows from Assumption 2 and Lemma 2 that $\forall x\in\mathbb{Q}_{f}$ ,

[TABLE]

where Eq. (14) follows from $G^{*}_{\infty,\mathbb{Q}}(x)=G^{*}_{\infty,\mathbb{Q}}(y),\forall x,y\in\mathbb{Q}_{f}$ and Eq. (15) follows from the fact that $G^{*}_{\infty,\mathbb{Q}}(x)>G^{*}_{\infty,\mathbb{Q}}(y),\forall x\in\mathbb{Q}_{f},\forall y\in\mathbb{Q}\setminus\mathbb{Q}_{f}$. Furthermore, since $G^{*}_{\infty,\mathbb{Q}}(x)\geq\epsilon>0,\forall x\in\mathbb{Q}$, and $0\leq T(\mathbb{Q}|x,u_{x})\leq 1$, the equality in Eq. (15) holds if and only if $T(\mathbb{Q}_{f}|x,u_{x})=1$ and thereby $T(\mathbb{Q}\setminus\mathbb{Q}_{f}|x,u_{x})=0$. Based on the recursion in (2), we have $G^{*}_{\infty,\mathbb{Q}}(x)=1,\forall x\in\mathbb{Q}_{f}$. Hence, the set $\mathbb{Q}_{f}\subseteq\mathbb{Q}$ is an RCIS.

Next let us prove that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ , Eq.((i)) holds. That is to prove that

[TABLE]

By Theorem 7 in [23], the control input $u_{x}$ is also optimal to the recursion (2). For all $k\in\mathbb{N}$ , we have $\forall x\in\mathbb{Q}_{f}$ , $G^{*}_{k,\mathbb{Q}}(x)=1$ and $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ ,

[TABLE]

Let $\rho=\sup_{x\in\mathbb{Q}\setminus\mathbb{Q}_{f}}\int_{\mathbb{Q}\setminus\mathbb{Q}_{f}}T(dy|x,u_{x})$. Note that $0\leq\rho<1$. Then, $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$, we can prove by induction that

[TABLE]

which, by taking the limit, yields that (16) holds.

If-part: The proof for the existence of an RCIS $\mathbb{Q}_{f}\subseteq\mathbb{Q}$ is the same as that of the only-if part. As shown above, the condition $T(\mathbb{Q}_{f}|x,u_{x})=1$ is equivalent to $G^{*}_{\infty,\mathbb{Q}}(x)=1,\forall x\in\mathbb{Q}_{f}$. We can use induction to prove that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$,

[TABLE]

which further implies that $G^{*}_{\infty,\mathbb{Q}}(x)\geq T(\mathbb{Q}_{f}|x,u_{x})+\int_{\mathbb{Q}\setminus\mathbb{Q}_{f}}T(\mathbb{Q}_{f}|y,u_{y})T(dy|x,u_{x})$ . One sufficient condition to guarantee $G^{*}_{\infty,\mathbb{Q}}(x)\geq\epsilon$ is (11), i.e., $T(\mathbb{Q}_{f}|x,u_{x})+\int_{\mathbb{Q}\setminus\mathbb{Q}_{f}}T(\mathbb{Q}_{f}|y,u_{y})T(dy|x,u_{x})\geq\epsilon$ . The proof is completed.
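As a small numerical illustration of the sufficient condition (11), the sketch below evaluates its left-hand side, $T(\mathbb{Q}_{f}|x,u_{x})+\int_{\mathbb{Q}\setminus\mathbb{Q}_{f}}T(\mathbb{Q}_{f}|y,u_{y})T(dy|x,u_{x})$, for a finite chain in which the integral reduces to a sum. It assumes the policy $u_{x}$ is already fixed, so that the dynamics are a single row-stochastic matrix `P[x, y]`. The function name `two_step_reach_bound` and the example data are hypothetical.

```python
import numpy as np

def two_step_reach_bound(P, Q, Qf):
    """Left-hand side of condition (11) for every state, under a fixed policy.

    P[x, y] = T(y | x, u_x) is the closed-loop transition matrix (assumed given);
    Q and Qf are boolean masks with Qf a subset of Q.
    """
    rest = Q & ~Qf                              # the set Q \ Q_f
    one_step = P[:, Qf].sum(axis=1)             # T(Q_f | x, u_x)
    # sum over y in Q \ Q_f of T(Q_f | y, u_y) * T(y | x, u_x)
    two_step = P[:, rest] @ one_step[rest]
    return one_step + two_step

# Illustrative chain: state 0 forms Q_f, states 1 and 2 lie in Q \ Q_f, state 3 is unsafe.
P = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.3, 0.2],
    [0.4, 0.4, 0.0, 0.2],
    [0.0, 0.0, 0.0, 1.0],
])
Q = np.array([True, True, True, False])
Qf = np.array([True, False, False, False])

bound = two_step_reach_bound(P, Q, Qf)
eps = 0.6
rest = Q & ~Qf
# The tolerance guards against floating-point round-off at the boundary.
print(bound[rest], bool(np.all(bound[rest] >= eps - 1e-12)))
```

For this data the bound is $0.5+0.3\cdot 0.4=0.62$ at state 1 and $0.4+0.4\cdot 0.5=0.6$ at state 2, so condition (11) holds for any $\epsilon\leq 0.6$.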

Appendix D. Proof of Corollary 3

By Lemma 2 and Theorem 3, the necessary condition in Corollary 3 can be proven by showing that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$ , there exists a $u\in\mathbb{U}$ such that

[TABLE]

where Eq. (17) follows from $0<G^{*}_{\infty,\mathbb{Q}}(x)\leq 1,\forall x\in\mathbb{Q}$ .

The sufficient condition in Corollary 3 can be proven by showing that $\forall x\in\mathbb{Q}\setminus\mathbb{Q}_{f}$, there exists a $u\in\mathbb{U}$ such that

[TABLE]

where Eq. (18) follows from $G^{*}_{\infty,\mathbb{Q}}(x)\geq\epsilon>0,\forall x\in\mathbb{Q}$ . One sufficient condition to guarantee $G^{*}_{\infty,\mathbb{Q}}(x)\geq\epsilon$ is $T(\mathbb{Q}_{f}|x,u)+\epsilon T(\mathbb{Q}\setminus\mathbb{Q}_{f}|x,u)\geq\epsilon$ . The proof is completed.
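The one-step condition $T(\mathbb{Q}_{f}|x,u)+\epsilon T(\mathbb{Q}\setminus\mathbb{Q}_{f}|x,u)\geq\epsilon$ is easy to check when the state and control spaces are finite. The sketch below does so under the assumption that the kernel is stored as an array `T[u, x, y]` and that $\mathbb{Q}_{f}$ is already known to be an RCIS. The function name `corollary3_sufficient` and the data are hypothetical.

```python
import numpy as np

def corollary3_sufficient(T, Q, Qf, eps):
    """Check that each x in Q \\ Q_f has a control u with
    T(Q_f | x, u) + eps * T(Q \\ Q_f | x, u) >= eps.

    T[u, x, y] is the transition probability from x to y under u (assumed layout).
    """
    rest = Q & ~Qf
    to_f = T[:, :, Qf].sum(axis=2)        # T(Q_f | x, u), shape (n_u, n_x)
    to_rest = T[:, :, rest].sum(axis=2)   # T(Q \ Q_f | x, u)
    best = (to_f + eps * to_rest).max(axis=0)   # best control per state
    return bool(np.all(best[rest] >= eps))

# Illustrative data: state 0 forms Q_f, state 1 lies in Q \ Q_f, state 2 is unsafe.
T = np.array([
    [[1.0, 0.0, 0.0], [0.3, 0.6, 0.1], [0.0, 0.0, 1.0]],  # control 0
    [[1.0, 0.0, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]],  # control 1
])
Q = np.array([True, True, False])
Qf = np.array([True, False, False])

print(corollary3_sufficient(T, Q, Qf, eps=0.5))
print(corollary3_sufficient(T, Q, Qf, eps=0.8))
```

With $\epsilon=0.5$, control 0 at state 1 gives $0.3+0.5\cdot 0.6=0.6\geq 0.5$, so the condition holds. With $\epsilon=0.8$, the two controls give $0.78$ and $0.5$, so it fails.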

Bibliography

[1] D. Bertsekas, “Infinite time reachability of state-space regions by using feedback control,” IEEE Transactions on Automatic Control, vol. 17, no. 5, pp. 604–613, 1972.
[2] F. Blanchini, “Set invariance in control,” Automatica, vol. 35, no. 11, pp. 1747–1767, 1999.
[3] F. Blanchini and S. Miani, Set-theoretic methods in control. Springer, 2007.
[4] S. V. Raković, E. C. Kerrigan, K. I. Kouramas, and D. Q. Mayne, “Invariant approximations of the minimal robust positively invariant set,” IEEE Transactions on Automatic Control, vol. 50, no. 3, pp. 406–410, 2005.
[5] M. Rungger and P. Tabuada, “Computing robust controlled invariant sets of linear systems,” IEEE Transactions on Automatic Control, vol. 62, no. 7, pp. 3665–3670, 2017.
[6] E. Gilbert and K. T. Tan, “Linear systems with state and control constraints: the theory and practice of maximal admissible sets,” IEEE Transactions on Automatic Control, vol. 36, no. 9, pp. 1008–1020, 1991.
[7] I. M. Mitchell, S. Kaynama, M. Chen, and M. Oishi, “Safety preserving control synthesis for sampled data systems,” Nonlinear Analysis: Hybrid Systems, vol. 10, pp. 63–82, 2013.
[8] A. Mesbah, “Stochastic model predictive control: an overview and perspectives for future research,” IEEE Control Systems, vol. 36, no. 6, pp. 30–44, 2016.
