Dual Formulation for Chance Constrained Stochastic Shortest Path with Application to Autonomous Vehicle Behavior Planning

Rashid Alyassi; Majid Khonji

arXiv:2302.13115·cs.AI·February 28, 2023


TL;DR

This paper introduces a new exact integer linear programming approach for chance-constrained stochastic shortest path problems, enabling safe and efficient autonomous vehicle behavior planning under uncertainty.

Contribution

It presents a novel formulation for chance-constrained SSP, a randomized policy method, and extends the formalism to multi-step constraints, advancing safety-critical planning.

Findings

01. The approach outperforms existing methods on benchmark problems.

02. It provides deterministic policies for stochastic environments.

03. The method effectively bounds collision risk in autonomous navigation.


Tables

Table I. Comparison of the CC-SSP solvers on two benchmark problems (with minimization objective functions).

| Problem | Horizon | $\Delta$ | ILP objective | ILP time (sec) | Rounding objective | Rounding time (sec) | Graph-based # nodes | Tree-based # nodes |
|---|---|---|---|---|---|---|---|---|
| Grid Problem (10000x10000) | h = 10 | 5% | 13.603 | 0.289 | 13.825 | 0.0637 | 506 | $5.36\times 10^{9}$ |
| | | 10% | 12.958 | 0.2448 | 13.824 | 0.064 | 506 | $5.36\times 10^{9}$ |
| | h = 25 | 5% | 30.975 | 8.983 | 31.743 | 4.492 | 6,201 | $8.67\times 10^{25}$ |
| | | 10% | 29.414 | 11.315 | 31.743 | 4.257 | 6,201 | $8.67\times 10^{25}$ |
| | h = 30 | 5% | 36.767 | 50.790 | 37.716 | 12.758 | 10,416 | $2.16\times 10^{31}$ |
| | | 10% | 34.902 | 43.099 | 37.716 | 13.299 | 10,416 | $2.16\times 10^{31}$ |
| | h = 35 | 5% | 42.555 | 126.046 | 43.687 | 29.072 | 16,206 | $5.37\times 10^{36}$ |
| | | 10% | 40.386 | 101.244 | 43.687 | 41.808 | 16,206 | $5.37\times 10^{36}$ |
| Highway Problem (Three lanes) | h = 4 | 5% | 6.258 | 0.165 | 6.448 | 0.098 | 2,438 | 124,251 |
| | | 10% | 5.854 | 0.148 | 5.854 | 0.090 | 2,438 | 124,251 |
| | h = 5 | 5% | 7.343 | 1.274 | 7.642 | 0.356 | 7,666 | $1.99\times 10^{6}$ |
| | | 10% | 6.833 | 0.655 | 6.833 | 0.337 | 7,666 | $1.99\times 10^{6}$ |
| | h = 6 | 5% | 8.343 | 4.249 | 8.712 | 1.235 | 20,576 | $3.18\times 10^{7}$ |
| | | 10% | 7.761 | 2.721 | 7.762 | 1.247 | 20,576 | $3.18\times 10^{7}$ |
| | h = 7 | 5% | 9.317 | 20.598 | 9.759 | 3.929 | 49,526 | $5.09\times 10^{9}$ |
| | | 10% | 8.672 | 10.552 | 8.672 | 3.444 | 49,526 | $5.09\times 10^{9}$ |

Equations

$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_{t},\pi(S_{t}))\Big]\quad\text{s.t.}\quad\mathbb{E}\Big[\sum_{t=0}^{h-1}C^{j}(S_{t},\pi(S_{t}))\mid\pi\Big]\leq P^{j},\quad j\in\mathcal{N}.$$

$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_{t},\pi(S_{t}))\Big]\quad\text{s.t.}\quad\Pr\Big(\bigvee_{t=0}^{h}R^{j}(S_{t})\mid\pi\Big)\leq\Delta^{j},\quad j\in\mathcal{N}.\tag{3}$$

$$\textsc{Er}^{j}(s_{k}):=\Pr\Big(\bigvee_{t=k}^{h}R^{j}(S_{t})\mid S_{k}=s_{k}\Big).$$

$$\textsc{Er}^{j}(s_{k})=\sum_{s_{k+1}\in\mathcal{S}}\sum_{a\in\mathcal{A}}\Big(\textsc{Er}^{j}(s_{k+1})\,\pi(s_{k},a)\,\widetilde{T}^{j}(s_{k},a,s_{k+1})\Big)+r^{j}(s_{k}),\quad j\in\mathcal{N}.\tag{4}$$

$$\sum_{a\in\mathcal{A}}x^{j}_{s,k,a}=\sum_{s_{k-1}\in\mathcal{S}}\sum_{a\in\mathcal{A}}x^{j}_{s,k-1,a}\,\widetilde{T}^{j}(s_{k-1},a,s_{k}),\quad k=1,\ldots,h-1,\ s_{k}\in\mathcal{S},\tag{5}$$

$$\sum_{a\in\mathcal{A}}x^{j}_{s,0,a}=1.\tag{6}$$

$$\textsc{Er}^{j}(s_{0})=\sum_{k=1}^{h}\sum_{s_{k-1}\in\mathcal{S}}\sum_{a\in\mathcal{A}}\sum_{s_{k}\in\mathcal{S}}r^{j}(s_{k})\,x^{j}_{s,k-1,a}\,\widetilde{T}^{j}(s_{k-1},a,s_{k})+r^{j}(s_{0}).$$

$$\textsc{(CC-SSP-ILP)}\quad\max_{\mathbf{x},\mathbf{z}}\ \sum_{k=0}^{h-1}\ \sum_{s_{k}\in\mathcal{S},\,a\in\mathcal{A}}x^{0}_{s,k,a}\,U(s_{k},a)$$

$$\text{s.t.}\quad\sum_{a\in\mathcal{A}}x^{j}_{s,k,a}=\sum_{s_{k-1}\in\mathcal{S}}\sum_{a\in\mathcal{A}}x^{j}_{s,k-1,a}\,\widetilde{T}^{j}(s_{k-1},a,s_{k}),\quad k=1,\ldots,h-1,\ s_{k}\in\mathcal{S},\ j\in\mathcal{N}\cup\{0\},$$

$$\sum_{a\in\mathcal{A}}x^{j}_{s,0,a}=1,\quad j\in\mathcal{N}\cup\{0\},$$

$$\sum_{k=0}^{h-1}\sum_{s_{k}\in\mathcal{S}}\sum_{a\in\mathcal{A}}\sum_{s_{k+1}\in\mathcal{S}}r^{j}(s_{k+1})\,x^{j}_{s,k,a}\,\widetilde{T}^{j}(s_{k},a,s_{k+1})\leq\Delta^{j}-r^{j}(s_{0}),\quad j\in\mathcal{N},\tag{11}$$

$$\sum_{a\in\mathcal{A}}z_{s,k,a}\leq 1,\quad k=0,\ldots,h-1,\ s_{k}\in\mathcal{S},\tag{12}$$

$$x^{j}_{s,k,a}\leq z_{s,k,a},\quad\forall j\in\mathcal{N}\cup\{0\},\ k=0,\ldots,h-1,\ s_{k}\in\mathcal{S},\tag{13}$$

$$z_{s,k,a}\in\{0,1\},\quad x^{j}_{s,k,a}\in[0,1],\quad\forall j\in\mathcal{N}\cup\{0\},\ k=0,\ldots,h-1,\ s_{k}\in\mathcal{S}.$$

$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_{t},\pi(S_{t}))\Big]\quad\text{s.t.}\quad\Pr\Big(\sum_{t=0}^{h-1}C^{j}(S_{t},\pi(S_{t}))>P^{j}\mid\pi\Big)\leq\Delta^{j},\quad j\in\mathcal{N}.\tag{16}$$

$$r^{j}(\langle s,g\rangle)\leftarrow\begin{cases}1&\text{if }g^{j}>P^{j},\\ 0&\text{otherwise.}\end{cases}$$

$$T^{\prime}(\langle s,g\rangle,a,\langle s^{\prime},g^{\prime}\rangle):=\begin{cases}T(s,a,s^{\prime})&\text{if }\forall j,\ g^{\prime j}=g^{j}+C^{j}(s,a),\\ 0&\text{otherwise.}\end{cases}$$

$$\begin{aligned}\sum_{k=0}^{h-1}C^{j}(S_{k},\pi(S_{k}))>P^{j}&\iff\bigvee_{k=0}^{h-1}\big(G^{j}(k)>P^{j}\big)\\ &\iff\bigvee_{k=0}^{h-1}\ \bigvee_{g^{j}\in\mathcal{G}^{j}:\,g^{j}>P^{j}}\big(G^{j}(k)=g^{j}\big)\\ &\iff\bigvee_{t=0}^{h}\ \bigvee_{s^{\prime}_{t}\in\mathcal{S}^{\prime}}R^{j}(S^{\prime}_{t}=s^{\prime}_{t})\iff\bigvee_{t=0}^{h}R^{j}(S^{\prime}_{t}).\end{aligned}$$

$$\Pr\Big(\sum_{t=0}^{h-1}C^{j}(S_{t},\pi(S_{t}))>(1+\epsilon)P^{j}\mid\pi\Big)\leq\Delta^{j}.\tag{19}$$

$$\Pr\Big(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(S_{t},\pi(S_{t}))>(1+\epsilon)P^{j}\mid\pi\Big)\leq\Delta^{j}.\tag{20}$$

$$\sum_{t=0}^{h-1}C^{j}(s_{t},\hat{\pi}(s_{t}))\leq K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\hat{\pi}(s_{t}))\leq(1+\epsilon)P^{j}.$$

$$\Pr\Big(\sum_{t=0}^{h-1}C^{j}(S_{t},\hat{\pi}(S_{t}))>(1+\epsilon)P^{j}\Big)\leq\Pr\Big(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(S_{t},\hat{\pi}(S_{t}))>(1+\epsilon)P^{j}\Big)\leq\Delta^{j}.$$

$$K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\pi^{\prime}(s_{t}))\leq\sum_{t=0}^{h-1}C^{j}(s_{t},\pi^{\prime}(s_{t}))+hK^{j}\leq P^{j}+h\,\frac{\epsilon C^{j}_{\max}}{h}\leq(1+\epsilon)P^{j},$$

$$\begin{aligned}&\Pr\Big(\sum_{t=0}^{h-1}C^{j}(s_{t},\pi^{\prime}(s_{t}))\leq P^{j}\Big)\leq\Pr\Big(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\pi^{\prime}(s_{t}))\leq(1+\epsilon)P^{j}\Big)\\ \iff{}&\Pr\Big(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\pi^{\prime}(s_{t}))>(1+\epsilon)P^{j}\Big)\leq\Pr\Big(\sum_{t=0}^{h-1}C^{j}(s_{t},\pi^{\prime}(s_{t}))>P^{j}\Big)\leq\Delta^{j}.\end{aligned}$$

$$\textsc{Er}^{j}(s_{k})=1-\Pr\Big(\bigwedge_{t=k}^{h}\neg R^{j}(S_{t})\mid S_{k}=s_{k}\Big)\tag{25}$$


Full text


Rashid Alyassi and Majid Khonji are with EECS Department, Khalifa University, Abu Dhabi, UAE (Emails: {rashid.alyassi, majid.khonji}@ku.ac.ae). This work was supported by the Khalifa University of Science and Technology under award references CIRA-2019-049, KKJRC-2019-Trans1 and KUCARS.

Abstract

Autonomous vehicles face the problem of optimizing the expected performance of subsequent maneuvers while bounding the risk of collision with surrounding dynamic obstacles. These obstacles, such as agent vehicles, often exhibit stochastic transitions that should be accounted for in a timely and safe manner. The Constrained Stochastic Shortest Path problem (C-SSP) is a formalism for planning in stochastic environments under certain types of operating constraints. While C-SSP allows specifying constraints in the planning problem, it does not allow for bounding the probability of constraint violation, which is desired in safety-critical applications. This work’s first contribution is an exact integer linear programming formulation for Chance-constrained SSP (CC-SSP) that attains deterministic policies. Second, a randomized rounding procedure is presented for stochastic policies. Third, we show that the CC-SSP formalism can be generalized to account for constraints that span through multiple time steps. Evaluation results show the usefulness of our approach in benchmark problems compared to existing approaches.

I Introduction

The Markov Decision Process (MDP) [12] is a widely used model for planning under uncertainty. An MDP consists of states, actions, a stochastic transition function, a utility function, and an initial state. A solution to an MDP is a policy that maps each state to an action so as to maximize the global expected utility. The Stochastic Shortest Path (SSP) problem [2] is an MDP with non-negative utilities and a minimizing objective. The problem has interesting structure and admits a dual linear programming (LP) formulation [6] that can be interpreted as a minimum cost flow problem. Moreover, SSP has many heuristic search algorithms [3, 10] that use admissible heuristics to guide the search without exploring the whole state space.

In addition, Constrained SSP (C-SSP) [1] provides the means to add mission-critical requirements while optimizing the objective function. Each requirement is formulated as a budget constraint imposed by a non-replenishable resource for which a bounded quantity is available during the entire plan execution. Resource consumption at each time step reduces the resource availability during subsequent time steps (see [5] for a detailed discussion). A stochastic policy for C-SSP is attainable using several efficient algorithms (e.g., [9]). A heuristics-based search approach in the dual LP can further improve the running time for large state spaces [19]. For deterministic policies, however, C-SSP is known to be NP-hard [8].

A special type of constraint occurs when we want to bound the probability of constraint violations by some threshold $\Delta$, which is often called a chance constraint. To simplify the problem, [7] proposes approximating the constraint using Markov’s inequality, which converts the problem to a C-MDP. Another approach [4] applies Hoeffding’s inequality on the sum of independent random variables to improve the bound. Both methods provide conservative policies that respect safety thresholds but at the expense of the objective value.

In the partially observable setting, the problem is called Chance Constrained Partially Observable MDP (CC-POMDP). Several algorithms address CC-POMDP under local risk constraints, where risk depends only on the state, and the chance constraint is defined as the probability of failure in any state during execution [18, 16]. However, due to partial observability, these methods require an enumeration of histories, making the solution space exponentially large with respect to the planning horizon. To speed up the computation, [11] provides an anytime algorithm using a Lagrangian relaxation method for CC-SSP and CC-POMDP that returns feasible sub-optimal solutions and gradually improves the solution’s optimality when sufficient time is permitted. Unfortunately, the solution space is represented as an And-Or tree, as in other CC-POMDP methods, so the algorithm scales poorly with the planning horizon.

Behavior planning for Autonomous Vehicles (AVs) has been extensively studied in deterministic environments (see e.g., [20]). One of the primary sources of uncertainty arises from drivers’ intentions, i.e., potential maneuvers of agent vehicles in the scene [13]. An effective planner should optimize the maneuvers (say, minimize total commute time) while bounding collision probability below some threshold. More recently, [14, 13] show that such a plan is achievable under more detailed settings. Essentially, they modeled the problem as a CC-POMDP, where actions resemble (deterministic) ego-vehicle maneuvers, and observations resemble predictions of agent vehicle maneuvers (see Fig. 1). In fact, the problem can be equivalently modeled as CC-SSP, with transitions used for joint actions and agent vehicle predictions. Unlike CC-POMDP, our CC-SSP formulation scales polynomially in the planning horizon resulting in a significant performance gain.

This paper’s contribution is threefold. First, an exact integer linear programming (ILP) formulation is provided for CC-SSP that obtains deterministic policies in the dual space. The chance-constrained formulation is based on the notion of execution risk introduced in the literature in the context of CC-POMDP [18] under local risk constraints. Second, we generalize upon local CC-SSP constraints to global constraints. Unlike [11], our formulation does not expand all possible histories, which entails exponential growth in solution space. Our formulation is also a departure from existing approximate approaches [7, 4] that have no bounded performance with respect to the objective value. Third, we present a randomized rounding algorithm for the CC-SSP that provides a close-to-optimal solution in practice within a reasonable time. Both stochastic and deterministic approaches are evaluated under two benchmark problems, including a highway behavior planning problem for autonomous vehicles (Fig. 1).

II CC-SSP with Local Risk Constraints

In this section, we present (A) the problem definition of the C-SSP and CC-SSP, (B) a method for computing the execution risk for the CC-SSP, (C) the ILP formulation for the CC-SSP, and (D) a Randomized Rounding algorithm for the CC-SSP.

II-A Problem Definition

We provide formal definitions for C-SSP and CC-SSP as follows. A fixed-horizon constrained stochastic shortest path (C-SSP) is a tuple $\langle\mathcal{S},\mathcal{A},T,U,s_{0},h,(C^{j},P^{j})_{j\in\mathcal{N}}\rangle$, where

• $\mathcal{S}$ and $\mathcal{A}$ are finite sets of discrete states and actions, respectively;

• $T:\mathcal{S}\times\mathcal{A}\times\mathcal{S}\rightarrow[0,1]$ is a probabilistic transition function between states, $T(s,a,s^{\prime})=\Pr(s^{\prime}\mid a,s)$, where $s,s^{\prime}\in\mathcal{S}$ and $a\in\mathcal{A}$;

• $U:\mathcal{S}\times\mathcal{A}\rightarrow\mathbb{R}_{+}$ is a non-negative utility function;

• $s_{0}$ is an initial state;

• $h$ is the planning horizon;

• $C^{j}:\mathcal{S}\times\mathcal{A}\rightarrow\mathbb{R}_{+}$ is a non-negative cost function with respect to criterion $j\in\mathcal{N}$, where $\mathcal{N}$ is the index set of all risk criteria;

• $P^{j}\in\mathbb{R}_{+}$ is a positive upper bound on the cost for criterion $j\in\mathcal{N}$.

A deterministic policy $\pi(\cdot,\cdot)$ is a function that maps a state and time step into an action, $\pi:\mathcal{S}\times\{0,1,...,h-1\}\rightarrow\mathcal{A}$ . A stochastic policy $\pi:\mathcal{S}\times\{0,1,...,h-1\}\times\mathcal{A}\rightarrow[0,1]$ is defined as a distribution over actions from a given state and time. For simplicity, we write $\pi(S_{t})$ to denote $\pi(S_{t},t)$ , and $\pi(s_{k},a)$ to denote stochastic policy $\pi(s_{k},k,a)$ . A run is a sequence of random states $S_{0},S_{1},\ldots,S_{h-1},S_{h}$ that result from executing a policy, where $S_{0}=s_{0}$ is known.

The objective is to compute a policy that maximizes (resp. minimizes) the expected utility (resp. cost) while satisfying all the constraints. More formally,

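$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_{t},\pi(S_{t}))\Big]$$

$$\text{subject to}\quad\mathbb{E}\Big[\sum_{t=0}^{h-1}C^{j}(S_{t},\pi(S_{t}))\mid\pi\Big]\leq P^{j},\quad j\in\mathcal{N}.$$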

A fixed-horizon chance-constrained stochastic shortest path (CC-SSP) problem is formally defined as a tuple $M=\langle\mathcal{S},\mathcal{A},T,U,s_{0},h,(r^{j},\Delta^{j})_{j\in\mathcal{N}}\rangle$, where $\mathcal{S},\mathcal{A},T,U,s_{0},h,\mathcal{N}$ are defined as in C-SSP, and

• $r^{j}:\mathcal{S}\rightarrow[0,1]$ is the probability of failure at a given state according to risk criterion $j$;

• $\Delta^{j}$ is the corresponding risk budget, a threshold on the probability of failure over the planning horizon, for $j\in\mathcal{N}$.

Let $R^{j}(s)$ be a Bernoulli random variable for failure at state $s$ with respect to criterion $j\in\mathcal{N}$ , such that $R^{j}(s)=1$ if and only if it is in a risky state, and zero otherwise. For simplicity, we write $R^{j}(s)$ to denote $R^{j}(s)=1$ . The objective of CC-SSP is to compute a policy (or a conditional plan) $\pi$ that maximizes the cumulative expected utility while bounding the probability of failure at any time step throughout the planning horizon.

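$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_{t},\pi(S_{t}))\Big]$$

$$\text{subject to}\quad\Pr\Big(\bigvee_{t=0}^{h}R^{j}(S_{t})\mid\pi\Big)\leq\Delta^{j},\quad j\in\mathcal{N}.\tag{3}$$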

We refer to such constraints as local since failure could occur entirely in a single step (i.e., $R^{j}(S_{t})=1$). Sec. III considers failure as accumulative throughout an entire run. In both cases, the constraints bound the probability of failure.

The SSP problem and its constrained variants can be visualized by a Directed Acyclic Graph (DAG), where the vertices represent the states and their actions, and the edges are of two types. An edge between a state and an action represents the action’s cost, while an edge between an action and a state represents the transition probability. Thus, at depth $k$ all the states are reachable from actions taken at depth $k-1$, and each depth has a maximum of $|\mathcal{S}|$ states (i.e., all the states). Fig. 2 provides a pictorial illustration of the CC-SSP And-Or search graph. Note that, unlike the And-Or search trees obtained by history enumeration algorithms (see, e.g., [11]), in this representation a state may have multiple parents, leading to a significant reduction in the search space.

II-B Execution Risk

Define the execution risk of a run at state $s_{k}$ as

$$\textsc{Er}^{j}(s_{k}):=\Pr\Big(\bigvee_{t=k}^{h}R^{j}(S_{t})\mid S_{k}=s_{k}\Big).$$

According to the definition, Cons. (3) is equivalent to $\textsc{Er}^{j}(s_{0})\leq\Delta^{j}$. The lemma below shows that such a constraint can be computed recursively.

Lemma 1.

The execution risk can be written as

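$$\textsc{Er}^{j}(s_{k})=\sum_{s_{k+1}\in\mathcal{S}}\sum_{a\in\mathcal{A}}\Big(\textsc{Er}^{j}(s_{k+1})\,\pi(s_{k},a)\,\widetilde{T}^{j}(s_{k},a,s_{k+1})\Big)+r^{j}(s_{k}),\quad j\in\mathcal{N},\tag{4}$$

where $\widetilde{T}^{j}(s_{k},\pi(s_{k}),s_{k+1}):=T(s_{k},\pi(s_{k}),s_{k+1})(1-r^{j}(s_{k}))$.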

A proof for Lemma 1 is deferred to the appendix.
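The recursion in Lemma 1 can be evaluated by a backward pass over the horizon, starting from $\textsc{Er}^{j}(s_{h})=r^{j}(s_{h})$. The following is a minimal sketch for a single risk criterion, not the authors' implementation; the dictionary layout and the two-state instance (states `a`, `b`, actions `stay`, `go`) are hypothetical and chosen only for illustration.

```python
def execution_risk(states, T, r, policy, h, s0):
    """Backward recursion of Lemma 1 for a (possibly stochastic) policy.

    T[(s, a)]      -> {next_state: probability}
    r[s]           -> failure probability at state s
    policy[(s, k)] -> {action: probability} at time step k
    """
    er = {s: r[s] for s in states}  # base case: Er(s_h) = r(s_h)
    for k in range(h - 1, -1, -1):
        prev = {}
        for s in states:
            total = 0.0
            for a, pa in policy[(s, k)].items():
                for s_next, p in T[(s, a)].items():
                    t_tilde = p * (1.0 - r[s])  # T-tilde: transition given no failure at s
                    total += er[s_next] * pa * t_tilde
            prev[s] = total + r[s]
        er = prev
    return er[s0]


# Hypothetical instance: "a" is safe, "b" fails with probability 0.5 per visit.
states = ["a", "b"]
r = {"a": 0.0, "b": 0.5}
T = {("a", "stay"): {"a": 1.0}, ("a", "go"): {"b": 1.0},
     ("b", "stay"): {"b": 1.0}, ("b", "go"): {"b": 1.0}}
go_policy = {(s, k): {"go": 1.0} for s in states for k in range(2)}
risk = execution_risk(states, T, r, go_policy, h=2, s0="a")
print(risk)  # the run a, b, b survives with probability 0.5 * 0.5, so risk is 0.75
```

The pass touches each (state, action, successor) triple once per time step, so its cost grows linearly with the horizon.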

II-C ILP Formulation

Define a variable $x_{s,k,a}^{j}\in[0,1]$ for each state $s_{k}$ at time $k$ , and action $a$ , and $j\in\mathcal{N}\cup\{0\}$ such that

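$$\sum_{a\in\mathcal{A}}x^{j}_{s,k,a}=\sum_{s_{k-1}\in\mathcal{S}}\sum_{a\in\mathcal{A}}x^{j}_{s,k-1,a}\,\widetilde{T}^{j}(s_{k-1},a,s_{k}),\quad k=1,\ldots,h-1,\ s_{k}\in\mathcal{S},\tag{5}$$

$$\sum_{a\in\mathcal{A}}x^{j}_{s,0,a}=1.\tag{6}$$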

In fact, the above flow constraints for $j=0$ are the standard dual-space constraints for SSP [17]. According to the above constraints, we can rewrite the execution risk at $s_{0}$ , defined by Eqn. (4), according to the following key result.

Theorem 1.

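Given a conditional plan $\mathbf{x}$ that satisfies Eqn. (5)-(6), the execution risk can be written as a linear function of $\mathbf{x}$,

$$\textsc{Er}^{j}(s_{0})=\sum_{k=1}^{h}\sum_{s_{k-1}\in\mathcal{S}}\sum_{a\in\mathcal{A}}\sum_{s_{k}\in\mathcal{S}}r^{j}(s_{k})\,x^{j}_{s,k-1,a}\,\widetilde{T}^{j}(s_{k-1},a,s_{k})+r^{j}(s_{0}).$$

A proof for Theorem 1 is provided in the appendix.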
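As a sanity check on Theorem 1, the flows $x^{j}_{s,k,a}$ induced by a fixed policy can be propagated forward through Eqn. (5)-(6) and plugged into the linear expression. The sketch below does this for one risk criterion; the function name `flow_risk` and the two-state instance are hypothetical, and the policy is assumed to be given rather than optimized.

```python
def flow_risk(states, T, r, policy, h, s0):
    """Execution risk at s0 as a linear function of the flows x (Theorem 1)."""
    # Eqn. (6): one unit of flow leaves s0, split according to the policy.
    x = {(s0, a): pa for a, pa in policy[(s0, 0)].items()}
    risk = r[s0]
    for k in range(1, h + 1):
        # Right-hand side of Eqn. (5): flow arriving at each state at step k.
        inflow = {s: 0.0 for s in states}
        for (s_prev, a), flow in x.items():
            for s, p in T[(s_prev, a)].items():
                inflow[s] += flow * p * (1.0 - r[s_prev])  # uses T-tilde
        # k-th term of the sum in Theorem 1.
        risk += sum(r[s] * inflow[s] for s in states)
        if k < h:
            x = {(s, a): inflow[s] * pa
                 for s in states for a, pa in policy[(s, k)].items()}
    return risk


# Hypothetical instance: "a" is safe, "b" fails with probability 0.5 per visit.
states = ["a", "b"]
r = {"a": 0.0, "b": 0.5}
T = {("a", "stay"): {"a": 1.0}, ("a", "go"): {"b": 1.0},
     ("b", "stay"): {"b": 1.0}, ("b", "go"): {"b": 1.0}}
go_policy = {(s, k): {"go": 1.0} for s in states for k in range(2)}
linear_risk = flow_risk(states, T, r, go_policy, h=2, s0="a")
print(linear_risk)  # 0.5 from step 1 plus 0.25 from step 2
```

On this instance the value agrees with the direct calculation $1-(1-0.5)^{2}=0.75$, which is what the backward recursion of Lemma 1 also gives.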

Let $\widetilde{T}^{0}(\cdot,\cdot,\cdot):=T(\cdot,\cdot,\cdot)$ . The chance-constrained stochastic shortest path problem (CC-SSP) can be formulated by the following ILP:

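$$\textsc{(CC-SSP-ILP)}\quad\max_{\mathbf{x},\mathbf{z}}\ \sum_{k=0}^{h-1}\ \sum_{s_{k}\in\mathcal{S},\,a\in\mathcal{A}}x^{0}_{s,k,a}\,U(s_{k},a)$$

subject to

$$\sum_{a\in\mathcal{A}}x^{j}_{s,k,a}=\sum_{s_{k-1}\in\mathcal{S}}\sum_{a\in\mathcal{A}}x^{j}_{s,k-1,a}\,\widetilde{T}^{j}(s_{k-1},a,s_{k}),\quad k=1,\ldots,h-1,\ s_{k}\in\mathcal{S},\ j\in\mathcal{N}\cup\{0\},$$

$$\sum_{a\in\mathcal{A}}x^{j}_{s,0,a}=1,\quad j\in\mathcal{N}\cup\{0\},$$

$$\sum_{k=0}^{h-1}\sum_{s_{k}\in\mathcal{S}}\sum_{a\in\mathcal{A}}\sum_{s_{k+1}\in\mathcal{S}}r^{j}(s_{k+1})\,x^{j}_{s,k,a}\,\widetilde{T}^{j}(s_{k},a,s_{k+1})\leq\Delta^{j}-r^{j}(s_{0}),\quad j\in\mathcal{N},\tag{11}$$

$$\sum_{a\in\mathcal{A}}z_{s,k,a}\leq 1,\quad k=0,\ldots,h-1,\ s_{k}\in\mathcal{S},\tag{12}$$

$$x^{j}_{s,k,a}\leq z_{s,k,a},\quad\forall j\in\mathcal{N}\cup\{0\},\ k=0,\ldots,h-1,\ s_{k}\in\mathcal{S},\tag{13}$$

$$z_{s,k,a}\in\{0,1\},\quad x^{j}_{s,k,a}\in[0,1],\quad\forall j\in\mathcal{N}\cup\{0\},\ k=0,\ldots,h-1,\ s_{k}\in\mathcal{S}.$$

Cons. (11) follows directly from Theorem 1. The variable $z_{s,k,a}$ is used to bind the actions of all flows. Thus, for each constraint criterion $j$, Cons. (13) ensures that for a given state, the same action is selected across all flows. Since $z_{s,k,a}\in\{0,1\}$, Cons. (12) ensures at most one deterministic action at each node.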

II-D CC-SSP Randomized Rounding

The CC-SSP is an NP-hard problem [16], hence an exact solver such as the ILP discussed above can have, in the worst case, an exponential running time. Thus, we utilize a randomized rounding algorithm as a heuristic to quickly obtain (potentially) sub-optimal solutions for the CC-SSP. As we show in our experiments, the algorithm attains close-to-optimal solutions in practice in reasonable running time. The algorithm is a probabilistic algorithm that takes a relaxed LP solution of the problem and rounds it to get an integral solution. Note that a naive rounding procedure only satisfies the constraints in expectation, which does not suffice for chance constraints. The LP formulation is based on the ILP formulation defined earlier (CC-SSP-ILP), with the $z$ variable redefined as a continuous variable $z_{s,k,a}\in[0,1]$. The randomized rounding algorithm rounds each $z$ value per state with a probability proportional to its value. The rounding process is iterated until a feasible solution that satisfies all the constraints is acquired. The pseudocode is provided in Algorithm 1.
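Algorithm 1 is not reproduced in this text, so the following is only a sketch of the rounding loop as described in the paragraph above, under stated assumptions: the fractional values `z` are taken as given (here hand-written rather than produced by an LP solver), a single risk criterion is used, and feasibility of each rounded deterministic policy is checked by evaluating its execution risk directly. The names `randomized_rounding`, `max_iters`, and `seed` are hypothetical.

```python
import random


def execution_risk(states, T, r, pi, h, s0):
    """Execution risk of a deterministic policy pi[(s, k)] -> action (Lemma 1)."""
    er = dict(r)  # Er(s_h) = r(s_h)
    for k in range(h - 1, -1, -1):
        er = {s: r[s] + (1.0 - r[s]) * sum(p * er[s2]
                                          for s2, p in T[(s, pi[(s, k)])].items())
              for s in states}
    return er[s0]


def randomized_rounding(states, actions, T, r, z, h, s0, delta,
                        max_iters=1000, seed=0):
    """Sample one action per (state, step) with probability proportional to z,
    and repeat until the rounded policy meets the risk budget delta."""
    rng = random.Random(seed)
    for _ in range(max_iters):
        pi = {}
        for (s, k), weights in z.items():
            w = [weights.get(a, 0.0) for a in actions]
            if sum(w) <= 0.0:  # no fractional mass here: fall back to uniform
                w = [1.0] * len(actions)
            pi[(s, k)] = rng.choices(actions, weights=w)[0]
        if execution_risk(states, T, r, pi, h, s0) <= delta:
            return pi
    return None  # no feasible rounding found within max_iters


# Hypothetical instance: "a" is safe, "b" fails with probability 0.5 per visit.
states, actions = ["a", "b"], ["stay", "go"]
r = {"a": 0.0, "b": 0.5}
T = {("a", "stay"): {"a": 1.0}, ("a", "go"): {"b": 1.0},
     ("b", "stay"): {"b": 1.0}, ("b", "go"): {"b": 1.0}}
z = {("a", 0): {"stay": 0.9, "go": 0.1}, ("a", 1): {"stay": 0.9, "go": 0.1},
     ("b", 0): {"stay": 1.0}, ("b", 1): {"stay": 1.0}}
pi = randomized_rounding(states, actions, T, r, z, h=2, s0="a", delta=0.1)
print(pi)
```

In this toy instance the only policy within the 10% budget keeps the agent in `a` at both steps, so any returned policy must choose `stay` there. A fuller version would also recompute the objective of each feasible rounding and keep the best one.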

III CC-SSP with Global Risk Constraints

We consider global CC-SSP (GCC-SSP), defined as a tuple $M=\langle\mathcal{S},\mathcal{A},T,U,C,s_{0},h,\mathcal{N},P^{j},\Delta^{j}\rangle$, where

• $\mathcal{S},\mathcal{A},T,U,s_{0},h,\mathcal{N},\Delta^{j}$ are defined as in the CC-SSP;

• $C^{j}:\mathcal{S}\times\mathcal{A}\rightarrow\mathbb{R}$ is a secondary cost function, for $j\in\mathcal{N}=\{1,...,n\}$;

• $P^{j}$ is an upper bound on the cumulative cost, and $\Delta^{j}$ is the corresponding risk budget, a threshold on the probability that the cumulative cost function exceeds the upper bound over the planning horizon.

The objective of GCC-SSP is to compute a policy $\pi$ that satisfies,

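$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_{t},\pi(S_{t}))\Big]$$

$$\text{subject to}\quad\Pr\Big(\sum_{t=0}^{h-1}C^{j}(S_{t},\pi(S_{t}))>P^{j}\mid\pi\Big)\leq\Delta^{j},\quad j\in\mathcal{N}.\tag{16}$$

Note that the constraint in CC-SSP is local to each time step, whereas in GCC-SSP the constraint depends on the whole run, hence the name global.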

III-A Reduction to CC-SSP

One way to solve a GCC-SSP instance $M$ is to reduce it to a CC-SSP instance $M^{\prime}$, defined by augmenting the state space to include all possible values of $g^{j}(k):=\sum_{t=0}^{k}C^{j}(s_{t},\pi(s_{t}))$, for $k=0,...,h-1$ and $s_{t}\in\mathcal{S}$. In other words, $\mathcal{S}^{\prime}:=\mathcal{S}\times_{j}\mathcal{G}^{j}$, where $\mathcal{G}^{j}$ is the set of all possible values of $g^{j}(k)$. Let $g:=(g^{j})_{j\in\mathcal{N}}$, $g^{j}\in\mathcal{G}^{j}$. Unfortunately, the size of $\mathcal{G}^{j}$ is exponentially large and hence impractical for many application domains. We show in Subsection III-B how to reduce the state space to a polynomial size. Given the augmented state space $\mathcal{S}^{\prime}$, the risk probability of state $\langle s,g\rangle\in\mathcal{S}^{\prime}$ is defined as

$$r^{j}(\langle s,g\rangle)\leftarrow\begin{cases}1&\text{if }g^{j}>P^{j},\\ 0&\text{otherwise.}\end{cases}$$

The transition function in the augmented space is defined as

$$T^{\prime}(\langle s,g\rangle,a,\langle s^{\prime},g^{\prime}\rangle):=\begin{cases}T(s,a,s^{\prime})&\text{if }\forall j,\ g^{\prime j}=g^{j}+C^{j}(s,a),\\ 0&\text{otherwise.}\end{cases}$$

Clearly, the transition function is a valid probability distribution. Write $G^{j}(k):=\sum_{t=0}^{k}C^{j}(S_{t},\pi(S_{t}))$ to denote the random variable for the cumulative cost up until state $S_{k}$. The random variable takes values in $\mathcal{G}^{j}$. Using the aforementioned reduction, we have

$$\begin{aligned}\sum_{k=0}^{h-1}C^{j}(S_{k},\pi(S_{k}))>P^{j}&\iff\bigvee_{k=0}^{h-1}\big(G^{j}(k)>P^{j}\big)\\ &\iff\bigvee_{k=0}^{h-1}\ \bigvee_{g^{j}\in\mathcal{G}^{j}:\,g^{j}>P^{j}}\big(G^{j}(k)=g^{j}\big)\\ &\iff\bigvee_{t=0}^{h}\ \bigvee_{s^{\prime}_{t}\in\mathcal{S}^{\prime}}R^{j}(S^{\prime}_{t}=s^{\prime}_{t})\iff\bigvee_{t=0}^{h}R^{j}(S^{\prime}_{t}).\end{aligned}$$

Thus, the probabilities of these events are equal. Therefore, solving the CC-SSP instance $M^{\prime}$ is equivalent to solving the GCC-SSP instance $M$. Hence, we can use CC-SSP-ILP to solve GCC-SSP following the aforementioned reduction.

Lemma 2.

GCC-SSP is reducible to CC-SSP.
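The reduction behind Lemma 2 can be sketched by enumerating the augmented states $\langle s,g\rangle$ that are reachable within the horizon. The code below is an illustration for a single cost criterion, not the authors' implementation; the function name `augment`, the starting accumulated cost `g0=0`, and the one-state instance with actions `cheap` and `dear` are assumptions made for the example.

```python
def augment(T, C, actions, s0, h, P, g0=0):
    """Build the augmented transition function T' and risk r over states (s, g),
    where g is the cost accumulated so far, layer by layer up to horizon h."""
    layer = {(s0, g0)}
    T_aug, r_aug = {}, {}
    for _ in range(h):
        next_layer = set()
        for s, g in layer:
            r_aug[(s, g)] = 1.0 if g > P else 0.0  # risky once the budget is exceeded
            for a in actions:
                g_next = g + C[(s, a)]  # g' = g + C(s, a)
                succ = {(s2, g_next): p for s2, p in T[(s, a)].items()}
                T_aug[((s, g), a)] = succ
                next_layer.update(succ)
        layer = next_layer
    for s, g in layer:  # states at the final step have risk but no outgoing actions
        r_aug[(s, g)] = 1.0 if g > P else 0.0
    return T_aug, r_aug


# Hypothetical instance: a single state with a cheap and an expensive action.
actions = ["cheap", "dear"]
T = {("a", "cheap"): {"a": 1.0}, ("a", "dear"): {"a": 1.0}}
C = {("a", "cheap"): 1, ("a", "dear"): 3}
T_aug, r_aug = augment(T, C, actions, s0="a", h=2, P=3)
print(sorted(r_aug.items()))
```

Even in this one-state example the number of distinct accumulated costs grows with each step, which is the blow-up that motivates the discretization in Subsection III-B.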

III-B GCC-SSP via Resource Augmentation

In this subsection, we show how to discretize the sets $\mathcal{G}^{j}$, denoted by $\hat{\mathcal{G}}^{j}$, such that $|\hat{\mathcal{G}}^{j}|$ is polynomial in $h$ and $\frac{1}{\epsilon}$, where $\epsilon\in(0,1)$ is an input parameter, at the cost of slightly relaxing the solution feasibility, i.e.,

$$\Pr\Big(\sum_{t=0}^{h-1}C^{j}(S_{t},\pi(S_{t}))>(1+\epsilon)P^{j}\mid\pi\Big)\leq\Delta^{j}.\tag{19}$$

Such a model is often referred to as resource augmentation. In many application domains, a slight violation of resource capacity can be acceptable. If that is not the case, one can increase $P^{j}$ to account for $\epsilon$. (Footnote: Arguably, increasing the bound $P^{j}$ changes the problem definition, where there may exist a better optimal solution for the original problem that is not attainable in the discretized version.)

The idea is to discretize the cost function $C^{j}(\cdot,\cdot)$, denoted by $\hat{C}^{j}(\cdot,\cdot)$, as follows. Let $C^{j}_{\max}:=\max_{s\in\mathcal{S},a\in\mathcal{A}}C^{j}(s,a)$, and $K^{j}:=\frac{\epsilon C^{j}_{\max}}{h}$. The discretized values are given by $\hat{C}^{j}(s,a):=\lceil\tfrac{C^{j}(s,a)}{K^{j}}\rceil$.

Note that $\hat{g}^{j}(h-1):=\sum_{t=0}^{h-1}\hat{C}^{j}(S_{t},\pi(S_{t}))\leq\sum_{t=0}^{h-1}\lceil\tfrac{C^{j}_{\max}}{K^{j}}\rceil=h\lceil\tfrac{h}{\epsilon}\rceil$. Hence, the discretized version of $\mathcal{G}^{j}$, denoted by $\hat{\mathcal{G}}^{j}$, takes (at most) the values $\hat{\mathcal{G}}^{j}\subseteq\{0,1,\ldots,h\lceil\tfrac{h}{\epsilon}\rceil\}$, which is polynomial in $\frac{1}{\epsilon}$ and $h$. The discretized GCC-SSP is obtained by replacing Cons. (16) of GCC-SSP by

$$\Pr\Big(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(S_{t},\pi(S_{t}))>(1+\epsilon)P^{j}\mid\pi\Big)\leq\Delta^{j}.\tag{20}$$
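The discretization step amounts to a few lines of arithmetic. The sketch below computes $K^{j}$, the rounded-up costs $\hat{C}^{j}$, and the bound $h\lceil h/\epsilon\rceil$ on the accumulated discretized cost for one criterion. It reuses the highway action costs $(2,1,4,3,3)$ from the experiments as sample input; treating the cost as state-independent, and the choices $h=4$ and $\epsilon=0.5$, are assumptions made only for this example.

```python
import math


def discretize_costs(C, h, eps):
    """Return (K, C_hat, g_max) for a cost table C, horizon h and tolerance eps."""
    c_max = max(C.values())
    K = eps * c_max / h                                   # K = eps * C_max / h
    C_hat = {key: math.ceil(c / K) for key, c in C.items()}  # costs rounded up, in units of K
    g_max = h * math.ceil(h / eps)                        # largest accumulated discretized cost
    return K, C_hat, g_max


# Sample input: action costs of the highway problem, keyed by action name only.
C = {"maintain": 2, "speed up": 1, "slow down": 4, "left lane": 3, "right lane": 3}
K, C_hat, g_max = discretize_costs(C, h=4, eps=0.5)
print(K, C_hat, g_max)
```

Since costs are rounded up, $K^{j}\hat{C}^{j}(s,a)\geq C^{j}(s,a)$ for every entry, which is the property used in the first direction of the proof of Theorem 2. With floating-point inputs that are not exactly representable, `math.ceil` may round an exact multiple up by one unit, which only makes the discretization slightly more conservative.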

Theorem 2.

An optimal solution of the discretized GCC-SSP is (super) optimal to GCC-SSP, but may violate Cons. (16) by a factor of at most $1+\epsilon$ , as shown by Cons. (19).

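Proof. We show that any feasible policy $\hat{\pi}$ for the discretized GCC-SSP satisfies Eqn. (19), and that any feasible policy $\pi^{\prime}$ of GCC-SSP satisfies Eqn. (20). This provides an argument that the optimal solution of GCC-SSP is attainable in the discretized GCC-SSP.

Without loss of generality, we assume $C^{j}_{\max}\leq P^{j}$. The first direction follows from Eqn. (20) and the fact that we round up the cost function. Given a run $s_{0},s_{1},...,s_{h}$ that satisfies $K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\hat{\pi}(s_{t}))\leq(1+\epsilon)P^{j}$, we have

$$\sum_{t=0}^{h-1}C^{j}(s_{t},\hat{\pi}(s_{t}))\leq K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\hat{\pi}(s_{t}))\leq(1+\epsilon)P^{j}.$$

Hence, $\Pr(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(S_{t},\hat{\pi}(S_{t}))\leq(1+\epsilon)P^{j})\leq\Pr(\sum_{t=0}^{h-1}C^{j}(S_{t},\hat{\pi}(S_{t}))\leq(1+\epsilon)P^{j})$, which implies that

$$\Pr\Big(\sum_{t=0}^{h-1}C^{j}(S_{t},\hat{\pi}(S_{t}))>(1+\epsilon)P^{j}\Big)\leq\Pr\Big(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(S_{t},\hat{\pi}(S_{t}))>(1+\epsilon)P^{j}\Big)\leq\Delta^{j}.$$

Conversely, given a run $s_{0},s_{1},...,s_{h}$ that satisfies $\sum_{t=0}^{h-1}C^{j}(s_{t},\pi^{\prime}(s_{t}))\leq P^{j}$, we have

$$K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\pi^{\prime}(s_{t}))\leq\sum_{t=0}^{h-1}C^{j}(s_{t},\pi^{\prime}(s_{t}))+hK^{j}\leq P^{j}+h\,\frac{\epsilon C^{j}_{\max}}{h}\leq(1+\epsilon)P^{j},$$

where we use the property $a\lceil\frac{b}{a}\rceil\leq b+a$ for $a,b\in\mathbb{R}_{+}$. Therefore,

$$\begin{aligned}&\Pr\Big(\sum_{t=0}^{h-1}C^{j}(s_{t},\pi^{\prime}(s_{t}))\leq P^{j}\Big)\leq\Pr\Big(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\pi^{\prime}(s_{t}))\leq(1+\epsilon)P^{j}\Big)\\ \iff{}&\Pr\Big(K^{j}\sum_{t=0}^{h-1}\hat{C}^{j}(s_{t},\pi^{\prime}(s_{t}))>(1+\epsilon)P^{j}\Big)\leq\Pr\Big(\sum_{t=0}^{h-1}C^{j}(s_{t},\pi^{\prime}(s_{t}))>P^{j}\Big)\leq\Delta^{j}.\end{aligned}$$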

Corollary 3.

GCC-SSP under resource augmentation (Cons. (19)) can be reduced in polynomial time to CC-SSP. Therefore, any efficient algorithm that solves CC-SSP can be invoked to solve GCC-SSP efficiently under resource augmentation.

The result follows directly from Lemma 2 and Theorem 2.

IV Experiment

To test the performance of the proposed CC-SSP model, we utilize two benchmark problems. The first problem is a two-dimensional grid problem where a robot can move in four directions. However, the movement is uncertain, with an 80% success probability captured in the transition function. 5% of the states are randomly selected as risky states, 10% of the states are randomly assigned a cost of $1$ (for all actions), and the remaining states are assigned a cost of $2$. The grid size is $10000\times 10000$ (chosen to illustrate the CC-SSP’s performance in a large state problem), and the initial state is set to $(5000,5000)$. The second problem is the highway problem (an SSP version of the problem in [15]), where an Autonomous Vehicle (AV) navigates a three-lane highway with multiple dynamic Human-driven Vehicles (HVs). The HVs move based on a transition probability depicted in Fig. 4. We consider that an HV deviates from the center of a lane before executing a lane-change maneuver. A risky state is one in which the ego-vehicle collides with any of the agent vehicles. The costs of the AV’s actions (maintain, speed up, slow down, left lane, right lane) are $(2,1,4,3,3)$, respectively. Both problems have a cost-minimizing objective. Moreover, the initial state consists of the AV and six HVs (see Fig. 1). The experiments were conducted on an Intel i9 9900k processor using the Gurobi 9 optimizer. In Table I, the tree-based node count is the number of nodes in the And-Or tree history expansion, similar to the approach used in [11, 14, 13].

IV-A Results

Table I compares the CC-SSP solvers on the two benchmark problems. Both problems were solved for multiple horizons and risk thresholds ($\Delta$). A higher risk threshold results in a better objective, which is expected since the risk constraint is less restrictive. Moreover, while the objective increases with the horizon, the ratio of objective value to horizon decreases, indicating that a better average per-step policy is found with a longer planning horizon. In addition, the graph-based CC-SSP formulation explores only a small fraction of the nodes of the tree-based expansion, allowing for a longer planning horizon in a short running time. While the ILP produces an optimal solution, the randomized rounding algorithm returns a close-to-optimal solution in a fraction of the time. Fig. 3 plots the confidence interval of the randomized rounding algorithm’s approximation ratio (computed as the ratio between the algorithm’s objective value and that of an optimal solution) applied to the grid problem and repeated 100 times. The randomized rounding algorithm performs well, with a minimum approximation ratio of 0.94 of the optimal solution.
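The gap between the rounding and ILP objectives can be read off Table I directly. The short calculation below does this for the grid problem rows; the ratio is taken here as rounding objective divided by ILP objective, so values above 1 mean a higher (worse) cost. This convention is chosen for the sketch and is not necessarily the one used for Fig. 3.

```python
# Grid problem rows of Table I: (horizon, delta, ILP objective, rounding objective).
grid_rows = [
    (10, 0.05, 13.603, 13.825), (10, 0.10, 12.958, 13.824),
    (25, 0.05, 30.975, 31.743), (25, 0.10, 29.414, 31.743),
    (30, 0.05, 36.767, 37.716), (30, 0.10, 34.902, 37.716),
    (35, 0.05, 42.555, 43.687), (35, 0.10, 40.386, 43.687),
]

# Cost ratio of the rounded solution relative to the ILP optimum, per row.
ratios = {(h, delta): rounding / ilp for h, delta, ilp, rounding in grid_rows}
worst_case = max(ratios, key=ratios.get)

for (h, delta), ratio in ratios.items():
    print(f"h={h:2d} delta={delta:.2f} ratio={ratio:.4f}")
print("largest gap at", worst_case)
```

On these rows the single rounded solutions reported in the table stay within roughly 3% of the ILP cost at $\Delta=5\%$ and within roughly 9% at $\Delta=10\%$.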

V Conclusion

We presented a novel formulation for CC-SSP under local and global risk constraints. To optimize the running time, we provided an iterative rounding algorithm to obtain feasible policies efficiently. Experiments show that our approaches offer a significant improvement over existing techniques that rely on history tree expansion. In future work, we intend to find an approximation algorithm for the ILP that runs in polynomial time and provides a theoretical guarantee on the sub-optimality ratio.

VI Appendix

VI-A Proof of Lemma 1

Proof.

The execution risk can be written as

$$\textsc{Er}^{j}(s_{k})=1-\Pr\Big(\bigwedge_{t=k}^{h}\neg R^{j}(S_{t})\mid S_{k}=s_{k}\Big).\tag{25}$$

The probability term can be expanded, conditioned over subsequent states at time $k+1$ ,

[TABLE]

where Eqn. (26) follows from the independence between $\big{(}\bigwedge_{t=k+1}^{h}\neg R^{j}(S_{t})\mid S_{k+1}=s_{k+1}\big{)}$ and $(S_{k}=s_{k}\wedge\neg R^{j}(S_{k}))$ , by the Markov property, and between $(S_{k+1}=s_{k+1}\mid S_{k}=s_{k})$ and $\neg R^{j}(S_{k})$ . Combining Eqns. (25),(27) we obtain $\textsc{Er}^{j}(s_{k})$ as

[TABLE]

Note that $\textsc{Er}^{j}(s_{h})=r^{j}(s_{h})$. For a stochastic policy $\pi$, Eqn. (28) can be rewritten by enumerating through all actions, which completes the proof.

∎

VI-B Proof of Theorem 1

Proof.

We can rewrite the execution risk in Eqn. (4) as $\textsc{Er}^{\prime j}(s_{k})=$

[TABLE]

where $\textsc{Er}^{j}(s_{k})=\textsc{Er}^{\prime j}(s_{k})+r^{j}(s_{k})$ , and $\textsc{Er}^{j}(s_{0})=\textsc{Er}^{\prime j}(s_{0})$ . Based on the flow equations (5),(6), define a policy

[TABLE]

Note that the policy is a valid probability distribution. Thus, we rewrite Eq. (5),(6) by

[TABLE]

Next, we prove the following statement by induction:

[TABLE]

Note that when $h^{\prime}=h$, by Eqn. (VI-B), the last term of the above equation is zero, which is equivalent to the theorem’s claim. We consider the initial case with $h^{\prime}=1$. From Eqn. (VI-B) and $\textsc{Er}^{j}(s_{0})=\textsc{Er}^{\prime j}(s_{0})$, we obtain

[TABLE]

where $s_{h^{\prime}-1}=s_{0}$ is a known state. For the inductive step, we assume Eqn. (31) holds up to $h^{\prime}=i$ and prove the statement for $h^{\prime}=i+1$. Expanding $\textsc{Er}^{\prime j}(s_{i})$ using Eqn. (VI-B) yields

[TABLE]

where Eqn. (32) follows by substituting $\pi(s_{i},a^{\prime})$ , using Eqn. (30), by

[TABLE]

Rewriting Eqn. (32) obtains

[TABLE]

∎

Bibliography

[1] E. Altman. Constrained Markov decision processes, volume 7. CRC Press, 1999.
[2] D. P. Bertsekas and J. N. Tsitsiklis. An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3):580–595, 1991.
[3] B. Bonet and H. Geffner. Labeled RTDP: Improving the convergence of real-time dynamic programming. In ICAPS, volume 3, pages 12–21, 2003.
[4] F. De Nijs, E. Walraven, M. de Weerdt, and M. Spaan. Bounding the probability of resource constraint violations in multi-agent MDPs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
[5] F. de Nijs, E. Walraven, M. De Weerdt, and M. Spaan. Constrained multiagent Markov decision processes: a taxonomy of problems and algorithms. Journal of Artificial Intelligence Research, 70:955–1001, 2021.
[6] F. d’Epenoux. A probabilistic production and inventory problem. Management Science, 10(1):98–108, 1963.
[7] D. A. Dolgov and E. H. Durfee. Approximating optimal policies for agents with limited execution resources. In IJCAI, pages 1107–1112, 2003.
[8] E. A. Feinberg. Constrained discounted Markov decision processes and Hamiltonian cycles. Mathematics of Operations Research, 25(1):130–140, 2000.
