Dual Formulation for Chance Constrained Stochastic Shortest Path with Application to Autonomous Vehicle Behavior Planning
Rashid Alyassi, Majid Khonji

TL;DR
This paper introduces a new exact integer linear programming approach for chance-constrained stochastic shortest path problems, enabling safe and efficient autonomous vehicle behavior planning under uncertainty.
Contribution
It presents a novel formulation for chance-constrained SSP, a randomized policy method, and extends the formalism to multi-step constraints, advancing safety-critical planning.
Findings
The approach outperforms existing methods on benchmark problems.
It provides deterministic policies for stochastic environments.
The method effectively bounds collision risk in autonomous navigation.
Table I: CC-SSP results on the two benchmark problems.

| Problem | Horizon | Risk threshold | ILP objective | ILP time | Randomized rounding objective | Randomized rounding time | CC-SSP nodes | Tree-based nodes |
|---|---|---|---|---|---|---|---|---|
| Grid Problem (10000x10000) | | 5% | 13.603 | 0.289 | 13.825 | 0.0637 | 506 | 5.36x |
| | | 10% | 12.958 | 0.2448 | 13.824 | 0.064 | 506 | 5.36x |
| | | 5% | 30.975 | 8.983 | 31.743 | 4.492 | 6,201 | 8.67x |
| | | 10% | 29.414 | 11.315 | 31.743 | 4.257 | 6,201 | 8.67x |
| | | 5% | 36.767 | 50.790 | 37.716 | 12.758 | 10,416 | 2.16x |
| | | 10% | 34.902 | 43.099 | 37.716 | 13.299 | 10,416 | 2.16x |
| | | 5% | 42.555 | 126.046 | 43.687 | 29.072 | 16,206 | 5.37x |
| | | 10% | 40.386 | 101.244 | 43.687 | 41.808 | 16,206 | 5.37x |
| Highway Problem (Three lanes) | | 5% | 6.258 | 0.165 | 6.448 | 0.098 | 2,438 | 124,251 |
| | | 10% | 5.854 | 0.148 | 5.854 | 0.090 | 2,438 | 124,251 |
| | | 5% | 7.343 | 1.274 | 7.642 | 0.356 | 7,666 | 1.99x |
| | | 10% | 6.833 | 0.655 | 6.833 | 0.337 | 7,666 | 1.99x |
| | | 5% | 8.343 | 4.249 | 8.712 | 1.235 | 20,576 | 3.18x |
| | | 10% | 7.761 | 2.721 | 7.762 | 1.247 | 20,576 | 3.18x |
| | | 5% | 9.317 | 20.598 | 9.759 | 3.929 | 49,526 | 5.09x |
| | | 10% | 8.672 | 10.552 | 8.672 | 3.444 | 49,526 | 5.09x |
Rashid Alyassi and Majid Khonji are with EECS Department, Khalifa University, Abu Dhabi, UAE (Emails: {rashid.alyassi, majid.khonji}@ku.ac.ae). This work was supported by the Khalifa University of Science and Technology under award references CIRA-2019-049, KKJRC-2019-Trans1 and KUCARS.
Abstract
Autonomous vehicles face the problem of optimizing the expected performance of subsequent maneuvers while bounding the risk of collision with surrounding dynamic obstacles. These obstacles, such as agent vehicles, often exhibit stochastic transitions that should be accounted for in a timely and safe manner. The Constrained Stochastic Shortest Path problem (C-SSP) is a formalism for planning in stochastic environments under certain types of operating constraints. While C-SSP allows specifying constraints in the planning problem, it does not allow for bounding the probability of constraint violation, which is desired in safety-critical applications. This work’s first contribution is an exact integer linear programming formulation for Chance-constrained SSP (CC-SSP) that attains deterministic policies. Second, a randomized rounding procedure is presented for stochastic policies. Third, we show that the CC-SSP formalism can be generalized to account for constraints that span through multiple time steps. Evaluation results show the usefulness of our approach in benchmark problems compared to existing approaches.
I Introduction
The Markov Decision Process (MDP) [12] is a widely used model for planning under uncertainty. An MDP consists of states, actions, a stochastic transition function, a utility function, and an initial state. A solution to an MDP is a policy that maps each state to an action so as to maximize the global expected utility. Stochastic Shortest Path (SSP) [2] is an MDP with non-negative utilities and a minimizing objective. The problem has useful structure and admits a dual linear programming (LP) formulation [6] that can be interpreted as a minimum cost flow problem. Moreover, SSP has many heuristics-based algorithms [3, 10] that utilize admissible heuristics to guide the search without exploring the whole state space.
In addition, Constrained SSP (C-SSP) [1] provides the means to add mission-critical requirements while optimizing the objective function. Each requirement is formulated as a budget constraint imposed by a non-replenishable resource for which a bounded quantity is available during the entire plan execution. Resource consumption at each time step reduces the resource availability during subsequent time steps (see [5] for a detailed discussion). A stochastic policy for C-SSP is attainable using several efficient algorithms (e.g., [9]). A heuristics-based search approach in the dual LP can further improve the running time for large state spaces [19]. For deterministic policies, however, C-SSP is known to be NP-hard [8].
A special type of constraint occurs when we want to bound the probability of constraint violations by some threshold $\Delta$, which is often called a chance constraint. To simplify the problem, [7] proposes approximating the constraint using Markov’s inequality, which converts the problem to C-MDP. Another approach [4] applies Hoeffding’s inequality on the sum of independent random variables to improve the bound. Both methods provide conservative policies that respect safety thresholds but at the expense of the objective value.
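To make the Markov-inequality approximation concrete, the following is a short worked derivation under assumed notation: $X \ge 0$ stands for the cumulative cost of a run, $P > 0$ for its bound, and $\Delta$ for the risk threshold. These symbols are illustrative and are not necessarily those used in [7].

```latex
% Markov's inequality for a non-negative random variable X and a bound P > 0:
\Pr(X \ge P) \;\le\; \frac{\mathbb{E}[X]}{P}
% Hence an expectation (C-MDP style) constraint is sufficient for the chance constraint:
\mathbb{E}[X] \le \Delta\, P
\quad\Longrightarrow\quad
\Pr(X \ge P) \le \Delta .
```

The implication only goes one way, which is where the conservatism comes from. For instance, if $X = P/2$ with certainty, the violation probability is zero, yet the surrogate constraint $\mathbb{E}[X] \le \Delta P$ fails for every $\Delta < 1/2$.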
In the partially observable setting, the problem is called Chance Constrained Partially Observable MDP (CC-POMDP). Several algorithms address CC-POMDP under local risk constraints, where risk depends only on the state, and the chance constraint is defined as the probability of failure in any state during execution [18, 16]. However, due to partial observability, these methods require an enumeration of histories, making the solution space exponentially large with respect to the planning horizon. To speed up the computation, [11] provides an anytime algorithm using a Lagrangian relaxation method for CC-SSP and CC-POMDP that returns feasible sub-optimal solutions and gradually improves the solution’s optimality when sufficient time is permitted. Unfortunately, the solution space is represented as an And-Or tree, similar to other CC-POMDP methods, so the algorithm scales poorly with the planning horizon.
Behavior planning for Autonomous Vehicles (AVs) has been extensively studied in deterministic environments (see e.g., [20]). One of the primary sources of uncertainty arises from drivers’ intentions, i.e., potential maneuvers of agent vehicles in the scene [13]. An effective planner should optimize the maneuvers (say, minimize total commute time) while bounding collision probability below some threshold. More recently, [14, 13] show that such a plan is achievable under more detailed settings. Essentially, they modeled the problem as a CC-POMDP, where actions resemble (deterministic) ego-vehicle maneuvers, and observations resemble predictions of agent vehicle maneuvers (see Fig. 1). In fact, the problem can be equivalently modeled as CC-SSP, with transitions used for joint actions and agent vehicle predictions. Unlike CC-POMDP, our CC-SSP formulation scales polynomially in the planning horizon resulting in a significant performance gain.
This paper’s contribution is threefold. First, an exact integer linear programming (ILP) formulation is provided for CC-SSP that obtains deterministic policies in the dual space. The chance-constrained formulation is based on the notion of execution risk introduced in the literature in the context of CC-POMDP [18] under local risk constraints. Second, we generalize upon local CC-SSP constraints to global constraints. Unlike [11], our formulation does not expand all possible histories, which entails exponential growth in solution space. Our formulation is also a departure from existing approximate approaches [7, 4] that have no bounded performance with respect to the objective value. Third, we present a randomized rounding algorithm for the CC-SSP that provides a close-to-optimal solution in practice within a reasonable time. Both stochastic and deterministic approaches are evaluated under two benchmark problems, including a highway behavior planning problem for autonomous vehicles (Fig. 1).
II CC-SSP with Local Risk Constraints
In this section, we present (A) the problem definition of the C-SSP and CC-SSP, (B) a method for computing the execution risk for the CC-SSP, (C) the ILP formulation for the CC-SSP, and (D) a Randomized Rounding algorithm for the CC-SSP.
II-A Problem Definition
We provide formal definitions for C-SSP and CC-SSP as follows. A fixed-horizon constrained stochastic shortest path (C-SSP) is a tuple $M=\langle S, A, T, U, s_0, h, (C^j)_{j\in N}, (P^j)_{j\in N}\rangle$, where
- $S$ and $A$ are finite sets of discrete states and actions, respectively;
- $T: S\times A\times S\to[0,1]$ is a probabilistic transition function between states, $T(s,a,s')=\Pr(s'\mid s,a)$, where $s,s'\in S$ and $a\in A$;
- $U: S\times A\to\mathbb{R}_+$ is a non-negative utility function;
- $s_0\in S$ is an initial state;
- $h$ is the planning horizon;
- $C^j: S\times A\to\mathbb{R}_+$ is a non-negative cost function, with respect to criterion $j\in N$, where $N$ is the index set of all risk criteria;
- $P^j\in\mathbb{R}_+$ is a positive upper bound on the cost criterion $j\in N$.
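A deterministic policy $\pi(\cdot,\cdot)$ is a function that maps a state and time step into an action, $\pi: S\times\{0,1,\dots,h-1\}\to A$. A stochastic policy $\pi: S\times\{0,1,\dots,h-1\}\times A\to[0,1]$ is defined as a distribution over actions from a given state and time. For simplicity, we write $\pi(s_k)$ to denote $\pi(s_k,k)$, and $\pi(s_k,a)$ to denote the stochastic policy $\pi(s_k,k,a)$. A run is a sequence of random states $S_0,S_1,\dots,S_h$ that result from executing a policy, where $S_0=s_0$ is known.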
The objective is to compute a policy that maximizes (resp. minimizes) the expected utility (resp. cost) while satisfying all the constraints. More formally,
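$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_t,\pi(S_t))\Big]\quad\text{subject to}\quad\mathbb{E}\Big[\sum_{t=0}^{h-1}C^j(S_t,\pi(S_t))\Big]\le P^j,\quad j\in N.$$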
A fixed-horizon chance-constrained stochastic shortest path (CC-SSP) problem is formally defined as a tuple $M=\langle S, A, T, U, s_0, h, (r^j)_{j\in N}, (\Delta^j)_{j\in N}\rangle$, where $S, A, T, U, s_0, h$ are defined as in C-SSP,
- $r^j: S\to[0,1]$ is the probability of failure at a given state according to risk criterion $j\in N$;
- $\Delta^j$ is the corresponding risk budget, a threshold on the probability of failure over the planning horizon, for $j\in N$.
Let $R^j(s)$ be a Bernoulli random variable for failure at state $s$ with respect to criterion $j$, such that $R^j(s)=1$ if and only if it is in a risky state, and zero otherwise. For simplicity, we write $R^j(s)$ to denote $R^j(s)=1$. The objective of CC-SSP is to compute a policy (or a conditional plan) that maximizes the cumulative expected utility while bounding the probability of failure at any time step throughout the planning horizon.
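$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_t,\pi(S_t))\Big]$$
$$\text{subject to}\quad\Pr\Big(\bigvee_{t=0}^{h}R^j(S_t)\ \Big|\ \pi\Big)\le\Delta^j,\quad j\in N.\tag{3}$$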
We refer to such constraints as local since failure could occur entirely in a single step (i.e., ). Sec. III considers failure as accumulative throughout an entire run. In both cases, the constraints bound the probability of failure.
The SSP problem and its constrained variants can be visualized by a Directed Acyclic Graph (DAG), where the vertices represent the states and their actions, and the edges are of two types. The edge between a state and an action represents the action’s cost, while the edge between an action and a state represents the transition probability. Thus, at depth $k$ all the states are reachable from previous actions taken at depth $k-1$, and each depth has a maximum of $|S|$ states (i.e., all the states). Fig. 2 provides a pictorial illustration of the CC-SSP And-Or search graph. Note that, unlike And-Or search trees obtained by history enumeration algorithms (see, e.g., [11]), with such a representation a state may have multiple parents, leading to a significant reduction in the search space.
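The gap between the layered graph and a history tree can be illustrated with a small counting sketch. Everything below is hypothetical and not taken from the paper: `count_nodes`, `line_successors`, and the five-cell corridor are illustrative names and data. The function counts state nodes only, once per distinct state at each depth (layered graph) and once per history (tree).

```python
def count_nodes(successors, actions, s0, horizon):
    """Return (layered_graph_nodes, history_tree_nodes), counting state nodes only."""
    tree_counts = {s0: 1}  # state -> number of distinct histories ending in it
    dag_total, tree_total = 1, 1
    for _ in range(horizon):
        nxt = {}
        for s, n_hist in tree_counts.items():
            for a in actions:
                for s2 in successors(s, a):
                    nxt[s2] = nxt.get(s2, 0) + n_hist
        tree_counts = nxt
        dag_total += len(nxt)            # one node per distinct state at this depth
        tree_total += sum(nxt.values())  # one node per history at this depth
    return dag_total, tree_total


def line_successors(s, a, size=5):
    """Hypothetical 1-D corridor: a move either succeeds or the agent stays put."""
    target = min(max(s + a, 0), size - 1)
    return sorted({target, s})


dag_nodes, tree_nodes = count_nodes(line_successors, actions=(-1, +1), s0=2, horizon=3)
print(dag_nodes, tree_nodes)
```

In this toy corridor the layered count is capped by the number of states per depth, while the history count multiplies at every step, which is the effect of states having multiple parents.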
II-B Execution Risk
Define the execution risk of a run at state $s_k$ as
$$er^j(s_k):=\Pr\Big(\bigvee_{t=k}^{h}R^j(S_t)\ \Big|\ S_k=s_k\Big).$$
According to the definition, Cons. (3) is equivalent to $er^j(s_0)\le\Delta^j$. The lemma below shows that such a constraint can be computed recursively.
Lemma 1.
The execution risk can be written as
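$$er^j(s_k)=r^j(s_k)+\tilde r^j(s_k)\sum_{a\in A}\sum_{s_{k+1}\in S}er^j(s_{k+1})\,\pi(s_k,a)\,T(s_k,a,s_{k+1}),\tag{4}$$
where $\tilde r^j(s_k):=1-r^j(s_k)$.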
A proof for Lemma 1 is deferred to the appendix.
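The recursive computation of the execution risk can be sketched as a backward induction. This is a minimal illustration for a single risk criterion and a deterministic policy, assuming that the risk at the final step is just the local failure probability. The three-state instance (`safe`, `edge`, `goal`) and all names are hypothetical, not from the paper.

```python
def execution_risk(states, T, r, policy, horizon):
    """Backward induction for the execution risk under a deterministic policy.

    T[(s, a)] maps next states to probabilities, r[s] is the local failure
    probability, and policy[(s, k)] is the action chosen in state s at step k.
    Returns a dict with the execution risk of every state at step 0.
    """
    er = {s: r[s] for s in states}  # final step: only the local risk remains
    for k in range(horizon - 1, -1, -1):
        er = {
            s: r[s] + (1.0 - r[s]) * sum(
                p * er[s2] for s2, p in T[(s, policy[(s, k)])].items()
            )
            for s in states
        }
    return er


# Hypothetical toy instance: "edge" is a risky state, "goal" is absorbing.
STATES = ("safe", "edge", "goal")
R = {"safe": 0.0, "edge": 0.5, "goal": 0.0}
T = {
    ("safe", "fast"): {"goal": 0.8, "edge": 0.2},
    ("safe", "slow"): {"goal": 0.5, "safe": 0.5},
    ("edge", "fast"): {"goal": 1.0},
    ("edge", "slow"): {"goal": 1.0},
    ("goal", "fast"): {"goal": 1.0},
    ("goal", "slow"): {"goal": 1.0},
}
H = 2
always_fast = {(s, k): "fast" for s in STATES for k in range(H)}
always_slow = {(s, k): "slow" for s in STATES for k in range(H)}

risk_fast = execution_risk(STATES, T, R, always_fast, H)["safe"]
risk_slow = execution_risk(STATES, T, R, always_slow, H)["safe"]
print(risk_fast, risk_slow)
```

With a risk budget below `risk_fast`, the faster policy would be rejected even if it had the better expected cost, which is the trade-off the chance constraint encodes.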
II-C ILP Formulation
Define a variable $x^j_{s,k,a}\in[0,1]$ for each state $s$ at time $k$, action $a$, and criterion $j\in N$, such that
[TABLE]
In fact, the above flow constraints for are the standard dual-space constraints for SSP [17]. According to the above constraints, we can rewrite the execution risk at , defined by Eqn. (4), according to the following key result.
Theorem 1.
Given a conditional plan that satisfies Eqns. (5)-(6), the execution risk can be written as a linear function of the flow variables,
[TABLE]
A proof of Theorem 1 is provided in the appendix.
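The idea that execution risk is linear in flow can be checked numerically on a toy instance. The sketch below is not the paper's exact statement, whose equations are not reproduced here. It assumes a "surviving flow": the probability of reaching a state at step k without any earlier failure. It then compares the sum of flow times local risk against the backward recursion. All names and the instance are hypothetical.

```python
def backward_risk(states, T, r, policy, horizon, s0):
    """Execution risk at s0 via backward recursion (stochastic policy)."""
    er = {s: r[s] for s in states}
    for k in range(horizon - 1, -1, -1):
        er = {
            s: r[s] + (1.0 - r[s]) * sum(
                pa * p * er[s2]
                for a, pa in policy[(s, k)].items()
                for s2, p in T[(s, a)].items()
            )
            for s in states
        }
    return er[s0]


def forward_risk(states, T, r, policy, horizon, s0):
    """Same quantity as a linear function of the surviving flow (assumed form)."""
    flow = {s: 0.0 for s in states}
    flow[s0] = 1.0
    risk = 0.0
    for k in range(horizon):
        risk += sum(flow[s] * r[s] for s in states)  # first failure at step k
        nxt = {s: 0.0 for s in states}
        for s in states:
            surviving = flow[s] * (1.0 - r[s])
            for a, pa in policy[(s, k)].items():
                for s2, p in T[(s, a)].items():
                    nxt[s2] += surviving * pa * p
        flow = nxt
    return risk + sum(flow[s] * r[s] for s in states)  # failure at the last step


STATES = ("safe", "edge", "goal")
R = {"safe": 0.0, "edge": 0.5, "goal": 0.0}
T = {
    ("safe", "fast"): {"goal": 0.8, "edge": 0.2},
    ("safe", "slow"): {"goal": 0.5, "safe": 0.5},
    ("edge", "fast"): {"goal": 1.0},
    ("edge", "slow"): {"goal": 1.0},
    ("goal", "fast"): {"goal": 1.0},
    ("goal", "slow"): {"goal": 1.0},
}
H = 2
# Hypothetical stochastic policy: mix both actions evenly everywhere.
mixed = {(s, k): {"fast": 0.5, "slow": 0.5} for s in STATES for k in range(H)}

risk_bwd = backward_risk(STATES, T, R, mixed, H, "safe")
risk_fwd = forward_risk(STATES, T, R, mixed, H, "safe")
print(risk_bwd, risk_fwd)
```

Because the forward form is a sum of products of flow and a constant, it can be placed directly in a linear program once the flow is a decision variable, which is the role Theorem 1 plays for the ILP.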
Let . The chance-constrained stochastic shortest path problem (CC-SSP) can be formulated by the following ILP:
[TABLE]
Cons. (11) follows directly from Theorem 1. The binary action-selection variable is used to bind the actions of all flows. Thus, for each constraint criterion $j$, Cons. (13) ensures that, for a given state, the same action is selected across all flows. Since this variable is binary, Cons. (12) ensures at most one deterministic action at each node.
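To see what the ILP is asked to optimize, a brute-force stand-in can enumerate every deterministic policy of a tiny instance and keep the cheapest one whose execution risk is within budget. This is explicitly not the ILP: it is exponential and only meant as a reference for toy problems. It assumes a cost-minimizing objective in which cost accrues regardless of failure. The instance, costs, and names are hypothetical.

```python
from itertools import product


def evaluate(states, T, r, cost, policy, horizon, s0):
    """Return (expected cost, execution risk) of a deterministic policy from s0."""
    value = {s: 0.0 for s in states}
    er = {s: r[s] for s in states}
    for k in range(horizon - 1, -1, -1):
        new_value, new_er = {}, {}
        for s in states:
            a = policy[(s, k)]
            nxt = T[(s, a)]
            new_value[s] = cost[(s, a)] + sum(p * value[s2] for s2, p in nxt.items())
            new_er[s] = r[s] + (1.0 - r[s]) * sum(p * er[s2] for s2, p in nxt.items())
        value, er = new_value, new_er
    return value[s0], er[s0]


def brute_force(states, actions, T, r, cost, horizon, s0, delta):
    """Cheapest deterministic policy with execution risk <= delta, or None."""
    keys = [(s, k) for s in states for k in range(horizon)]
    best = None
    for choice in product(actions, repeat=len(keys)):
        policy = dict(zip(keys, choice))
        c, risk = evaluate(states, T, r, cost, policy, horizon, s0)
        if risk <= delta and (best is None or c < best[0]):
            best = (c, risk, policy)
    return best


STATES = ("safe", "edge", "goal")
ACTIONS = ("fast", "slow")
R = {"safe": 0.0, "edge": 0.5, "goal": 0.0}
T = {
    ("safe", "fast"): {"goal": 0.8, "edge": 0.2},
    ("safe", "slow"): {"goal": 0.5, "safe": 0.5},
    ("edge", "fast"): {"goal": 1.0},
    ("edge", "slow"): {"goal": 1.0},
    ("goal", "fast"): {"goal": 1.0},
    ("goal", "slow"): {"goal": 1.0},
}
COST = {
    ("safe", "fast"): 1.0, ("safe", "slow"): 2.0,
    ("edge", "fast"): 1.0, ("edge", "slow"): 1.0,
    ("goal", "fast"): 0.0, ("goal", "slow"): 0.0,
}

constrained = brute_force(STATES, ACTIONS, T, R, COST, 2, "safe", delta=0.06)
unconstrained = brute_force(STATES, ACTIONS, T, R, COST, 2, "safe", delta=1.0)
print(constrained[0], constrained[1], unconstrained[0], unconstrained[1])
```

In this toy case the risk budget forces a slower first move, so the constrained optimum is more expensive than the unconstrained one.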
II-D CC-SSP Randomized Rounding
The CC-SSP is an NP-hard problem [16]; hence an exact solver such as the ILP discussed above can have, in the worst case, an exponential running time. Thus, we utilize a randomized rounding algorithm as a heuristic to quickly obtain (potentially) sub-optimal solutions for the CC-SSP. As we show in our experiments, the algorithm attains close-to-optimal solutions in practice in reasonable running time. The algorithm is a probabilistic algorithm that takes a relaxed LP solution of the problem and rounds it to get an integral solution. Note that a naive rounding procedure only satisfies the constraints in expectation, which is not sufficient for chance constraints. The LP formulation is based on the ILP formulation defined earlier (CC-SSP-ILP), with the binary variable relaxed to a continuous variable in $[0,1]$. The randomized rounding algorithm rounds each value per state with a probability proportional to its value. The rounding process is iterated until a feasible solution that satisfies all the constraints is acquired. The pseudocode is provided in Algorithm 1.
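The rounding loop described above can be sketched as follows. Algorithm 1 itself is not reproduced here, so this is an illustration under assumptions: the fractional values `FRAC` are hand-written stand-ins for a relaxed LP solution (no LP is solved), feasibility is checked by recomputing the execution risk of the rounded policy, and `max_tries` is an added safeguard. All names are hypothetical.

```python
import random


def execution_risk(states, T, r, policy, horizon, s0):
    """Execution risk at s0 of a deterministic policy (backward recursion)."""
    er = {s: r[s] for s in states}
    for k in range(horizon - 1, -1, -1):
        er = {
            s: r[s] + (1.0 - r[s]) * sum(
                p * er[s2] for s2, p in T[(s, policy[(s, k)])].items()
            )
            for s in states
        }
    return er[s0]


def randomized_rounding(states, T, r, frac, horizon, s0, delta, rng, max_tries=1000):
    """Sample one action per (state, step) with probability proportional to its
    fractional value; repeat until the rounded policy meets the risk budget."""
    for _ in range(max_tries):
        policy = {}
        for key, weights in frac.items():
            actions = list(weights)
            policy[key] = rng.choices(actions, weights=[weights[a] for a in actions])[0]
        if execution_risk(states, T, r, policy, horizon, s0) <= delta:
            return policy
    return None  # no feasible rounding found within the try limit


STATES = ("safe", "edge", "goal")
R = {"safe": 0.0, "edge": 0.5, "goal": 0.0}
T = {
    ("safe", "fast"): {"goal": 0.8, "edge": 0.2},
    ("safe", "slow"): {"goal": 0.5, "safe": 0.5},
    ("edge", "fast"): {"goal": 1.0},
    ("goal", "fast"): {"goal": 1.0},
}
H = 2
# Hypothetical fractional solution: only "safe" has a real choice to round.
FRAC = {(s, k): {"fast": 1.0} for s in ("edge", "goal") for k in range(H)}
FRAC.update({("safe", k): {"fast": 0.5, "slow": 0.5} for k in range(H)})

rounded = randomized_rounding(STATES, T, R, FRAC, H, "safe", 0.06, random.Random(0))
print(rounded)
```

The feasibility check after each sample is what separates this from naive rounding: a single draw is only correct on average, whereas the loop returns a policy that actually meets the budget.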
III CC-SSP with Global Risk Constraints
We consider global CC-SSP (GCC-SSP) defined as a tuple $M=\langle S, A, T, U, s_0, h, (C^j)_{j\in N}, (P^j)_{j\in N}, (\Delta^j)_{j\in N}\rangle$ where
- $S, A, T, U, s_0, h$ are defined as in the CC-SSP;
- $C^j: S\times A\to\mathbb{R}_+$ is a secondary cost function, for $j\in N$;
- $P^j$ is an upper bound on the cumulative cost, and $\Delta^j$ is the corresponding risk budget, a threshold on the probability that the cumulative cost function exceeds the upper bound over the planning horizon.
The objective of GCC-SSP is to compute a policy that satisfies,
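$$\max_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{h-1}U(S_t,\pi(S_t))\Big]$$
$$\text{subject to}\quad\Pr\Big(\sum_{t=0}^{h-1}C^j(S_t,\pi(S_t))>P^j\ \Big|\ \pi\Big)\le\Delta^j,\quad j\in N.\tag{16}$$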
Note that the constraint in CC-SSP is local to each round, whereas in GCC-SSP the constraint is dependent on the whole run, hence the name global.
III-A Reduction to CC-SSP
One way to solve a GCC-SSP instance is by reducing it to a CC-SSP instance , defined by augmenting the state space to include all possible values of , for , and . In other words, , where is the set of all possible values of . Let , . Unfortunately, the size of is exponentially large and hence impractical for many application domains. We show in Sec. III-B how to reduce the state space to a polynomial size. Given the augmented state space , the risk probability of state is defined as
[TABLE]
The transition function in the augmented space is defined as
[TABLE]
Clearly, the transition function is a valid probability distribution. Write to denote the random variable for the cumulative cost up until state . The random variable takes values in . Using the aforementioned reduction, we have
[TABLE]
Thus, the probabilities of these events are equal. Therefore, solving the CC-SSP instance is equivalent to solving the GCC-SSP instance . Hence, we can use CC-SSP-ILP to solve GCC-SSP following the aforementioned reduction.
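The state augmentation behind this reduction can be illustrated by propagating a distribution over pairs of state and cumulative cost. The paper's exact augmented transition and risk definitions are in equations not reproduced here, so this sketch assumes that failure is the event that the accumulated cost exceeds the budget at the end of the run. The instance, costs, and names are hypothetical, and integer costs are used so augmented states can be dictionary keys.

```python
def prob_budget_exceeded(T, cost, policy, horizon, s0, budget):
    """Return (Pr[cumulative cost > budget], number of augmented end states)."""
    dist = {(s0, 0): 1.0}  # augmented state: (state, cost accumulated so far)
    for k in range(horizon):
        nxt = {}
        for (s, c), prob in dist.items():
            a = policy[(s, k)]
            c2 = c + cost[(s, a)]
            for s2, p in T[(s, a)].items():
                nxt[(s2, c2)] = nxt.get((s2, c2), 0.0) + prob * p
        dist = nxt
    violation = sum(prob for (_, c), prob in dist.items() if c > budget)
    return violation, len(dist)


STATES = ("safe", "edge", "goal")
T = {
    ("safe", "fast"): {"goal": 0.8, "edge": 0.2},
    ("safe", "slow"): {"goal": 0.5, "safe": 0.5},
    ("edge", "fast"): {"goal": 1.0},
    ("edge", "slow"): {"goal": 1.0},
    ("goal", "fast"): {"goal": 1.0},
    ("goal", "slow"): {"goal": 1.0},
}
# Hypothetical secondary cost (e.g. energy units), integer valued.
COST = {
    ("safe", "fast"): 3, ("safe", "slow"): 1,
    ("edge", "fast"): 2, ("edge", "slow"): 2,
    ("goal", "fast"): 0, ("goal", "slow"): 0,
}
H = 2
always_fast = {(s, k): "fast" for s in STATES for k in range(H)}
always_slow = {(s, k): "slow" for s in STATES for k in range(H)}

p_fast, n_fast = prob_budget_exceeded(T, COST, always_fast, H, "safe", budget=4)
p_slow, n_slow = prob_budget_exceeded(T, COST, always_slow, H, "safe", budget=4)
print(p_fast, n_fast, p_slow, n_slow)
```

The number of distinct cumulative-cost values is what drives the size of the augmented space, which is why an unrestricted cost range makes the direct reduction impractical.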
Lemma 2.
GCC-SSP is reducible to CC-SSP.
III-B GCC-SSP via Resource Augmentation
In this subsection, we show how to discretize the sets , denoted by , such that is polynomial in and , where is an input parameter, while slightly affecting the solution feasibility, i.e.,
[TABLE]
Such a model is often referred to as resource augmentation. In many application domains, a slight violation of resource capacity can be acceptable. If that is not the case, one can increase to account for . (Footnote 1: Arguably, increasing the bound changes the problem definition, where there may exist a better optimum for the original problem that is not attainable in the discretized version.)
The idea is to discretize the cost function , denoted by , as follows. Let , and . The discretized values are given by .
Note that . Hence, the discretized version of , denoted by , takes (at most) the values , which is polynomial in and . Define the discretized GCC-SSP by replacing Cons. (16) of GCC-SSP by
[TABLE]
Theorem 2.
An optimal solution of the discretized GCC-SSP is (super) optimal to GCC-SSP, but may violate Cons. (16) by a factor of at most , as shown by Cons. (19).
Proof. We show that any feasible policy for the discretized GCC-SSP satisfies Eqn. (19), and that any feasible policy of GCC-SSP satisfies Eqn. (20). This provides an argument that the optimal solution of GCC-SSP is attainable in the discretized GCC-SSP.
Without loss of generality, we assume . The first direction follows from Eqn. (20) and the fact that we round up the cost function. Given a run that satisfies , we have
[TABLE]
Hence, , which implies that
[TABLE]
Conversely, given a run that satisfies , we have
[TABLE]
where we use the property for . Therefore,
[TABLE]
Corollary 3.
GCC-SSP under resource augmentation (Cons. (19)) can be reduced in polynomial time to CC-SSP. Therefore, any efficient algorithm that solves CC-SSP can be invoked to solve GCC-SSP efficiently under resource augmentation.
The result follows directly from Lemma 2 and Theorem 2.
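The two directions of the resource-augmentation argument can be checked numerically. The paper's exact discretization formula is not reproduced here, so the sketch assumes a common choice: a grid step of `eps * bound / horizon` with each cost rounded up to the next multiple. Under that assumption the per-run total grows by at most `eps * bound`. Exact rationals avoid floating-point edge cases. All names are hypothetical.

```python
import math
import random
from fractions import Fraction


def round_up(cost, step):
    """Round a non-negative cost up to the next multiple of step."""
    return math.ceil(cost / step) * step


def check_runs(bound, horizon, eps, n_runs, rng):
    """Check both directions of the argument on random cost sequences."""
    step = eps * bound / horizon  # assumed grid step, not taken from the paper
    relaxed = (1 + eps) * bound
    ok = True
    for _ in range(n_runs):
        run = [Fraction(rng.randint(0, 80), 10) for _ in range(horizon)]
        total = sum(run)
        rounded = sum(round_up(c, step) for c in run)
        # Rounding up never decreases a cost and adds less than one step per stage.
        ok = ok and total <= rounded <= total + eps * bound
        # Feasible after rounding (relaxed bound) => original within relaxed bound.
        ok = ok and (rounded > relaxed or total <= relaxed)
        # Feasible originally => still feasible after rounding with relaxed bound.
        ok = ok and (total > bound or rounded <= relaxed)
    return ok


all_hold = check_runs(Fraction(20), 5, Fraction(1, 4), 1000, random.Random(1))
print(all_hold)
```

Only a finite grid of cost values remains after rounding, so the cumulative cost can take a limited number of distinct values, at the price of relaxing the bound by the factor controlled by `eps`.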
IV Experiment
To test the performance of the proposed CC-SSP model, we utilize two benchmark problems. The first problem is a two-dimensional grid problem where a robot can move in four directions. However, the movement is uncertain, with an 80% success probability captured in the transition function. A ratio of 5% of the states are randomly selected as risky states, 10% of the states are randomly assigned a cost of (for all actions), and the remaining states are assigned a cost of . The grid size is $10000\times 10000$ (chosen to illustrate the CC-SSP’s performance in a large state problem), and the initial state is set to . The second problem is the highway problem (an SSP version of the problem in [15]), where an Autonomous Vehicle (AV) navigates a three-lane highway with multiple dynamic Human-driven Vehicles (HVs). The HVs move based on a transition probability depicted in Fig. 4. We consider that an HV deviates from the center of a lane before executing a lane-change maneuver. A risky state is defined as one in which the ego-vehicle collides with any of the agent vehicles. The cost function for the AV’s actions (defined as maintain, speed up, slow down, left lane, right lane) is , respectively. Both problems have a cost-minimizing objective. Moreover, the initial state consists of the AV and six HVs (see Fig. 1). The experiments were conducted on an Intel i9 9900k processor using the Gurobi 9 optimizer. In Table I, the tree-based nodes column reports the number of nodes in the And-Or tree history expansion, similar to the approach used in [11, 14, 13].
IV-A Results
Table I reports the performance of the CC-SSP solvers based on the ILP formulation on the two benchmark problems. Both problems were solved for multiple horizons and risk thresholds ($\Delta$). A higher risk threshold results in a better objective, which is expected since the risk constraint is less restrictive. Moreover, while the objective increases with the horizon, the ratio of objective value to horizon decreases, indicating that a better average policy per step is found with a longer planning horizon. In addition, the CC-SSP only explores a fraction of the total number of nodes, allowing for a longer planning horizon in a short running time. While the ILP produces an optimal solution, the randomized rounding algorithm returns a close-to-optimal solution in a fraction of the time. Fig. 3 plots the confidence interval of the randomized rounding algorithm’s approximation ratio (computed as the ratio between the algorithm’s objective value and that of an optimal solution) applied to the grid problem and repeated 100 times. The randomized rounding algorithm performs well, with a minimum approximation ratio of 0.94 relative to the optimal solution.
V Conclusion
We presented a novel formulation for CC-SSP under local and global risk constraints. To reduce the running time, we provided a randomized rounding algorithm that obtains feasible policies efficiently. Experiments show that our approaches offer a significant improvement over existing techniques that rely on history tree expansion. In future work, we intend to find an approximation algorithm for the ILP that runs in polynomial time and provides a theoretical guarantee on the sub-optimality ratio.
VI Appendix
VI-A Proof of Lemma 1
Proof.
The execution risk can be written as,
[TABLE]
The probability term can be expanded, conditioned over subsequent states at time ,
[TABLE]
where Eqn. (26) follows from the independence between $\big(\bigwedge_{t=k+1}^{h}\neg R^{j}(S_{t})\mid S_{k+1}=s_{k+1}\big)$ and , by the Markov property, and between and . Combining Eqns. (25) and (27), we obtain as
[TABLE]
Note that the execution risk of . For a stochastic policy , Eqn. (28) can be rewritten by enumerating through all actions, which completes the proof.
∎
VI-B Proof of Theorem 1
Proof.
We can rewrite the execution risk in Eqn. (4) as
[TABLE]
where , and . Based on the flow equations (5),(6), define a policy
[TABLE]
Note that the policy is a valid probability distribution. Thus, we rewrite Eq. (5),(6) by
[TABLE]
Next, we prove by induction the following statement
[TABLE]
Note that when , by Eqn. (VI-B), the last term of the above equation is zero, which is equivalent to the theorem’s claim. We consider the initial case with . From Eqn. (VI-B) and , we obtain
[TABLE]
where is a known state. For the inductive step, we assume Eqn. (31) holds up to , and we prove the statement for . Expanding using Eqn. (VI-B) yields
[TABLE]
[TABLE]
[TABLE]
where Eqn. (32) follows by substituting , using Eqn. (30), by
[TABLE]
Rewriting Eqn. (32) obtains
[TABLE]
∎
References
- [1] E. Altman. Constrained Markov Decision Processes, volume 7. CRC Press, 1999.
- [2] D. P. Bertsekas and J. N. Tsitsiklis. An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3):580–595, 1991.
- [3] B. Bonet and H. Geffner. Labeled RTDP: Improving the convergence of real-time dynamic programming. In ICAPS, volume 3, pages 12–21, 2003.
- [4] F. de Nijs, E. Walraven, M. de Weerdt, and M. Spaan. Bounding the probability of resource constraint violations in multi-agent MDPs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
- [5] F. de Nijs, E. Walraven, M. de Weerdt, and M. Spaan. Constrained multiagent Markov decision processes: a taxonomy of problems and algorithms. Journal of Artificial Intelligence Research, 70:955–1001, 2021.
- [6] F. d’Epenoux. A probabilistic production and inventory problem. Management Science, 10(1):98–108, 1963.
- [7] D. A. Dolgov and E. H. Durfee. Approximating optimal policies for agents with limited execution resources. In IJCAI, pages 1107–1112, 2003.
- [8] E. A. Feinberg. Constrained discounted Markov decision processes and Hamiltonian cycles. Mathematics of Operations Research, 25(1):130–140, 2000.
