Stochastic Lipschitz Dynamic Programming
Shabbir Ahmed, Filipe Goulart Cabral, Bernardo Freitas Paulo da Costa

TL;DR
This paper introduces a novel algorithm for multistage stochastic MILPs that uses Lipschitz cuts to improve lower bounds, demonstrated through case studies comparing with existing methods like SDDP and SDDiP.
Contribution
It develops a new Lipschitz cut-based approach for stochastic MILPs that maintains problem class integrity and enhances solution quality.
Findings
The proposed algorithm effectively approximates non-convex cost functions.
Application to case studies shows competitive performance with existing methods.
Lipschitz cuts derived from Augmented Lagrangian Duality are MILP representable.
Abstract
We propose a new algorithm for solving multistage stochastic mixed integer linear programming (MILP) problems with complete continuous recourse. In a similar way to cutting plane methods, we construct nonlinear Lipschitz cuts to build lower approximations for the non-convex cost to go functions. An example of such a class of cuts are those derived using Augmented Lagrangian Duality for MILPs. The family of Lipschitz cuts we use is MILP representable, so that the introduction of these cuts does not change the class of the original stochastic optimization problem. We illustrate the application of this algorithm on two simple case studies, comparing our approach with the convex relaxation of the problems, for which we can apply SDDP, and for a discretized approximation, applying SDDiP.
| SB | SLDP tents | SLDP ALD | SDDiP 0.1 | SDDiP 0.01 | |
| LB | 1.167 | 3.073 | 3.085 | 3.420 | 2.370 |
| UB | 3.453 | 3.320 | 3.313 | 3.823 | 3.490 |
| time (s) | 12 | 558 | 605 | 1994 | 3317 |
| N | 2 | 3 | 6 |
|---|---|---|---|
| Objective | -57.000 | -59.333 | -61.222 |
| SB LB | -58.096 | -61.961 | -65.468 |
| ALD LB | -57.000 | -59.333 | -61.274 |
| remaining (%) | 0 | 0 | 1.27 |
| SB time (s) | 0.98 | 1.60 | 005.57 |
| ALD time (s) | 19.4 | 77.8 | 125.0 |
| ALD/SB time | 19.8 | 48.6 | 22.4 |
| N | 2 | 3 | 6 |
|---|---|---|---|
| Objective | -57.000 | -59.333 | -61.222 |
| SB LB | -58.095 | -61.961 | -65.541 |
| ALD LB | -57.000 | -59.495 | -61.579 |
| remaining (%) | 0 | 6.27 | 8.31 |
| SB time (s) | 000.63 | 001.30 | 005.22 |
| ALD time (s) | 260 | 532 | 758 |
| ALD/SB time | 413 | 409 | 145 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Stochastic Lipschitz Dynamic Programming
Shabbir Ahmed
Filipe Goulart Cabral
Bernardo Freitas Paulo da Costa
Abstract
We propose a new algorithm for solving multistage stochastic mixed integer linear programming (MILP) problems with complete continuous recourse. In a similar way to cutting plane methods, we construct nonlinear Lipschitz cuts to build lower approximations for the non-convex cost-to-go functions. An example of such a class of cuts are those derived using Augmented Lagrangian Duality for MILPs. The family of Lipschitz cuts we use is MILP representable, so that the introduction of these cuts does not change the class of the original stochastic optimization problem.
We illustrate the application of this algorithm on two simple case studies, comparing our approach with the convex relaxation of the problems, for which we can apply SDDP, and for a discretized approximation, applying SDDiP.
1 Introduction
Non-convex stochastic programming problems arise naturally in models that consider binary or integer variables, since such variables allow the representation of a wide variety of constraints at the cost of inducing non-convex feasible sets. Recent advances in commercial solvers have made possible and more robust the solution of several mixed integer linear programming problems, broadening the interest of the academic community in studying and developing algorithms for the mixed integer stochastic programming area. Applications such as unit commitment [Tahanan et al., 2015], [Costley et al., 2017], [Knueven et al., 2018], [Ackooij et al., 2018], optimal investment decisions [Singh et al., 2009], [Conejo et al., 2016], and power system operational planning [Cerisola et al., 2012], [Thome et al., 2013], [Hjelmeland et al., 2019] have driven the development of new algorithms for problems that do not fit into the convex optimization framework.
Before the development of the MIDAS [Philpott et al., 2016] and the SDDiP [Zou et al., 2018] algorithms, the solution of large-scale multistage stochastic programming problems with theoretical guarantees was restricted to convex problems, using algorithms such as Nested Cutting Plane [Glassey, 1973], Progressive Hedging [Rockafellar and Wets, 1991] and SDDP [Pereira and Pinto, 1991]. For a recent reference, see [Birge and Louveaux, 2011] and [Shapiro et al., 2014]. Most of those algorithms build convex underapproximations of the cost-to-go function at each node of the scenario tree, or at each stage in the stagewise independent setting, to solve the stochastic convex program. Even the SDDiP algorithm relies on convex underapproximations, which are shown to be sufficient for convergence thanks to the binary discretization of the state variables and the tightness property of the Lagrangian cuts at binary states. The MIDAS algorithms is based on a different idea, using step functions to approximate monotone non-convex cost-to-go functions.
The aim of this paper is to propose a new algorithm for mixed integer multistage stochastic programming problems, which does not discretize the state variable and uses non-linear cuts to capture non-convexities in the future cost function. Inspired by the exactness results from [Feizollahi et al., 2017], we build non-convex underapproximations of the expected cost-to-go function, whose basic pieces are augmented Lagrangian cuts. They can be calculated from augmented Lagrangian duality, which [Feizollahi et al., 2017] have shown to be exact for mixed-integer linear problems when the augmentation function is, for example, the -norm. If the original problem already had pure binary state variables, then, as it was shown in [Zou et al., 2018], Lagrangian cuts are already tight, and we can use them in our algorithm by setting the non-linear term of the augmented Lagrangian to zero.
As we will see, even a countable number of these cuts cannot describe exactly the value function of a mixed-integer linear program, even when that function is continuous. However, the cuts have sufficient structure for our purposes, in two very important ways: first, they are Lipschitz functions, which still yield global estimates from local behaviour, even if those are weaker than what’s typical with convexity. The Lipschitz estimates will be crucial for our convergence arguments, and the resulting algorithm is therefore called Stochastic Lipschitz Dynamic Programming (SLDP).
Second, it is possible to represent such cuts using binary variables and a system of linear equalities and inequalities. Therefore, we do not leave the class of mixed-integer linear problems when incorporating -augmented Lagrangian cuts in each node/stage of the stochastic problem. Under some hypothesis for continuous recourse of each stage, it is possible to prove that the expected cost-to-go functions of stochastic mixed-integer linear problems are Lipschitz, and therefore our algorithm can be applied directly.
We have organized this paper as follows: the next section motivates the SLDP algorithm with a study of Lipschitz optimal value functions and a decomposition algorithm for deterministic Lipschitz optimization. Then, we present the SLDP algorithm for both the full-tree and sampled scenario cases, and prove their convergence. Finally, we illustrate our results with a case study.
2 Lipschitz value functions
Our SLDP algorithm uses Lipschitz cuts to approximate the cost-to-go function of a stochastic MILP in a similar fashion to the nested cutting planes algorithm for stochastic linear programs [Ruszczynski and Shapiro, 2003]. That is, we also compute lower approximations that iteratively improve the cost-to-go approximation in a neighborhood of the optimal solution.
The purpose of this section is to motivate the use of Lipschitz cuts for nonconvex functions, especially for optimal value functions of mixed integer problems. We start with the definition and basic properties of Lipschitz functions as well as the convergence proof of an algorithm for Lipschitz optimization that employs special Lipschitz cuts called reverse norm cuts. The idea of reverse norm cuts also appears in Global Lipschitz Optimization (GLO), see [Mayne and Polak, 1984], [Meewella and Mayne, 1988] and more recently in [Malherbe and Vayatis, 2017]. However, our aim is not to develop a new algorithm for GLO but to explain the reverse norm cut algorithm and establish some results to extend it for stochastic multistage MILP programs.
Then, we recall the definition of Augmented Lagrangian duality and the exactness results in [Feizollahi et al., 2017] and argue how they can be used to construct augmented Lagrangian cuts in place of the reverse norm cuts. This provides a unified framework, generalizing both nonlinear reverse norm cuts and the linear Lagrangian cutting algorithms in the continuous setting of [Thome et al., 2013] or the binary setting of SDDiP [Zou et al., 2018].
Finally, we show how this theory applies to MILP value functions.
2.1 Lipschitz functions and reverse norm cuts
Let us recall the definition and some results for Lipschitz functions. We say that is a Lipschitz function with constant if for all we have . Note that the linear function is Lipschitz with constant , where indicates the dual norm. Let and be Lipschitz functions with constants and , respectively. It can be shown that
- •
the maximum and minimum of and are Lipschitz functions with constant ; and
- •
the sum of and is a Lipschitz function with constant .
Consider now an example of a non-convex Lipschitz function. Let be the piecewise linear function defined on by
[TABLE]
see figure 1 for an illustration. Note that can be written as the minimum of linear functions, , so is a Lipschitz function with constant .
It can also be seen from figure 1 that the tightest lower convex approximation for over is the zero function. This implies that there will always be a gap between linear cuts and the original function , since the best they could do is reproduce the Lagrangian relaxation . In other words, there is no way to use lower linear cuts to close the gap with , which motivates the introduction of reverse norm cuts.
Definition 1**.**
A reverse norm cut for a function , centered at and with parameter , is the function
[TABLE]
Note that is a Lipschitz function with constant , and if and are reverse norm cuts with parameters and then is a Lipschitz function with constant .
Suppose that is a Lipschitz function with constant . If is greater than or equal to , then all functions are valid reverse norm cuts, for any center point . Indeed,
[TABLE]
for all .
A fundamental difference between reverse norm cuts and cutting planes is that, in general, one cannot guarantee that a piecewise linear Lipschitz function can always be represented as the maximum of a finite number of reverse norm cuts. Indeed, as can be seen in figure 2a, one would need all reverse norm cuts centered at every to represent the non-convex function .
Sometimes, it is possible to obtain a reverse norm cut centered at a point with Lipschitz constant less than , but such reverse norm cut may not be a lower approximation elsewhere if translated in the domain. In figure 2b, we show a tighter cut with centered at , but this lower cannot be used for any other point in the domain of . If the constant is greater than , the corresponding reverse norm cut is a lower approximations of if centered at any point by the same deduction made in (1).
Finally, observe that if the function is not Lipschitz then reverse norm cuts may need arbitrarily large parameters, depending on their center point. Indeed, let be the fractional-part function:
[TABLE]
which is both piecewise linear and the optimal value function of a MILP, see figures 3a and 3b.
As the point approaches , the opening of the reverse norm cuts centered at goes to zero, so their corresponding Lipschitz constant must go to infinity.
2.2 Optimization with reverse norm cuts
To develop the intuition for the SLDP method that will be presented in section 3, we introduce the reverse cut method for a deterministic problem and its convergence proof. We hope that this simpler setting helps the reader understand some of the basic mechanisms of the proof without the notational burden of the general stochastic multistage setting.
The deterministic optimization problem that will be investigated is
[TABLE]
where is a ‘simple’ function and is a ‘complex’ one, so while it is relatively cheap to optimize , we only assume that it is possible to evaluate at given points. For example, this can be the stage problem in the nested decomposition of a multistage problem, where is the immediate cost and the average cost-to-go.
Assume that is a Lipschitz function with constant and is a compact subset of . The reverse cut method, presented in algorithm 1, similarly to decomposition methods, iteratively approaches by reverse norm cuts centered at points obtained along the iterations of the algorithm. Those points are solutions of an approximated optimization problem obtained from (2), where is replaced by the current Lipschitz approximation .
The Lipschitz approximation is the maximum between the reverse norm cut centered at and the previous approximation :
[TABLE]
By induction, all are Lipschitz functions with constant , since it is the maximum of reverse norm cuts with that constant.
With this observation in mind, we will prove the convergence of algorithm 1. The compactness assumption of is used only to ensure the existence of a cluster point for the sequence of trial points .
Lemma 1**.**
Suppose Algorithm 1 does not meet the stopping criteria for problem (2). Let be any cluster point of the sequence of trial points and let be the indices of a subsequence that converges to . Then converges to :
[TABLE]
Proof.
We will bound by the distance between the subsequence and the cluster point . Using the triangular inequality and the Lipschitz definition, we get
[TABLE]
Since converges to along , we only need to prove that the lower bounds will get arbitrarily close to at the trial points, before it is updated. By construction of the reverse cut method, is less than or equal to , so we only need upper bounds. Let be the index just before in the subsequence . Since is -Lipschitz,
[TABLE]
But also by construction, , so subtracting the equation above from and applying once again the Lipschitz hypothesis on we obtain:
[TABLE]
By replacing (5) in (4) and taking the limit over , we conclude that the sequence converges to . ∎
From the convergence result above, we now separate the analysis of the cases where is strictly positive and zero respectively in Corollary 2 and Theorem 3 below.
Corollary 2**.**
For any stopping tolerance , Algorithm 1 stops in a finite number of iterations with an -optimal solution for (2).
Proof.
From inequality 5 in the proof of Lemma 1 and compactness of , Algorithm 1 will stop in a finite number of iterations if the stopping tolerance is strictly positive. Using the fact that is a feasible solution to (2) and optimal solution to (3) such that , where that is a lower estimate for , we have the following inequalities:
[TABLE]
which means that is bounded between and . ∎
Theorem 3**.**
Consider the stopping tolerance equal to zero. Then, Algorithm 1 stops with an optimal solution in a finite number of iterations or it generates a sequence of optimal value approximations that converges to the optimal value of problem (2) and every cluster point of the sequence is a minimizer of (2).
Proof.
Suppose that Algorithm 1 stops after a finite number of iterations. Using the same argument of Corollary 2, we obtain that the last trial point is the optimal solution to (2).
Now, suppose that Algorithm 1 never reaches the stopping condition, so we know from Lemma 1 that the sequence converges to . Moreover, is a feasible solution to the main problem (2) and optimal solution to the approximate problem (3). Since is a lower estimate for , we obtain the following relationships:
[TABLE]
Taking the limit over on both sides of (6), and by continuity of and , we obtain:
[TABLE]
which shows that all inequalities above are equalities, and therefore is an optimal solution to (2).
Going back to the full sequence, we recall that the sequence of objective functions is monotone nondecreasing, so the sequence of optimal values is also monotone and nondecreasing, which implies that also converges to . ∎
Observe that the proof of Lemma 1, Corollary 2 and Theorem 3 use only two properties of the reverse-norm cuts. Besides their Lipschitz character, they are exact at trial points, that is, . Therefore, any other way of producing uniformly Lipschitz tight cuts for yields a convergent Lipschitz cut method for optimizing .
So, just before presenting a class of MILPs with Lipschitz value functions, we show how augmented Lagrangian duality can be used to produce tight Lipschitz cuts.
2.3 Augmented Lagrangian cuts
Recall the definition of augmented Lagrangian duality for mixed-integer optimization problems. Let
[TABLE]
be a parameterized linear problem, where describes the mixed-integer constraints.
Definition 2**.**
Given an augmenting function , which is non-negative and satisfies , the augmented Lagrangian for problem (7) is given by
[TABLE]
Since any feasible solution to the original optimization problem (7) remains feasible with the same objective value for the augmented Lagrangian (8), we see that, for any , and ,
[TABLE]
Moreover, on the MILP setting, we have exact duality [Feizollahi et al., 2017, Theorem 4, p. 381] if the augmenting function is a norm. More precisely:
Theorem 4**.**
If the set is a rational polyhedron with integer constraints, if the problem data , and are rational and if for the given the problem is feasible with bounded value , then there is a finite such that
[TABLE]
In addition, one can choose a finite Lagrange multiplier that attains the supremum.
This motivates the introduction of augmented Lagrangian cuts using norms as the augmentation function. Indeed, expanding the definition of in the weak duality equation (9), we get for any :
[TABLE]
By the triangular inequality,
[TABLE]
so that
[TABLE]
Therefore, calculating the augmented Lagrangian at provides a lower Lipschitz estimate for :
Definition 3**.**
If is the optimal value function for a MILP, and given , and , the augmented Lagrangian cut centered at is the function
[TABLE]
Moreover, the exactness result above shows that there exists a sufficiently large and appropriate for which , so the cut is tight at . However, this proof stills leaves open the question of whether the family of such cuts is uniformly Lipschitz, since as seen in the fractional-part example in figure 3 one might need arbitrarily large near discontinuities of the value function.
The Lipschitz setting that we assume for the value function allows us to bypass this difficulty: choosing and produces a valid and tight cut, so this provides an absolute upper bound on the needed for exact augmented Lagrangian duality. This is why we proceed to characterize some MILPs with Lipschitz value functions, before moving to the multistage case.
2.4 MILPs with Lipschitz value functions
As we have seen, the Lipschitz property of the value functions was an important piece in the proof of convergence of the reverse-norm method and also in the uniform bounds on for the augmented Lagrangian cuts. It will also be fundamental in the analysis of the SLDP algorithm, and therefore we present in this section a sufficient condition that guarantees Lipschitz continuity for the optimal value function of MILPs.
First, we show that the optimal value function for a Lipschitz objective over a family of polyhedra is still Lipschitz. Then, by enumerating the polyhedra over the realizations of integer variables, we will arrive at the complete continuous recourse (CCR) condition that ensures the optimal value function of a MILP is Lipschitz.
Consider the parameterized optimization problem
[TABLE]
for a Lipschitz function and a polyhedron . In order to analyze the function , we have to understand the effect of the variations on the feasible set with respect to the parameter , and compound this effect with the Lipschitz function . For the first part, the Hoffman Lemma below provides the answer, since it states that the symmetric difference between any two sets and is bounded by a ball centered at the origin with a radius proportional to the norm . This result resembles a version for sets of the Lipschitz continuity definition.
Lemma 5** (Hoffman lemma [Shapiro et al., 2014]).**
Let be a polyhedron parameterized by the right-hand side vector of a given linear system, that is, . Let be a vector such that is nonempty. Then, there exists depending only on such that
[TABLE]
where is the unit ball, i.e., .
With this, we can guarantee that the optimal value function of a minimization problem of a Lipschitz function over a given polyhedron is also Lipschitz in its essential domain. The same holds for a maximization problem because if is a Lipschitz function then so is , then, switching from maximization to minimization, Theorem 6 allows us to conclude the same result about the corresponding optimal value function .
Theorem 6** (Lipschitz cost-to-go functions).**
Let be the optimal value function defined by
[TABLE]
where the set is a polyhedron in , the function is Lipschitz continuous with constant , and we assume that there is one value of such that is finite. Then, the essential domain of , , is a polyhedron, and the function restricted to is Lipschitz continuous with constant , where is a constant that depends only on .
Proof.
First, we prove that is a polyhedron. Recall that is the set of vectors for which the problem (11) is feasible, that is, . Since is continuous, does not assume anywhere, so is the projection of over the component :
[TABLE]
As the image of a polyhedron by a linear map is also a polyhedron, we conclude that is a polyhedron in .
Now, we need to prove that is Lipschitz continuous over . Denote by the linear constraint that defines , that is, , and let be the set given by the linear system . Now, let and be two points in the domain of . Taking a feasible point for the problem defined by , and applying the Hoffman Lemma for , we get that there is a feasible point such that
[TABLE]
Therefore, by the Lipschitz hypothesis on ,
[TABLE]
Taking the infimum over , we see that . If is not , this shows that as well, so never assumes the value .
Finally, by symmetry, we obtain , which is the Lipschitz condition for . ∎
To handle the Stochastic MILP case, it would be convenient if the minimum of Lipschitz functions over a union of polyhedra was Lipschitz as well. Indeed, one could split the optimization variable over the integer and continuous variables and obtain
[TABLE]
Since each function is Lipschitz continuous by Theorem 6 above, we’d be done. However, this is not true in general, because the domains of each may be different.
Indeed, we illustrate in figures 4a and 4b an example of a discontinuous optimal value function induced by the problem of minimizing a linear objective function over the union of two polyhedra. In this particular example, the feasible set is the intersection between the blue dashed line and the union of both vertical and horizontal rectangles, while the objective function is a linear function that decreases as the solution candidate moves to the left. We show in figures 4a and 4b the optimal solution of this problem for two different right-hand side parameters, which control the height of the dashed line. As that parameter changes, the dashed line moves up or down, and the optimal solution changes abruptly as soon as a point in the horizontal rectangle becomes feasible, as occurs in figure 4b. This shows that the optimal value function is discontinuous, so it cannot be Lipschitz continuous.
In order to guarantee the Lipschitz condition for the optimal value function, we need to assume an additional hypothesis about the union of polyhedra that defines the feasible set of the optimization problem with Lipschitz objective. Our typical optimization problem is
[TABLE]
where is a finite index set, and is a polyhedron for each . One sufficient condition for the Lipschitz continuity of is to assume that is a Lipschitz continuous function in and that equals for each . This is called the Complete Continuous Recourse (CCR) condition, compare [Zou et al., 2018, Definition 1]. Indeed, under the CCR assumption, each optimal value function defined by the optimization problem
[TABLE]
is Lipschitz continuous in (Theorem 6) and since equals we conclude that is also Lipschitz continuous in .
3 The Stochastic Lipschitz Dynamic Programming algorithm
In this section, we present the Stochastic Lipschitz Dynamic Programming (SLDP) algorithm for the multistage case in two different approaches: the full scenario and the sampled scenario case. In the full scenario approach, all the nodes are visited in the forward and backward steps, and Lipschitz cuts are added for every expected cost-to-go function. In contrast, in the sampled scenario approach, just the sampled scenarios are visited in the forward and backward steps, and Lipschitz cuts are added only for the expected cost-to-go functions of the sampled nodes. We prove convergence to an optimal policy for the full scenario case, and we introduce an additional procedure in the sampled scenario case to ensure convergence towards an -optimal policy.
3.1 Multistage setting and Lipschitz continuity of cost-to-go functions
For dealing with the stochastic multistage setting, we fix some notation for describing the scenario tree. Let be the set of nodes, where is the root node, and is the ancestor function that associates each node except the root to its ancestor node . We also define the set of successor nodes as the set of nodes with ancestor , that is, , and for each successor there is a transition probability denoted by from node to . From and , we define the stage of a node : the root node belongs to stage , and inductively the nodes in belong to the stage . The set of all nodes in stage is denoted by .
In the dynamic programming formulation of stochastic optimization, we have a state variable and a mixed integer control variable , ranging over a feasible set and incurring an immediate cost . As it is standard, we introduce a copy variable that carries the information from the previous state, so that the cost-to-go and expected cost-to-go functions and of each node satisfy the following recursive relationship:
[TABLE]
The nodes without successor are called leaf nodes of the tree, and they correspond to the last decision to be taken in the planning horizon. Also, observe that the root node does not have an ancestor, so we can still define by (18), but its stage problem (17) should be written as
[TABLE]
However, in order to avoid having to single out this special case, we slightly abuse notation by fixing , , and extend to a further dimension .
From our discussion in the previous section, we assume that is Lipschitz continuuos with constant and that satisfies the complete continuous recourse condition. Under those hypothesis, both the cost-to-go functions , and the expected cost-to-go functions are Lipschitz continuous.
Proposition 7** (Stochastic multistage MILP programs).**
Consider the stochastic multistage MILP program defined by (17) and suppose that for every node the cost-to-go function is not equal to in any point, i.e., , and the CCR condition holds for the feasible set . Then, the expected cost-to-go function is Lipschitz continuous in with Lipschitz constant at most
[TABLE]
where is a constant that depends only on .
Proof.
We proceed by backward induction on the scenario tree. For the leaf nodes, the statement holds by definition, since is identically zero.
So, suppose this result holds for all successor nodes , and let’s prove that it also holds for node . By the induction hypothesis, the expected cost-to-go functions are Lipschitz with constant , and from Theorem 6 each is Lipschitz with constant , where is the Lipschitz constant of the objective function and is a constant from the Hoffman Lemma that only depends on . Since the expected value of Lipschitz functions is also Lipschitz with constant equal to the expected value constant, the induction step is proved. ∎
Since problem (17) for each node admits the Lipschitz decomposable structure of (2), one could imagine using the reverse-norm method of Algorithm 1, or augmented Lagrangian cuts, to approximate its solution. However, in the multistage case we lack one fundamental property we used, namely that we can compute exactly the ‘complex’ function , which in this case is . Indeed, we’re only able to produce lower approximations for it, and the next sections will deal with the necessary estimates to prove convergence under this weaker hypothesis.
3.2 Approximating the value functions
Before we present the SLDP algorithm, we need to introduce some notation for the approximations along the iterations of the algorithm. As usual, we denote by the expected cost-to-go approximation induced by the Lipschitz cuts at iteration . For the purpose of convergence analysis, we consider the approximations and of the cost-to-go and expected cost-to-go functions at iteration defined below:
[TABLE]
We assume we are given, for the first iteration, a Lipschitz lower approximation of the expected cost-to-go function for each node in the tree. In practice, the first expected cost-to-go approximations are identically zero, since costs are usually non-negative.
Then, we update the expected cost-to-go approximation of iteration at a given point using the reverse-norm cut :
[TABLE]
where is any constant greater than or equal to the Lipschitz constant defined on (19). Note that we have used instead of in the cut update (25) because the Lipschitz cuts of the SLDP are updated from the last to the first stage, so all expected cost-to-go approximations of the successor nodes of are updated to before the computation of the optimal value (23) with state . So, given each node and iteration we obtain in the backward step the cost-to-go approximation evaluated at , and since the expected cost-to-go approximation is the corresponding weighted average, we obtain for the Lipschitz cut (25).
There are some important comments about the concepts introduced so far. First, we note that the sequence is a non-decreasing sequence of functions, and since it belongs to the objective function of (23), we conclude that the sequence of cost-to-go function approximations is also non-decreasing. Second, the expected cost-to-go approximation defined in (24) is a weighted average of non-decreasing functions , so the corresponding sequence is also non-decreasing. Third, the function plays an important role in the convergence analysis of the SLDP since the cuts of (25) are tight for at the forward solution . Last, the quality of the expected cost-to-go approximation of a given node depends on the quality of those approximations at the successor nodes , so this explains the reason of computing Lipschitz cuts from the last to first stage.
We will prove convergence of the full scenario (resp. sampled scenario) SLDP algorithm by proving that the sequence of feasible policies \big{(}x_{n}^{k}\big{)}_{n\in\mathcal{N}} produced by the algorithm converges to an optimal (-optimal) policy \big{(}x_{n}^{*}\big{)}_{n\in\mathcal{N}}. As in Lemma 1, we show that converges to (and -approximation of) where is the true expected cost-to-go function (18). In the examples of section 4, we will see that both expected cost-to-go approximations and approximate the true expected cost-to-go function in a neighborhood of the optimal (-optimal) policy solution , however those approximations are usually poor elsewhere.
To simplify our statements and make the logic in the proofs easier to follow, we will assume that the starting lower bounds are also valid lower bounds for , which imposes a compatibility constraint between and its successors for . If one doesn’t have this property, then only the inequalities and are ensured in the following Lemma. Again, in common situations where costs are positive and all , this is immediately satisfied.
By the definition of cost-to-go and expected cost-to-go approximations, they form a monotone sequence of valid lower bounds for the true expected cost-to-go functions:
Lemma 8**.**
Consider the stochastic multistage MILP program (17) satisfying the CCR condition, and let , and be the cost-to-go and expected cost-to-go approximations of (23), (24) and (25). Then
[TABLE]
for every node and iteration .
Proof.
We proceed by backward induction on the tree. For leaf nodes, inequality (26) holds because and are identically zero, by definition. Suppose that inequality (26) holds for every successor node at iteration . By the induction hypothesis, the function is less than or equal to , and by the optimization problems (17) and (23) we conclude that the cost-to-go approximation is less than or equal to the true cost-to-go function . Since we guarantee this property for every successor node , we get the same inequality for their respective weighted averages and .
Now, let’s prove that is less than or equal to by induction on the iteration . In the first iteration, the cost-to-go approximation is less than or equal to by hypothesis. Suppose that is less than or equal to for every iteration less than . We will prove that such inequality also holds for iteration . Indeed, by the induction hypothesis and the non-decreasing property of , we have the following inequalities:
[TABLE]
Using the updating formula (25), we conclude that is less than or equal to because the reverse-norm cut is also a lower bound for . ∎
Throughout this paper we assume the CCR condition for the true stochastic multistage MILP program (17). Additionally, we also require the set of feasible policy’s states to be compact, and we name the resulting assumption as the Compact State Complete Continuous Recourse (CS-CCR) condition.
3.3 Full scenario approach
The SLDP algorithm for the full scenario approach is analogous to the Nested Cutting Plane algorithm, but with Lispchitz cuts instead of linear cuts. As described in Algorithm 2, starting from a valid lower bound for all cost-to-go functions, and an upper bound for their Lipschitz constants, we improve the lower bounds near the candidate optimal solutions of each iteration. Thus, in the forward step, the full scenario SLDP algorithm solves the optimization problems (23) from the root to the leaves, that is, in ascending order of stages, and obtains feasible state and control variables for each node of the scenario tree. Then, in the backward step, it updates from the leaves to the root the expected cost-to-go approximation using formula (25) and the Lipschitz cuts centered at the states obtained in the forward step.
We have not provided a stopping criterion for Algorithm 2. Although in the full scenario case we could have chosen a criterion equivalent to the one in Algorithm 1, in the sampled case one would need to compute the optimal solution at every node of the scenario tree to have a deterministic upper bound for the optimal policy, which is unrealistic. So, we preferred to emphasize the similarities between the full and sampled scenario cases. and present their convergence results only in asymptotic form.
In order to simplify the notation and improve readability, we will assume that a variable or a vector that do not have the subscript is the vector composed by the corresponding variables or vectors for all nodes:
- •
;
- •
(x,y,z):=\Big{(}(x_{n},y_{n},z_{n})\Big{)}_{n\in\mathcal{N}}.
We refer to as a policy and as the policy’s states, and we denote by the set of feasible policies and by the projection of in the policy’s states.
By analogy with the proof of the (deterministic) reverse-norm method in Lemma 1 and Theorem 3, we start proving that the expected cost-to-go approximation approximates the true expected cost-to-go function in a neighborhood of any cluster state induced by the forward step.
Lemma 9**.**
Let be a cluster point of the sequence of policy states generated by the forward step of Algorithm 2, and let be the indices of a subsequence that converges to . Then converges to ,
[TABLE]
for every node .
Proof.
Let be the sequence of policies obtained in the forward step of algorithm 2. By the compactness assumption of , the set of feasible policy states is also compact, so there is a subsequence of that converges to a cluster point . Denote by the indices of this subsequence, that is, . We will show that equation (27) holds by backward induction on the tree. It trivially holds for the leaf nodes, since both functions and are identically zero, by hypothesis.
Now, suppose that equation (27) holds for every successor node . From Lemma 8, we have an upper bound:
[TABLE]
which, by continuity of , yields:
[TABLE]
So we only need to prove that the lower approximations are large enough.
As in Lemma 1, we denote by the index in immediately before . By monotonicity of the approximations, is larger than all of the Lipschitz cuts constructed, in particular the one at iteration . Therefore,
[TABLE]
Note that, differently from Lemma 1, we don’t obtain the exact expected cost-to-go function , but only its approximation . That’s why our proof splits in two parts: one bounding the difference between and , and the other bounding the one between and . Let’s complete the first one, which we already started. Since is an increasing sequence, , and we obtain
[TABLE]
To show that and are close, we use their definitions in (24) and (18):
[TABLE]
where the inequality follows because are optimal solutions to (23) and feasible solutions to (17) for each . Taking (29) and (30) together and rearranging terms we get
[TABLE]
Now, take the limit as goes to , which also makes grow to , and both and converge to . Since the expected cost-to-go function is continuous, we obtain:
[TABLE]
because both residual terms vanish in the limit, the second one going to zero by our induction hypothesis. Together with the upper bound from equation (28), this shows that the limit exists and concludes our proof. ∎
As a consequence of Lemmas 8 and 9, the expected cost-to-go approximation also approximates the true expected cost-to-go function in a neighborhood of any cluster point of the sequence of feasible policy’s states induced by the forward step of the full scenario SLDP. Using the argument that leads to inequality (30) in the proof of Lemma 9 we get that the cost-to-go approximation also approximates the true cost-to-go function in a neighborhood of any cluster policy state. That is, the following limits also hold:
[TABLE]
for any convergent subsequence of policy states induced by the forward step of the SLDP algorithm, and the corresponding limit point.
Theorem 10**.**
The sequence of lower bounds induced by the SLDP algorithm 2 converges to the optimal value of the true stochastic multistage MILP program (17), and every cluster point of the sequence of feasible policies generated by the forward step of Algorithm 2 is an optimal policy.
Proof.
Let be the sequence of feasible policies generated by the full scenario SLDP algorithm 2. Let be the set of indices of a convergent subsequence of policy states , and let be the corresponding limit point, which exists by compactness of . As for equation (30), at the root node we have
[TABLE]
because is a feasible solution to the optimization problem whose optimal value is and optimal solution to that whose optimal value is . Using Lemma 9, we conclude the convergence of the subsequence to . Since the whole sequence is non-decreasing, we get convergence to .
Now, suppose that there is a cluster point of the sequence of feasible policies , and denote also by the set of indices of the corresponding subsequence. In order to prove that is an optimal policy, we need to show that its components are optimal solutions to the optimization problem of each node whose optimal value is . We will proceed by forward induction on the tree. Indeed, we have just shown that is an optimal solution at the root node. Now, assume that this result holds for the ancestor node . Using the same argument as before, we have the following inequalities:
[TABLE]
So, taking the limit over on both sides of the inequality and using Lemma 9, we conclude that is an optimal solution of the optimization problem whose optimal value is . ∎
Just as it was the case for the proof of both Lemma 1 and theorem 3, we again just use the same properties of the reverse-norm cuts, namely that they are uniformly Lipschitz and that we are able to construct exact cuts at trial points, for the approximate future cost function . As before, this shows that any method of producing uniformly Lipschitz tight cuts in the nested form of stochastic optimization problems will result in a convergent algorithm on the full scenario approach. In particular, one can use the augmented Lagrangian cuts from section 2.3 provided one takes large enough that the resulting cut is exact.
3.4 Sampled tree approach
In multistage stochastic programming problems with a reasonable number of stages, it is computationally intractable to visit every node of the scenario tree. So, one needs to sample paths on the scenario tree and iteratively approximate the expected cost-to-go functions at each stage to obtain a “reasonable” solution. In this paper, we focus on the sampling scheme of one random path per forward iteration, but its conversion to more general schemes is straightforward.
We emphasize that a path on the scenario tree is chosen at random, so a node may not belong to the path of some forward step iterations. Let be the set of iterations of the Sampled-SLDP for which the path of the forward step contains the node . Note that is a random set, since it depends on each experiment , and the probability of node being draw an infinite number of times equals one, i.e.,
[TABLE]
by the Borel-Cantelli Lemma. We will assume a realization of the sampling where this is the case, to avoid repeating “with probability one” in what follows.
Also, observe that the collection of sets induced by the successor nodes covers , that is
[TABLE]
since a path that contains a node also contains some successor node . In the deterministic case, the set of iterations equals for every node , since all nodes are visited in the forward step. In the analysis of the Sampled-SLDP algorithm, we need to refer to optimal solutions of nodes that do not belong to a given forward path, even if in practice they are not computed. We still use the same notation to refer to an optimal solution of node and iteration .
Following the same organization of the previous sections, we would like to prove that for each node there is a subset of such that the following limit holds:
[TABLE]
where is a subsequence of policy states converging to a limit point . However, the main obstacle of this lemma is the induction step, since we need to control the difference between and using inequality (30) or some variation, as in the proof of Lemma 9. Inequality (30) directly is not suitable for the proof, because there we implicitly used that equals for every successor node to be able to use the induction hypothesis (31).
In order to ensure convergence of the Sampled-SLDP algorithm, we consider an additional step to stabilize the policy states obtained in the forward step. Instead of computing reverse norm cuts at every new forward solution , we check if the new feasible point is more than away from all previous forward solutions . If this is the case, then we update the expected cost-to-go function with the reverse norm cut centered at the new policy state ; otherwise we improve it at the closest previous forward solution , see Algorithm 3. Note that after a finite number of iterations the forward incoming state becomes trapped in a finite number of possibilities, since node will be visited an infinite number of times and the set of feasible policy states is compact. We also show in Lemma 11 that the expected cost-to-go approximation converges in a finite number of iterations to a Lipschitz function , which is an -approximation of the true expected cost-to-go function at any cluster point .
Lemma 11**.**
With probability one, the sequence of expected cost-to-go approximations generated by Algorithm 3 converges to a Lipschitz function with constant after a finite number of iterations. Moreover, the following relationships hold for every node of the tree:
[TABLE]
where is a subset of indices from such that the sequence of policy states converges, is the corresponding limit point, is the maximum Lipschitz constant and is the maximum penalty constant over all nodes .
Proof.
We start proving the finite convergence of to , and the limit in (32) by backward induction on the tree. In the last stage this result is trivial since both functions and are identically zero.
Let be a node such that the statement (32) holds for every successor node . Recall that the updating rule of the reverse norm cut has the form:
[TABLE]
where the expected cost-to-go approximation is the weighted average of the cost-to-go approximations over the successor nodes . By the induction hypothesis, after a finite number of iterations we obtain
[TABLE]
In other words, both the cost-to-go and the expected cost-to-go approximations stabilize after a finite number of iterations, so denote by and the corresponding limits, respectively. Since the number of different incoming states is also finite, this implies that the number of different possible reverse norm cuts to update is also finite. Then, converges to a function in a finite number of iterations.
Now, let’s prove inequality (33) by backward induction on the tree. It is trivial at the leaf nodes, so suppose inequality (33) holds for every successor node . Since there is a finite number of different possible policy states at the node , the sequence converges to in a finite number of iterations, which means that the reverse norm cut is also considered in the expected cost-to-go limit . In particular, we have the following inequalities:
[TABLE]
where the last inequality results from Lemma 8. But the expected cost-to-go approximations and are equal at , so we obtain the following equation for the difference between and the true cost-to-go function :
[TABLE]
Now, we have the crucial part of the argument. Because the expected cost-to-go approximations of all nodes stabilize, every incoming state of any successor node equals after a large number of iterations. Then, the optimal solution of node with input state is equal to , which is less than away from the final state of node , by the design of the Sampled-SLDP algorithm. Then, we obtain the following inequalities:
[TABLE]
where the first inequality results from being the optimal policy’s state of node with input state , and the following ones from the Lipschitz property of and , respectively. By our induction hypothesis,
[TABLE]
and since we get
[TABLE]
because and are at most far away from each other. So, the upper bound (36) together with equation (35) concludes the induction step. ∎
Theorem 12**.**
With probability , the sequence of lower bounds generated by Algorithm 3 converges in a finite number of iterations to an -approximation of the true optimal value , where , and every cluster point of the sequence of feasible policies generated by the forward step of Algorithm 3 is an -optimal policy.
Proof.
This is a straightforward result of Lemma 11, using the same reasoning as in Theorem 10. ∎
4 Examples
In this section, we will present two applications of the SLDP algorithm for stochastic optimization. The first is a simple example of a 1-dimensional dynamics with discrete control. Due to its symmetry and relative simplicity, it is possible to evaluate the cost-to-go functions, so that we can understand the behavior of the algorithm in its different forms. The second one has been extracted from [Carøe and Schultz, 1997] and [Ahmed et al., 2004], and is a 2-stage problem, for which enumeration can be performed in order to also evaluate the optimal solution and cost-to-go function.
4.1 Implementation details
The non-convex cuts used in SLDP are represented as inequalities of the form
[TABLE]
where for the reverse-norm cuts, but is needed for the augmented Lagrangian cuts. To incorporate them in the stage problems, this requires choosing a norm, and a MIP formulation of this constraint. For the experiments below, we have used the norm
[TABLE]
and each term in (37) is given by the sum from the following system:
[TABLE]
The constants are large enough so that has diameter less than , which is ensured by the compactness assumption of .
Observe that this formulation includes a binary variable (and two continuous variables), per dimension of , for each new non-convex cut we introduce. This makes each iteration of the SLDP algorithm much more expensive than previous ones.
One practical implementation of the SLDP method uses augmented Lagrangian cuts, and increases progressively. Since by construction the augmented Lagrangian cuts are valid, if is not large enough then the cuts might not be tight, but they might fill faster the non-convex regions of the cost-to-go function. Also, in analogy to Strengthened Benders cuts, it is possible to fix both the Lagrange multiplier and the augmenting term, and solve the resulting augmented Lagrangian relaxation. This again yields a valid cut, which we call strengthened augmented Benders cut.
All results below were obtained using Julia-0.6.3 [Bezanson et al., 2017] and the Julia packages SDDP.jl [Dowson and Kapelevich, 2017] and SDDiP.jl [Kapelevich, 2018], besides our own Julia implementation for both Lipschitz and strengthened augmented Benders cuts [Freitas Paulo da Costa, 2019], extending SDDP.jl. The computations were performed on an Intel(R) Xeon(R) CPU E5-2603 CPU.
4.2 A 1-dimensional control problem
We consider the following multistage control problem:
[TABLE]
The state variable is 1-dimensional, as the discrete control , and the uncertainty . The objective is to minimize the expected displacement away from zero, subject to a decay factor , over the planning horizon . We fix , , , and at each stage we consider 10 independent scenarios symmetrically sampled around 0.
We will compare the performance and the policy generated by several methods: a convex approximation using Strengthened Benders cuts (shortened as SB), the original SLDP algorithm using reverse-norm cuts (SLDP tents), a modified SLDP algorithm using ALD cuts with increasing augmentation (SLDP ALD), and the SDDiP algorithm [Zou et al., 2018], using two discretization steps: and . The resulting discretized problems for SDDiP don’t have complete continuous recourse, since the state cannot absove the noise below the discretization level, and we only have a discrete control, so we also add a slack variable and penalize it in the objective function. The original problem, with continuous state, doesn’t need adjustments.
We present in Table 1 the lower bounds, the estimated upper bounds using policy simulations and the computation times after 100 iterations for each method.
The convex approximations stall at a very low lower bound, while the non-convex methods all have better estimates, but they also need significantly more computation time. The SLDP approximations have a very similar performance — the ALD method requiring slightly more time. SDDiP has a relatively good performance with step , but not with . Observe that the higher lower bound for SDDiP with step also comes with a higher upper bound, which is due to the addition of the penalization term and a loose state discretization. When the discretization step is the smaller , the upper bounds of the simulation agree more closely with the other cases, but we spend more in computation time, and the lower bounds are much further away.
As we can see in figure 5, the future cost functions are nonconvex at all time stages, essentially driven by the discontinuous control : the immediate cost is , where .
However, the future cost functions built by the convex Strengthened Benders cuts can’t pierce into the nonconvexities, and become flat over , as depicted in figure 6. This explains why the convex approximations perform so poorly in this case.
The Lipschitz cuts, on the other hand, are indeed able to yield a better approximation of the problem, and approximate the expected cost-to-go functions inside their nonconvexities. The same happens with the value function obtained by SDDiP. In figure 7, we show a comparison of the expected cost-to-go functions thus constructed, using reverse-norm cuts, augmented Lagrangian cuts, SDDiP, and the actual future cost function.
One particular feature of this example is that the future cost functions are continuous in the original variables , but since the control is , the stage problems lack the continuous recourse property. For this reason, when one takes the SDDiP discretization of the state variable, one must also include a slack term to the state dynamics. This increases the costs overall, and explains why the value functions estimated with the “coarse” discretization with are higher than the true expected cost-to-go functions . The lower bounds obtained with the “fine” discretization with , on the other hand, “detach” much faster from the respective as we move back in the stages of the tree.
Finally, we compare in figure 8 the evolution of the lower bounds, both as iterations and time increase.
There, we see that both methods for SLDP were essentially equivalent, maybe except at the beginning, where the ALD’s was probably too small to yield good enough cuts. However, as iterations progressed, and was large enough, the algorithm quickly reached a comparable lower bound. We also include a curve for the time taken per iteration, to highlight the rapid increase in time for the SLDP algorithm, even in a 1-dimensional problem. This same phenomenon happens for SDDiP, on a much smaller scale, but it already starts out with a significantly larger iteration time.
4.3 A 2-dimensional example
We also study another example, taken from [Carøe and Schultz, 1997] and further adapted in [Ahmed et al., 2004]. This is a 2-stage problem, with cost-to-go function
[TABLE]
The random variable lies in , and is approximated using points on the square, where .
In [Carøe and Schultz, 1997], the authors consider the optimization problem with discrete first-stage decisions:
[TABLE]
The same value function for the second stage can be used for a continuous first-stage problem, as treated by [Ahmed et al., 2004], where one drops the constraint that the first-stage variables belong to .
In this case, the cost-to-go function is discontinuous in , so for both the SLDP and SDDiP we would need to add slack variables and their corresponding penalization in the objective function. Therefore, for this experiment we used only the convex approximations and the ALD cuts, which are valid despite the discontinuities of the value function. The convex approximations were calculated using 100 iterations (but stalled much before that), while the ALD approximations were carried for 200 iterations. This is more than the number of possible values for the state variable , in the discrete case.
We give in tables 2 and 3, respectively for the discrete and continuous first stage variables, the lower bounds and computation time required for each method. We also include the true objective for each value of . In all cases, the convex cuts using Strenghtened Benders are not able to close the gap, so we report the remaining gap for the SLDP-ALD as the ratio
[TABLE]
of the remaining gap. In the discrete case of table 2, we obtain exact results for and , and a significant reduction for the case . The fact that it does not converge is due to the need of having even larger values for to get tight cuts for every node in the second stage. The continuous case is harder, and we have gaps now at as well. Besides the difficulties of achieving tight cuts as in the discrete case, the algorithm also needs to explore several points in the neighborhood of the optimal solution.
It is remarkable that the times required by the Lipschitz cuts is much smaller in the discrete setting than in the continuous setting, contrary to what happens for the convex case, which solves a harder problem in the first stage and therefore takes slightly more time. This is probably explained by the SLDP algorithm constructing cuts at the same points, and therefore the resulting stage problems don’t become much more difficult as times passes, as opposed to the continuous case, where the nodes for each cut are probably different. Also note that all those times, and especially the ALD times, are much larger than the ones reported in [Ahmed et al., 2004]. Besides a different computational setting, we don’t explore the fact that the technology matrix is deterministic.
5 Conclusion
In this paper, we proposed a new algorithm for solving stochastic multistage MILP programming problems, called Stochastic Lipschitz Dynamic Programming (SLDP). Its major contribution is the inclusion of nonlinear cuts to iteratively underapproximate nonconvex Lipschitz cost-to-go functions. We explored two such families of cuts: (a) the ones induced by reverse-norm penalizations; and (b) augmented Lagrangian cuts, built from norm-augmented Lagrangian duality.
Assuming the Compact State Complete Continuous Recourse conditions, we proved convergence of the algorithm in the full scenario setting. In the sampled case, we provided an approximation method to reach -optimal policies in finite time. Besides asymptotic convergence in the general Stochastic Multistage Lipschitz case, it would be interesting to prove a finite convergence result for Stochastic MILPs in the sampled case by further exploring the structure of the value functions in each stage.
Our experiments suggest that, at least for small-dimensional problems, the performance of the SLDP algorithm is reasonable.
Acknowledgements
This research has been supported in part by the National Science Foundation grant 1633196, the Office of Naval Research grant N00014-18-1-2075, the COPPETec project IM-21780.
The second author would like to express his gratitude to the Brazilian Independent System Operator (ONS) for its support for this research.
This research project was concluded while the third author was visiting Georgia Tech during a sabbatical leave from UFRJ. He would like to warmly thank the hospitality and the excelent environment of the ISyE institute.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[Ackooij et al., 2018] Ackooij, W., Lopez, I. D., Frangioni, A., Lacalandra, F., and Tahanan, M. (2018). Large-scale unit commitment under uncertainty: an updated literature survey. Annals of Operations Research , 271(1):11–85.
- 2[Ahmed et al., 2004] Ahmed, S., Tawarmalani, M., and Sahinidis, N. V. (2004). A finite branch-and-bound algorithm for two-stage stochastic integer programs. Mathematical Programming, Series A , 100:355–377.
- 3[Bezanson et al., 2017] Bezanson, J., Edelman, A., Karpinski, S., and Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review , 59(1):65–98.
- 4[Birge and Louveaux, 2011] Birge, J. R. and Louveaux, F. (2011). Introduction to stochastic programming . Springer Science & Business Media.
- 5[Carøe and Schultz, 1997] Carøe, C. C. and Schultz, R. (1997). Dual decomposition in stochastic integer programming. Operations Research Letters , 24:37–45.
- 6[Cerisola et al., 2012] Cerisola, S., Latorre, J. M., and Ramos, A. (2012). Stochastic dual dynamic programming applied to nonconvex hydrothermal models. European Journal of Operational Research , 218(3):687–697.
- 7[Conejo et al., 2016] Conejo, A. J., Morales, L. B., Kazempour, S. J., and Siddiqui, A. S. (2016). Investment in Electricity Generation and Transmission: Decision Making Under Uncertainty . Springer Publishing Company, Incorporated, 1st edition.
- 8[Costley et al., 2017] Costley, M., Feizollahi, M. J., Ahmed, S., and Grijalva, S. (2017). A rolling-horizon unit commitment framework with flexible periodicity. International Journal of Electrical Power & Energy Systems , 90.
