Two-Stage Dual Dynamic Programming with Application to Nonlinear Hydro Scheduling
Benjamin Flamm, Annika Eichler, Joseph Warrington, John Lygeros

TL;DR
This paper introduces a two-stage dual dynamic programming method combining nonlinear and linear models for long-term hydro scheduling, with proven convergence and practical advantages over existing approaches.
Contribution
It extends dual dynamic programming to nonlinear hydro problems using a Benders decomposition approach with convergence guarantees.
Findings
Near-optimal solutions achieved with short nonlinear initial stage
Method outperforms conventional dynamic programming and existing McCormick envelope approaches
Bounded suboptimality related to linear-nonlinear trajectory mapping
Abstract
We present an approximate method for solving nonlinear control problems over long time horizons, in which the full nonlinear model is preserved over an initial part of the horizon, while the remainder of the horizon is modeled using a linear relaxation. As this approximate problem may still be too large to solve directly, we present a Benders decomposition-based solution algorithm that iterates between solving the nonlinear and linear parts of the horizon. This extends the Dual Dynamic Programming approach commonly employed for optimization of linearized hydro power systems. We prove that the proposed algorithm converges after a finite number of iterations, even when the nonlinear initial stage problems are solved inexactly. We also bound the suboptimality of the split-horizon method with respect to the original nonlinear problem, in terms of the properties of a map between the linear…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Two-Stage Dual Dynamic Programming with Application to Nonlinear Hydro Scheduling
Benjamin Flamm, Annika Eichler, , Joseph Warrington, , and John Lygeros The authors are with the Automatic Control Laboratory of ETH Zürich, Physikstrasse 30, 8092 Zürich, Switzerland. {flammb, eichlean, warrington, lygeros}@control.ee.ethz.ch
Abstract
We present an approximate method for solving nonlinear control problems over long time horizons, in which the full nonlinear model is preserved over an initial part of the horizon, while the remainder of the horizon is modeled using a linear relaxation. As this approximate problem may still be too large to solve directly, we present a Benders decomposition-based solution algorithm that iterates between solving the nonlinear and linear parts of the horizon. This extends the Dual Dynamic Programming approach commonly employed for optimization of linearized hydro power systems. We prove that the proposed algorithm converges after a finite number of iterations, even when the nonlinear initial stage problems are solved inexactly. We also bound the suboptimality of the split-horizon method with respect to the original nonlinear problem, in terms of the properties of a map between the linear and nonlinear state-input trajectories. We then apply this method to a case study concerning a multiple reservoir hydro system, approximating the nonlinear head effects in the second stage using McCormick envelopes. We demonstrate that near-optimal solutions can be obtained in a shrinking horizon setting when the full nonlinear model is used for only a short initial section of the horizon. For this example, the approach is shown to be more practical than both conventional dynamic programming and a multi-cell McCormick envelope approximation from literature.
Index Terms:
dual dynamic programming, optimal control, nonlinear model predictive control, hydro optimization.
I Introduction
The control of energy storage devices over a long planning horizon is gaining in importance, as the number of renewable and variable energy sources on the electric power grid increases. U.S. installed wind and solar generation capacity increased by 11% and 52% respectively in 2016 [1]. Energy storage is particularly well-suited to complement the resulting unpredictable power generation. Long-term energy storage exists in different forms, with varying technical maturity and efficiency [2]. Examples include natural gas for heating, compressed air electrical storage, ground thermal energy storage, and hydro reservoir systems; the latter have been extensively utilized due to their large storage capacity, technological maturity, and proliferation. Key challenges for the integration of seasonal storage into next generation energy systems are the long horizons and nonlinear system dynamics, which can render long-term control of the storage computationally difficult.
We typically wish to operate long-term storage devices in response to underlying energy demand or supply patterns that are cyclical over a period of months or years. Examples of such patterns include yearly snow melt, as well as seasonal heating and cooling demands.
Several methods have been proposed to tackle nonlinear seasonal storage problems. Many involve approximating the nonlinear dynamics through modeling simplifications and heuristic methods such as timescale separation [3]. Specifically for hydro optimization, a fundamental difficulty is the presence of nonlinear head effects when converting between stored water and electrical energy. Many papers ignore nonlinear head effects and represent energy conversion as a constant efficiency [4], [5]. Methods that do account for head effects usually generate convex hulls of the power production function, either by fitting a set of piecewise linear constraints to the true model [6], [7], or by rewriting nonlinear terms using convex approximations such as McCormick envelopes [8].
Other methods take advantage of improvements in computing power, which allows increasingly large nonlinear control problems to be solved exactly. Approaches that use nonlinear models usually consider only short horizons [9]. Techniques such as spatial branch-and-bound and dynamic programming allow for the solution of nonlinear problems to high precision, but scale poorly with the state space size.
Dual dynamic programming (DDP), also known as multistage Benders decomposition, was introduced as a method for seasonal hydro storage scheduling in [10], but has since been used primarily for solving linear approximations of these problems. DDP was extended to convex problems using generalized Benders decomposition in [11]. Roughly speaking, DDP bounds the value function of a convex problem by a piecewise affine function of the initial state. Since DDP cannot solve problems with nonconvex value functions, convex approximations of the model are required when using DDP for nonlinear problems. Recent extensions of this approach have considered integer programs [12], locally-valid Benders cuts [13], and polynomial-based moment relaxation [14].
In [15], a method was introduced to treat general receding horizon nonlinear optimal control problems. The full problem horizon was split into short- and long-term parts, with high model accuracy in the short term, and reduced model accuracy in the long term. Here we expand this approach by introducing several convergence and optimality results related to solving this approximation of the underlying nonlinear problem. Unlike [15], in Theorem 1 we provide a guarantee on convergence of the proposed DDP algorithm, even when the nonlinear first-stage problem is solved suboptimally. Additionally, inspired by [16], where linear subproblems were solved to a known tolerance in DDP, in Theorems 2 and 3 we provide bounds on the suboptimality of the solution of the two-stage approximation. These bounds are a function of the first-stage nonlinear solver tolerance as well as the approximation error between linear and nonlinear models in the second stage. We also demonstrate the generality of the approach by applying it to a different kind of nonlinearity than in [15], involving bilinear as opposed to integer terms. Finally, we quantify how the control horizon (the decision length used each time the model predictive control (MPC) problem is solved) and modeling accuracy affect optimality in simulation.
The underlying stochasticity of real-time optimal control problems is an additional source of computational complexity, and lends itself to the use of DDP. This can be treated by introducing scenarios to capture different potential realizations of the stochastic processes [17]. With additional variables introduced for different scenarios, long-term problems with even simple models can push the limits of tractability. While much of the hydro optimization literature incorporates stochasticity, we consider a deterministic setting here to focus on the underlying computational method and treatment of nonlinearity.
This paper is structured as follows. Section II introduces the general case of the split-horizon approximate problem, as well as a DDP-based algorithm to solve it. The optimality of the solution produced by the algorithm is derived as a function of the optimality of the first-stage solution. In Section III, a bound is found for the error of the two-stage approximation relative to the exact problem. Section IV formulates the nonlinear hydro optimal control problem, providing specific models and approximations of the underlying hydro system. In Section V, we analytically determine the error bound on McCormick envelope approximations of bilinear terms occurring in the problem. Section VI presents simulation results for a representative hydro system, illustrating the computational benefits of the multistage method. Section VII concludes with analysis of possible improvements and extensions to the proposed method.
II Split-Horizon Problem and DDP Algorithm
II-A General Problem Formulation
We wish to solve a generic discrete-time optimal control problem for a system with dynamics , input and state constraints defined by the set , and cost function over a horizon of length . The state trajectory is denoted by , while the input trajectory is , where and . The optimal inputs are determined by solving the nonlinear program (NLP)
[TABLE]
where , , and are general nonlinear functions on the set . The dynamics, costs, and constraints can all be time-varying. This formulation is general in terms of variable type. For example, integer variables can be considered by adding constraints to .
Note that this is a quite general nonlinear programming problem whose complexity grows rapidly in the dimension of the state and input, as well as the horizon length.
II-B Two-Stage Approximation
In [15], to solve (1) over a long horizon, the authors proposed solving an approximate problem consisting of a short initial stage of length where the exact NLP holds, and a subsequent longer stage of length where the problem is approximated by a linear program (LP):
[TABLE]
Here, the first stage of the problem, from to , has the exact nonlinear cost function, dynamics, and constraints. In the second stage of the problem, these are replaced with linear costs, dynamics, and constraints.
Note that the state of the approximate linear part at time is denoted by (instead of ), to stress that the second-stage approximation is different from the original problem, potentially incorporating different state variables at each timestep (for example, a relaxation of integer variables in the original problem to real-valued variables in the linear part). However, by (2d), the states and coincide at .
We now recast the nonlinear approximate problem (2) as an equivalent two-stage problem. The nonlinear first stage can be written as
[TABLE]
In (3a), is a value function that represents the cost of the linear second stage of (2) as a function of the state at the end of the first stage, with
[TABLE]
The division of the horizon into the two stages and intermediate value function is depicted in Figure 1.
To ensure that the subsequent theory and analysis is meaningful, we assume that both (1) and (2) are feasible. We additionally make the following assumption, which facilitates the iterative solution of the two-stage problem.
Assumption 1**.**
The linear second-stage problem (4) is feasible for all that are feasible for the first-stage problem (3).
This assumption, known in the mathematical programming literature as complete recourse, is not strictly necessary, as the theory related to Benders decomposition used here also works for problems where the second stage is not feasible for a particular . In that case, Benders feasibility cuts on can be iteratively added to the first-stage problem (3), as is done in [11]. However, for simplicity we omit this case here.
II-C Split-Horizon Approximate DDP Algorithm
In [15], we presented an algorithm that converged in a finite number of iterations to an optimum of the approximate two-stage problem (2), provided the first-stage NLP (3) was solved to global optimality. In certain cases, it may not be desirable or possible to solve (3) to global optimality, due to an algorithmic choice of solution tolerance [16], [18], or constraints on available solution time.
Algorithm 1 presents an adaptation of this setting, where the first-stage NLP (3) is not solved to global optimality. Before analyzing the convergence of this algorithm, as described in Theorem 1, we first explain the value function approximation used therein. We wish to solve a two-stage problem (2) using DDP. To do this, we iteratively solve the two stages of the problem, using DDP to progressively construct a set of lower-bounding hyperplanes for the second-stage value function . The lower-bounding hyperplanes are found using the following lemma.
Lemma 1**.**
(Lemma 1 in [15]) Given a solution to (4) for some , let and be the dual variables corresponding to the constraints (4b) and (4c) at timestep . Then, is a lower bound on for all .
Lemma 1 allows one to build a collection of lower-bounding hyperplanes for . Let and specify the parameters of the lower bound in Lemma 1, found for a particular . Each iteration of Algorithm 1 solves (4) for a different , leading to a new hyperplane . We can combine the hyperplanes into the set , and construct the approximate value function
[TABLE]
which is itself a lower bound on . Thus, each iteration of Algorithm 1 adds a hyperplane constraint to .
In words, Algorithm 1 starts by solving the first-stage NLP with no information about the second stage. This provides a guess for the initial state of the second-stage LP. In step 2, the LP is solved using this . The resulting objective, computed in step 3, is the exact second-stage value function at . In step 4, if this objective equals the previously-found value function approximation evaluated at , i.e. the current value function approximation is tight at the new , then the problem terminates. Otherwise, in the key step 5 of the algorithm, Lemma 1 is used to generate a new lower-bounding hyperplane for the second-stage value function . This hyperplane is then incorporated into the second-stage value function approximation , which is a lower bound on . In step 6, we again solve the first-stage NLP, this time with the updated . This iterative solution of the first and second stages is repeated until the previously-found value function approximation is tight at the found when solving the NLP in step 6.
Theorem 1**.**
Provided Assumption 1 holds, steps 1 and 6 return feasible solutions of (3), and step 2 solves (4) to optimality, then Algorithm 1 terminates in a finite number of iterations, returning a feasible solution of the split-horizon problem. In addition,
- (a)
If the nonconvex first stage is solved in step 6 to local optimality, then the returned solution is locally optimal for the two-stage problem (2).
- (b)
If the nonconvex first stage is solved in step 6 to global optimality, then the returned solution is globally optimal for the two-stage problem (2).
Proof.
We first show that Algorithm 1 terminates in a finite number of iterations.
Suppose step 6 (or step 1 for the first iteration) has previously returned a particular feasible solution of (3). Due to Assumption 1, solving (4) at returns feasible and . Since the second-stage LP (4) is solved to optimality,
[TABLE]
is by definition the exact second-stage value function evaluated at the particular .
Step 5 adds the dual-feasible vertex to . By the strong duality of (4), after is updated in step 6,
[TABLE]
at the particular .
Since (4) is a linear program, there are a finite number of distinct dual-feasible vertices . If solving the LP (4) in step 2 yields a dual-feasible vertex that has already been found in a previous iteration of the algorithm, then is unchanged. If solving the NLP in step 6 returns the same as in the previous iteration, (7) already holds. Thus, the value function approximation is tight at , , and Algorithm 1 terminates in step 4.
Otherwise, we add a hyperplane to parameterized by the new dual-feasible vertex. Since the number of such vertices is finite, one either achieves a complete characterization of , or else sets before this point.
The resulting first- and second-stage arguments are each feasible, and can be concatenated (as in the algorithm) to form feasible solutions and of the split-horizon problem.
Proof of (a). We use Algorithm 1 to solve (2) to convergence. This returns , , and the approximate value function .
Each algorithm iteration, we solve an approximation of the first-stage NLP (3), replacing with , and solving to local optimality. This means that when the algorithm has converged, there exist and such that, for all for , for ,
[TABLE]
By Lemma 1, since the second stage is an LP,
[TABLE]
Considering the case where the algorithm has converged to a particular , and combining (7), (8), and (9),
[TABLE]
This result holds for all for , for . Thus, and are locally optimal solutions to the exact (3).
To complete the proof of (a), note that the linear second-stage problem (4) is solved to optimality for the given first-stage arguments.
Proof of (b) follows from Theorem 1 of [15]. ∎
We point out that the result holds regardless of whether the first stage is solved with a deterministic solver. Note that when solving the first stage NLP (3) with a given , a nondeterministic solver might not return the same feasible solution in subsequent iterations. Nevertheless, the solver will return an , where either the existing hyperplanes provide a tight bound to the value function (in which case the and agree in Algorithm 1), or a new hyperplane will be added to . As there are a limited number of hyperplanes to add, the algorithm will converge in a finite number of iterations.
III Error bound on two-stage approximation
The results in the previous section establish the properties of solutions for the two-stage problem (2) returned by Algorithm 1. Since the original intention was to approximate the nonlinear program (1), one would also like to know how far the optimal solutions of the two-stage approximate problem (2) are from those of the true nonlinear problem (1). Related results are provided in [16], which gives approximation bounds for a setting where each stage of a multi-stage linear program is solved to a certain tolerance. Here, we extend this approach to address the case of (2), where instead of solving a linear program to a known tolerance, we approximate the second stage of the nonlinear problem (1) with a linear program, and solve that linear program exactly.
For a given timestep , let denote the set of arguments of the second-stage LP (4) that satisfy (4c). Recall that denotes the set of arguments satisfying (1c) at time .
Assumption 2**.**
For all , there exists a map such that
- (a)
There exists a , such that for all ,
[TABLE]
- (b)
* is surjective, i.e. for all , there exists such that .*
In other words, at each timestep, there is a transformation from feasible arguments of the LP to feasible arguments of the NLP, such that the LP objective is an underestimate of the NLP objective, with a maximum underestimate of . Additionally, all feasible solutions of the NLP can be found from a feasible solution of the LP through the transformation.
Note that by (2d), at , leaves unchanged, and maps the input to a feasible input of the NLP.
We first consolidate the terminology for the exact NLP (1) and two-stage approximation (2):
- •
is the optimal value of the exact NLP (1), as a function of the initial state .
- •
is the value of the second part of the exact NLP (1) (), as a function of the intermediate state . Note that
[TABLE]
- •
is the value of the second-stage LP (4), as a function of the intermediate state .
- •
is the approximate value function for the second-stage LP (4) in Algorithm 1. It consists of a finite number of hyperplanes bounding from below. Hyperplanes are iteratively added to this set via DDP.
Now, we consider the objective for the split-horizon problem (2) after Algorithm 1 has converged, and define it as
[TABLE]
Our aim is to bound the difference between and .
Theorem 2**.**
If Assumptions 1 and 2 hold, and step 6 is solved to -optimality in each iteration, then
[TABLE]
Proof.
By the definitions of (1) and (2), the NLP and split-horizon problems are equivalent for the first part of the horizon, and differ only in the second stage.
By Lemma 1, the Benders decomposition underestimates the objective of the second-stage LP, and . By Assumption 2, for any solution to the true NLP, a corresponding solution to the LP approximation exists with a lower objective. Combining these two observations, for any ,
[TABLE]
If we solve (11) to -optimality in step 6, returning and , then
[TABLE]
By (12), since and meet the constraints of (10) (and (11)), the objective of (11) is less than or equal to the objective of (10), and thus
[TABLE]
The second inequality (15) comes from substituting in the right inequality of (13) into (14). The final inequality (16) is because for any feasible solution, in particular the found -suboptimal solution.
Assumption 2 states that at each second-stage timestep, all feasible arguments of are the result of the mapping from feasible arguments of , such that the objective of is an underestimate of the objective of by at most . Taking the worst case for each timestep from to , for any ,
[TABLE]
When Algorithm 1 has converged, by (7), , and thus
[TABLE]
Combining (14), (16), and (18),
[TABLE]
∎
We now consider the difference between the optimum of the exact problem, and the objective of the exact problem evaluated using the arguments found in the approximate problem.
Theorem 3**.**
If Assumptions 1 and 2 hold, step 6 is solved to -optimality, and step 6 returns and when Algorithm 1 converges, then
[TABLE]
Proof.
The first inequality is straightforward: is the optimal objective of the NLP (1). Evaluating the objective of the NLP at any other feasible solution cannot lead to a lower objective.
Now, for the sake of notational ease, denote the resulting arguments when step 6 is solved to -optimality as and . Furthermore, denote the first-stage cost as .
Due to the -optimality of , for all feasible ,
[TABLE]
To prove the second inequality in the theorem, consider the difference between the NLP objective evaluated using , and the optimal NLP objective , with corresponding arguments :
[TABLE]
The inequality (21) comes from substituting (18) for . The inequality (22) is due to choosing in (20). The equality (23) comes from expanding in terms of its optimal arguments. Finally, (24) holds because , as in (12). ∎
By solving the first stage to a higher level of optimality and using a tighter linearization, the difference between the exact and approximate problem can be reduced. Nevertheless, the numerical results presented below suggest that very good performance can be obtained for a relatively coarse approximation (see Section V-D for a comparison with a state-of-the-art method).
IV Hydro Reservoir System Model and Approximation
We apply the above split-horizon approximation method to a system of interconnected reservoirs, with a specific topology of pumps and turbines that allow the reservoirs to exchange water. Denote the set of reservoirs to which reservoir can charge or discharge as and the set of reservoirs which can charge or discharge to reservoir as . We can transfer water volume from each reservoir to any . The power associated with this action, , is positive when pumping water to a higher elevation reservoir or negative when releasing water through a turbine to a lower elevation reservoir. We buy and sell power on the electricity spot market at price . This convention on device power and electricity price means that a negative objective value corresponds to a profit, while a positive objective value represents a loss. The aim is to maximize profit by taking advantage of spot price fluctuations. We assume the volume stored in reservoir is linearly proportional to the reservoir level , with a proportionality constant reflecting the surface area of the reservoir. That is, if is transferred from reservoir to , will decrease by (and will increase by ). For simplicity, we assume that there are no inflows or evaporative losses in this system.
Intuitively, the problem can be cast in the finite horizon optimal control framework considered here by setting , with for all and , with . We wish to consider a horizon that is sufficiently long to capture seasonal price fluctuations.
IV-A Nonlinear Exact Model
We assume that the energy associated with transferring a unit volume of water from reservoir to depends on the net head between reservoirs and [8]:
[TABLE]
Letting , we write the optimal control problem as
[TABLE]
By absorbing equation (25b) into the objective function, we can move the bilinear terms from the constraints into the objective, and eliminate the variables . The objective (25a) then becomes
[TABLE]
with the bilinear term equal to
[TABLE]
The reformulated problem thus has linear constraints (since (25b) has been removed), and an objective with linear and bilinear terms. For sufficiently small horizons, we can find the global optimum in a reasonable length of time using YALMIP’s built-in BMIBNB spatial branch-and-bound solver [19]. We have observed that Matlab’s fmincon also tends to return the global optimum for small problem instances in the simulations presented in Section V.
IV-B Linear Approximate Model
A common way of solving problem (25) is to solve a convex outer approximation. One way to do this is to replace the constraint (25b) with
[TABLE]
Here, we have introduced the additional variables and to represent the bilinear terms and respectively. Incorporating this linear constraint into the objective, the bilinear term (27), is thus approximated by
[TABLE]
We then linearize using McCormick envelopes [8], adding to the problem for each the inequalities
[TABLE]
Here, the upper and lower McCormick bounds, for , and for , can depend on the timestep. This results in a linear program, with objective (26) (with the bilinear term replaced by (29)), and constraints (25c)-(25e), (30), and a similar set of McCormick bounds for each .
IV-C Error Bound on McCormick Envelope Approximation
We now calculate the maximum error from approximating the bilinear relation by its McCormick envelope in (30). We drop the time, source, and destination indices in the subsequent analysis for cleaner notation.
For each bounding hyperplane in (30), the points where the first derivative of the approximation error with respect to either or is zero have zero approximation error. For example, for the first hyperplane (30a), the approximation error is . The first derivatives are zero where or , where the error is also zero.
Thus, the largest errors must occur at the intersection of the bounding hyperplanes. Considering Figure 2, the largest overestimate must occur at the intersection of the two upper-bounding hyperplanes (30c) and (30d), and the largest underestimate at the intersection of the two lower-bounding hyperplanes (30a) and (30b) (the intersection of upper- and lower-bounding hyperplanes, e.g., (30a) and (30c), is an exact representation of the underlying function).
We determine the largest overestimate analytically. At the intersection of the two upper-bounding hyperplanes,
[TABLE]
Solving for in terms of ,
[TABLE]
The approximation error at the intersection of the two upper-bounding hyperplanes is
[TABLE]
Taking the derivative with respect to , we have . The derivative of the approximation error is zero at with the corresponding . This leads to a maximum overestimate of
[TABLE]
By similar calculations, or by symmetry, the maximum underestimate occurs at the same and , with a negative sign in the error.
IV-D Error Bound on Two-Stage Approximation
We solve the two-stage approximate problem using Algorithm 1. Since the two-stage approximation differs from the exact problem only in the second stage, a bound on the two-stage approximation error can be found by summing the worst-case approximation error (31) for each second-stage bilinear term in (27). In the context of Assumption 2, at each second-stage timestep , the worst-case bound between the exact and approximate objectives is
[TABLE]
Solutions to the two-stage approximation are also feasible for the exact problem, so the map between solutions of the two problem formulations is simply identity.
We assume that the first stage is solved exactly, i.e., . Denoting the approximate solution as and optimal solution as , and applying Theorem 2, achieves a maximum underestimate of
[TABLE]
The absolute limits on the reservoir levels (25d) and volume flows (25e) can be used in the McCormick envelopes, but these crude estimates can be improved. Tighter bounds on the volume flows are not known a priori, but the bounds on the reservoir levels can be tightened by incorporating the system dynamics (25c), and assuming maximum inflows and outflows at each timestep:
[TABLE]
For the split-horizon problem, the bounds must be found relative to the starting level for the entire problem, rather than the starting level for the linear part. Otherwise, the approximation of the bilinear terms changes at each iteration, leading to a different linear program each iteration and voiding the convergence guarantee of Theorem 1.
V Numerical Results
V-A Experimental System
We apply the split-horizon approximation and solution method of Algorithm 1 to the system of two reservoirs and an infinite-capacity basin depicted in Figure 3. This example system is inspired by [7], which considers a single reservoir of the same volume and height used here. The reservoirs have identical capacities of \text{,}\mathrm{m}, and their bottoms are separated by a height of 200\text{,}\mathrm{m}. The water volume in each reservoir is proportional to the water level relative to the reservoir bottom, with maximum levels $\bar{\ell}^{a}=$85\text{\,}\mathrm{m} and 100\text{,}\mathrm{m} for the upper and lower reservoir, respectively. The reservoir bottoms are connected by a reversible $100\text{\,}\mathrm{MW}$ pump/turbine which operates with $\mu=90\%$ one-way conversion efficiency. The lower reservoir is connected to an infinite basin ($\bar{\ell}^{bas}=\underline{\ell}^{bas}=$0\text{\,}\mathrm{m}) situated below its bottom, via another identical reversible pump/turbine. The reservoir levels are constrained between empty (0\text{,}\mathrm{m}$$) and their maximums. The reservoirs are initially half-full.
The energy conversion parameters for (25b) can be calculated from first principles, since the energy required to raise a given volume a height is , with 1000\text{,}\mathrm{k}\mathrm{g}\mathrm{/}\mathrm{m}^{3} and $g=$9.81\text{\,}\mathrm{m}\mathrm{/}\mathrm{s}^{2}. Taking into account the one-way conversion efficiency , the energy conversion parameters for (25b) are , , , and , where 0.002,725\text{,}\mathrm{kW}$$h. The maximum flow rates in (25e) are calculated as the flows that produce or consume at the initial reservoir levels. Reservoir takes 156 hours to empty into Reservoir at maximum release flow, while Reservoir takes 283 hours to empty into the lower basin.
The terminal cost of water stored in reservoirs is chosen as the energy embodied in a unit volume of water at the beginning of the time horizon, multiplied by the average price throughout the time horizon. This energy includes conversion efficiency losses, and is relative to the zero potential level of the system. Thus, the value of water stored in the upper reservoir takes into account its potential passage through the lower reservoir as well. Price data are taken from the Swiss day-ahead EPEX spot market, starting on January 1, 2017 [20], and are assumed to be perfectly known over the entire horizon.
V-B Solution of Nonlinear and Split-Horizon Problems
We seek to minimize the exact NLP (25), with the total horizon days. As a reference estimate of the global optimal value, we grid both the state and input space into 32 equally-spaced points and solve the approximate dynamic program using the DPM toolbox [21]. We compare this to the solution of the split-horizon problem (consisting of (25) with modifications (28), (29), and (30)) over a shrinking horizon, as depicted in Figure 4. Here, we solve the split-horizon problem and apply the first hours of the solution. We then move forward in time by , compute a solution to the subsequent, shortened split-horizon problem, and so forth. Note that for the problem considered here, solutions to the LP are also feasible for the NLP, and thus the relative length of versus presents no issue for feasibility.
When solving the split-horizon problem, the nonlinear first stage is also solved using the DPM toolbox. The linear second stage is solved using CPLEX 12.7.0. The simulations are run in Matlab, formulated by YALMIP, and solved on a six core Intel Core i7-5820K with 16 GB of RAM.
In Figure 5, the two-stage problem is solved over various exact horizon lengths . The figure shows that as increases (for a fixed total horizon), the cost decreases. When the McCormick bounds are set to the reservoir level limits, using the split-horizon method with hours results in a problem objective which is 23.7% less than that from the linearized problem (where in the figure), and is within 3.9% of the optimum. Since the suboptimality decreases with the exact horizon length, we would choose the longest that allows Algorithm 1 to converge in the allotted time. For the hydro application here, the allotted time would typically be on the order of one hour, as in [3].
When using the tightened McCormick bounds (34) and (35), there is a 0.4% improvement due to modeling 12 hours exactly. Thus, the utility of the split-horizon method depends on the level of accuracy in the linearization. Note that here, the improvement in linearization due to the tightened McCormick bounds is reduced as the length of the exact horizon increases.
The theoretical bounds from (33) state that for the problem considered in Figure 5, if the two-stage problem with hours is solved to global optimality, the objective found for the first timestep is within of the optimum for the case of the loose McCormick bounds, and within of the optimum for the case of the tightened bounds. These bounds are conservative compared to the suboptimality achieved in simulation for two reasons. First, the McCormick envelope approximation considers the worst case combination of inputs, which rarely occurs in practice. This conservatism accumulates at each timestep, rendering bounds at more distant timesteps even more conservative. Second, the bounds are given for a single problem instance, and do not take into account that only part of each solution is used in the receding horizon setting.
In Figure 6, we vary the control horizon over which the computed action is applied. As is also varied, a “knee” appears in the graph, coinciding with . Since is usually a fixed problem parameter, this result suggests we should choose . The objective continues to improve as is increased, but not as significantly as before the “knee.” Also, as gets shorter, the objective improves. This is expected from the receding horizon context, as measurements from the true system are incorporated at a more frequent rate.
In Figure 7, we compare the computation time for the split-horizon approximation (2) versus solving the exact bilinear problem (25) using the DPM toolbox and BMIBNB solver. The split-horizon problem computation time is for solving the first instance of the problem i.e., we compute the solution for the full horizon once, and do not account for solving the problem repeatedly over the shrinking horizon. This reflects the envisioned receding horizon setting: in practice we would solve the problem once, use the solution for the first timestep, and then re-solve a new problem that incorporates updated problem data for the next timestep. We see that the split-horizon method with hours scales well with respect to the total horizon length. In practice, the increase in solution time depends mainly on that of the second stage LP, as the number of Benders cuts for the second stage tends to increase sublinearly as the second stage length grows. For example, the split horizon method in Figure 7 results in two cuts for total horizons less than five days, between three and four cuts from horizons from 5-19 days, and between four and five cuts for horizons up to 25 days. As expected theoretically, the computation time for solving the exact bilinear problem scales exponentially in horizon length when using the global BMIBNB solver, and scales linearly when solving approximately using the DPM toolbox.
V-C Solution of Problems with Higher State Dimension
For the two-reservoir system considered above, the DPM toolbox can solve the exact problem to within the tolerance of the given grid. However, the chief disadvantage of dynamic programming is that the computation time scales exponentially in the problem dimension. When we modify the system of Figure 3 to include an additional reservoir, increasing the state and input dimensions each to three, the DPM toolbox fails to return a solution due to reaching computer-specific RAM usage limits. It is possible to make the DP discretization coarser, but this comes at a cost to optimality.
In contrast to dynamic programming, where one must solve over the entire state space, local solution methods can be used to solve the first-stage nonlinear problem of the split-horizon approximation. The length of the first stage can be chosen short enough so that it is computationally tractable to solve. The experimental results presented here suggest that using even short first stages for the split-horizon approximation leads to near-optimal solutions in a receding horizon setting. In Figure 8, we display the results of using the split-horizon method on the three reservoir system mentioned in the previous paragraph. Using the BMIBNB global solver (solved to a relative tolerance of ) with a control horizon of 12 hours, we found that the split-horizon method with an exact horizon of 12 hours improved the objective by 20%, relative to the solution of the linearized problem. Note that solving the 20 day exact NLP to optimality is computationally intractable.
V-D Comparison to Multi-cell Approximation
For a comparison with other methods which consider local approximations of the bilinearity ([6]-[8]), we also implement the multi-cell McCormick envelope approximation method of [8]. The method is used to approximate the bilinearity (25b) over the entire horizon. Here, we choose to split each volume flow and reservoir level variable into two equal intervals, and generate a McCormick envelope for each of the four resulting cells. For the two-reservoir model of Figure 3, this introduces one binary variable per timestep for each of the four devices and two reservoirs. Considering the 20 day horizon with hourly timesteps, this results in a mixed-integer linear program (MILP) with 2880 binary variables. We solve the MILP in the receding horizon setting of Figure 4, allotting a solution time equivalent to the time used by the split-horizon method for a given exact horizon length.
We produce Figure 9 by varying the exact horizon length of the split-horizon method. This can be viewed as exploration of a Pareto front between computation time and solution optimality. We see that the multi-cell McCormick method performs worse than the split-horizon method. For a 20 day horizon, when both methods use a computation time of 32.4 seconds (the solution time of the split-horizon method with an exact horizon of 0.5 days), the multi-cell method achieves an objective which is within 11.5% of the optimum, while the split-horizon method is within 3.9%. Moreover, we expect the computation time of the split-horizon method to scale better with increasing problem size, compared to the exponential complexity of solving a larger MILP.
VI Conclusion
We have shown that the DDP-based Algorithm 1, when applied to a split-horizon approximation of a bilinear hydro optimization problem, achieves accurate and computationally efficient results compared to solving either the full exact nonlinear problem or its linearization. When the optimization is conducted in a receding horizon manner with exact horizons on the order of the control horizon, the performance is nearly optimal, with significant computational savings when modeling higher-dimensional systems.
The experimental results presented here demonstrate that accurately modeling the first part of the horizon can significantly improve the result relative to a linearized model. This improvement is due to the mismatch between the linearized and true model. Although better linearizations reduce the need for the split-horizon method, it might still be advantageous to use DDP with local linearizations like [8] in the second stage. However, additional research along the lines of [12] is needed to use DDP with the resulting integer variables. We note that the success of the split-horizon method depends on the type of nonlinearity considered, and how well applicable solution methods scale as the first stage length increases.
We also derived bounds for the suboptimality of the split-horizon method. These bounds were found to be quite conservative for the problem here. Error bounds can be tighter for other problem cases and classes. For example, the suboptimality relative to an integer relaxation of an MILP can also be bounded, with a tightness that depends on the problem parameters. However, our case study here shows that while the bounds on a McCormick envelope approximation of a bilinearity are conservative, the split-horizon method used in a receding horizon setting can still achieve good results. Future research could explore how these bounds differ for a receding horizon setting, compared to the single-period optimization for which they were derived.
The split-horizon method is well suited to be extended to stochastic problems e.g., for uncertain prices here. As noted above, stochasticity, perhaps incorporated using a scenario tree approach as in [17], further increases the complexity of solving the problem exactly over long horizons.
Acknowledgment
The authors would like to thank Roy Smith and members of the Building Control Group at the ETH Automatic Control Laboratory for their input into this work. This work is supported by the SCCER FEEB&D project and the European Research Council under the project OCAL, ERC-2017-ADG-787845.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, “2016 Renewable Energy Data Book,” https://www.nrel.gov/docs/fy 18osti/70231.pdf. Accessed Dec. 1, 2017.
- 2[2] H. S. Chen, T. N. Cong, W. Yang, C. Q. Tan, Y. L. Li, and Y.L. Ding, “Progress in electrical energy storage system: A critical review,” Progress in Natural Science , vol. 19, no. 3, pp. 291–312, 2009.
- 3[3] H. Abgottspon and G. Andersson, “Multi-horizon Modeling in Hydro Power Planning,” Energy Procedia , vol. 87, pp. 2–10, 2016.
- 4[4] G. Pritchard, A. B. Philpott, and P.J. Neame, “Hydroelectric reservoir optimization in a pool market.” Mathematical Programming , vol. 103, no. 3, pp. 445–461, 2005.
- 5[5] T. A. Rotting and A. Gjelsvik, “Stochastic dual dynamic programming for seasonal scheduling in the Norwegian power system,” IEEE Transactions on Power Systems , vol. 7, no. 1, pp. 273-279, Feb. 1992.
- 6[6] A. Diniz and M. Maceira, “A Four-Dimensional Model of Hydro Generation for the Short-Term Hydrothermal Dispatch Problem Considering Head and Spillage Effects,” IEEE Transactions on Power Systems , vol. 23, no. 3, pp. 1298-1308, 2008.
- 7[7] A. Borghetti, C. D’Ambrosio, A. Lodi, and S. Martello, “An MILP Approach for Short-Term Hydro Scheduling and Unit Commitment With Head-Dependent Reservoir” IEEE Transactions on Power Systems , vol. 23, no. 3, pp. 1115-1124, Aug. 2008.
- 8[8] S. Cerisola, J.M. Latorre, and A. Ramos, “Stochastic dual dynamic programming applied to nonconvex hydrothermal models,” European Journal of Operational Research , vol. 218, no. 3, pp. 687–697, 2012.
