A Globally Convergent Penalty-Based Gauss-Newton Algorithm with Applications
Ilyes Mezghani, Quoc Tran-Dinh, Ion Necoara, Anthony Papavasiliou

TL;DR
This paper introduces a globally convergent Gauss-Newton algorithm for non-convex, non-smooth optimization problems, demonstrating its effectiveness on power flow problems and achieving comparable performance to established solvers.
Contribution
The paper develops a novel penalty-based Gauss-Newton algorithm with proven global and local convergence properties for non-smooth optimization problems.
Findings
Global convergence to stationary points from any initial point
Local quadratic convergence under certain conditions
Comparable performance to IPOPT on power flow problems
Abstract
We propose a globally convergent Gauss-Newton algorithm for finding a local optimal solution of a non-convex and possibly non-smooth optimization problem. The algorithm that we present is based on a Gauss-Newton-type iteration for the non-smooth penalized formulation of the original problem. We establish a global convergence rate for this scheme from any initial point to a stationary point of the problem while using an exact penalty formulation. Under some more restrictive conditions we also derive local quadratic convergence for this scheme. We apply our proposed algorithm to solve the Alternating Current optimal power flow problem on meshed electricity networks, which is a fundamental application in power systems engineering. We verify the performance of the proposed method by showing comparable behavior with IPOPT, a well-established solver. We perform our validation on several…
| Gauss-Newton | IPOPT | ||||||||
| Test Case | # It | # RI | Objective | MCV | Time | Objective | MCV | Time | Gap |
| 1354pegase | 13 | 0 | 15.1 s | 6.00 s | 0.00 % | ||||
| 1888rte | 14 | 0 | 25.9 s | 59.0 s | 0.02 % | ||||
| 1951rte | 4 | 0 | 6.94 s | 7.67 s | 0.00 % | ||||
| ACTIVSg2000 | 5 | 0 | 8.39 s | 14.8 s | 0.01 % | ||||
| 2383wp | 19 | 2 | 51.0 s | 21.6 s | 0.02 % | ||||
| 2736sp | 4 | 0 | 10.9 s | 13.3 s | -0.06 % | ||||
| 2737sop | 3 | 0 | 7.77 s | 9.98 s | -0.14 % | ||||
| 2746wop | 17 | 2 | 38.1 s | 12.8 s | 0.00 % | ||||
| 2746wp | 3 | 0 | 6.89 s | 16.0 s | -0.05 % | ||||
| 2848rte | 17 | 2 | 52.0 s | 74.3 s | 0.01 % | ||||
| 2868rte | 4 | 0 | 9.11 s | 29.3 s | 0.00 % | ||||
| 2869pegase | 12 | 0 | 41.9 s | 15.5 s | 0.00 % | ||||
| 3012wp | 8 | 0 | 25.2 s | 18.8 s | 0.03 % | ||||
| 3120sp | 13 | 0 | 41.2 s | 19.2 s | -0.04 % | ||||
| 3375wp | 8 | 0 | 34.7 s | 21.3 s | 0.02 % | ||||
| 6468rte | 19 | 2 | 187 s | 103 s | 0.02 % | ||||
| 6470rte | 11 | 0 | 89.3 s | 144 s | 0.01 % | ||||
| 6495rte | 12 | 0 | 180 s | 52.4 s | 0.04 % | ||||
| 6515rte | 18 | 0 | 204 s | 86.9 s | 0.03 % | ||||
| 9241pegase | 17 | 0 | 894 s | 368 s | 0.25 % | ||||
| ACTIVSg10k | 6 | 0 | 117 s | 93.4 s | 0.10 % | ||||
| 13659pegase | 19 | 0 | 137 s | 685 s | 0.62 % | ||||
| ACTIVSg25k | 16 | 0 | 1,740 s | 544 s | 0.25 % | ||||
| No warm-start | With warm-start | |||||
|---|---|---|---|---|---|---|
| Test Case | # It | # ADMM It | Time | # It | # ADMM It | Time |
| 39_epri | 4 | 114,735 | 4.65 s | 4 | 56,988 | 2.15 s |
| 118_ieee | 4 | 162,251 | 29.2 s | 4 | 65,832 | 11.6 s |
| Fixed | Bisection | Geometric-2 | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Test Case | # It | MVQ | MVT | # It | MVQ | MVT | # It | MVQ | MVT | |
| 1354pegase | 100 | 100 | 100 | |||||||
| 91 | 3 | 3 | ||||||||
| 78 | 3 | 3 | ||||||||
| 1888rte | 100 | 21 | 100 | |||||||
| 100 | 11 | 100 | ||||||||
| 100 | 15 | 100 | ||||||||
| 1951rte | 86 | 10 | 39 | |||||||
| 66 | 4 | 3 | ||||||||
| 61 | 4 | 3 | ||||||||
| Bisection | Geometric-2 | |||||||
|---|---|---|---|---|---|---|---|---|
| Test Case | # It | # L-It | Obj | # It | # L-It | Obj | ||
| 1354pegase | 3 | 1 | 14 | 6 | ||||
| 4 | 1 | 13 | 4 | |||||
| 1888rte | 8 | 1 | 17 | 1 | ||||
| 11 | 1 | 14 | 3 | |||||
| 1951rte | 4 | 1 | 6 | 1 | ||||
| 4 | 0 | 4 | 0 | |||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsModel Reduction and Neural Networks · Numerical Methods and Algorithms · Advanced Optimization Algorithms Research
A Globally Convergent Penalty-Based Gauss-Newton
Algorithm with Applications
Ilyes Mezghani
Corresponding Author. Center for Operations Research and Econometrics, Université Catholique de Louvain. Voie du Roman Pays 34, 1348 Louvain-la-Neuve, Belgium. E-mail: [email protected]
Quoc Tran-Dinh
Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill. 333 Hanes Hall, Chapel Hill, NC 27599, USA. E-mail: [email protected]
Ion Necoara
Politehnica University of Bucharest, Department of Automatic Control and Systems Engineering. Splaiul Independentei nr. 313, sector 6, 060042 Bucharest, Romania. E-mail: [email protected]
Anthony Papavasiliou
Center for Operations Research and Econometrics, Université Catholique de Louvain. Voie du Roman Pays 34, 1348 Louvain-la-Neuve, Belgium. E-mail: [email protected]
Abstract
We propose a globally convergent Gauss-Newton algorithm for finding a local optimal solution of a non-convex and possibly non-smooth optimization problem. The algorithm that we present is based on a Gauss-Newton-type iteration for the non-smooth penalized formulation of the original problem. We establish a global convergence rate for this scheme from any initial point to a stationary point of the problem while using an exact penalty formulation. Under some more restrictive conditions we also derive local quadratic convergence for this scheme. We apply our proposed algorithm to solve the Alternating Current optimal power flow problem on meshed electricity networks, which is a fundamental application in power systems engineering. We verify the performance of the proposed method by showing comparable behavior with IPOPT, a well-established solver. We perform our validation on several representative instances of the optimal power flow problem, which are sourced from the MATPOWER library.
**Keywords: ** Nonlinear programming, Optimization with non-convex constraints, penalty reformulation, Gauss-Newton method, AC optimal power flow.
1 Introduction
Statement of the problem.
In this paper we are interested in solving the following optimization problem with non-convex constraints:
[TABLE]
For this problem we assume that the objective function is convex and differentiable, is a compact convex set, and the non-convexity enters in (1) through the non-linear equality constraints , defined by .
Note that in the literature we can find many efficient algorithms that are able to minimize non-convex objective functions, but with convex constraints, see e.g., [Chen et al., 2019, Bolte et al., 2014, Patrascu and Necoara, 2015, Nocedal and Wright, 2006]. In this paper, we treat a more general optimization model, where the non-convexity enters into the optimization problem through the constraints.
It is well-known that optimization problems with non-convex constraints are more difficult to solve than convex constrained problems [Nocedal and Wright, 2006]. For the non-convex problem (1), classical non-convex optimization algorithms such as interior-point, augmented Lagrangian, penalty, Gauss-Newton, and sequential quadratic programming methods can only aim at finding a stationary point (i.e. a point that satisfies the first-order optimality conditions), which is a candidate for a local minimum [Nocedal and Wright, 2006, Cartis et al., 2011]. Nevertheless, convergence guarantees of these methods rely on the twice continuous differentiability of the underlying functionals, including the representation of . Our method also approximates a stationary point, but it allows to include nonsmooth convex terms and to be a general convex set (see Subsection 3.4).
For an iterative method to identify a stationary point that is a local minimum, but not a saddle-point, more sophisticated techniques are required, such as cubic regularization [Nesterov and Polyak, 2006] or random noise gradient [Dauphin et al., 2014]. However, it is still unclear how to efficiently implement these methods for solving large-scale optimization problems with non-convex constraints. One of the most efficient and well-established non-linear solvers for finding stationary points is IPOPT [Wächter and Biegler, 2006], which relies on a primal-dual interior-point method combined with other advanced techniques. We emphasize that this classical method is only guaranteed to converge to a stationary point, and often requires a strategy such as line-search, filter, or trust-region to achieve global convergence (i.e. the method is still convergent from a starting point that is far from the targeted stationary point) under certain restrictive assumptions. These methods often assume that the underlying functionals in (1) are twice continuously differentiable, including and . In addition, a linesearch procedure may require that the gradient of the underlying merit function is Lipschitz continuous to guarantee global convergence, see, e.g., [Nocedal and Wright, 2006, Theorem 3.2.].
A Gauss-Newton algorithm for constrained non-convex optimization.
Recently there has been a revived interest in the design and analysis of algorithms for solving optimization problems involving non-convex constraints, in particular in engineering and machine learning [Boob et al., 2019, Bolte and Pauwels, 2016, Cartis et al., 2014, Curtis et al., 2018, Tran-Dinh et al., 2012]. The main trend is in solving large-scale problems by exploiting special structures/properties of the problem model and data towards the design of simple schemes (e.g., solving a tractable convex subproblem at each iteration), while producing reasonable approximate solutions efficiently [Drusvyatskiy and Paquette, 2019, Lewis and Wright, 2016, Nesterov, 2007b].
Following this trend, in this paper we also devise a provable convergent Gauss-Newton (GN)-type algorithm for solving the non-convex optimization problem (1). The idea of the GN method studied in this paper was proposed in [Lewis and Wright, 2016] for minimizing a compositional model of the form , where is possibly nonsmooth. This method targets a different problem class compared to standard GN or Levenberg–Marquardt methods for nonlinear least-squares problems. The main idea is to replace the non-Lipschitz continuous least-squares function in these methods by a given convex and Lipschitz continuous function (but possibly nonsmooth). Nesterov contributed a thorough investigation on convergence guarantees of this method in [Nesterov, 2007b] when is a given norm. This was extended to a more general model that can cover exact penalty methods in a technical report [Tran-Dinh and Diehl, 2011]. Very recently, [Drusvyatskiy and Paquette, 2019] revisited this method under a Moreau’s envelope perspective, but only on global convergence guarantees.
Our algorithm converges globally to a stationary point of the problem in the sense that, starting from any initial point within a given level set, the algorithm converges to a stationary point. In addition, the proposed approach is different from standard GN methods in the literature [Nocedal and Wright, 2006, Deuflhard, 2006] due to the use of a non-smooth penalty instead of a classical quadratic penalty term. This allows our algorithm to converge globally [Nesterov, 2007a]. Hence, we refer to this algorithm as a global GN scheme.
The main idea of our method is to keep the convex sub-structure of the original problem unchanged and to convexify the non-convex part by exploiting penalty theory and the GN framework. Hence, in contrast to IPOPT (solving a linear system of the barrier problem to obtain a Newton search direction), each iteration of our algorithm requires finding the solution of a strongly convex subproblem, which can be efficiently solved by many existing convex solvers. Under some more restrictive conditions, we also derive a local quadratic convergence rate for our GN method.
The optimal power flow problem.
We apply our proposed optimization algorithm to the alternating current optimal power flow (AC-OPF) problem [Frank et al., 2012], which lies at the heart of short-term power system operations [Cain et al., 2012]. We show that this problem can be posed in the framework of non-convex optimization with the particular structure on the constraints as in (1). The optimal power flow (OPF) problem [Carpentier, 1962] consists in finding an optimal operating point of a power system while minimizing a certain objective (typically power generation cost), subject to the Kirchhoff’s power flow equations and various network and control operating limits.
In recent years, there has been a great body of literature that has focused on convex relaxations of the AC-OPF problem, including semidefinite programming relaxations [Lavaei and Low, 2011, Kocuk et al., 2015], conic relaxations [Jabr, 2008, Gan et al., 2014, Kocuk et al., 2016], and quadratic relaxations [Coffrin et al., 2015]. These works have established conditions under which these relaxations are exact, and understanding cases in which this is not so [Molzahn et al., 2013]. However, when these relaxations are inexact, the resulting dispatch is possibly non-implementable. Therefore, our interest in the present paper is to tackle directly this problem as a non-convex optimization problem with non-linear equality constraints.
Contributions.
The main contributions of the paper are the following:
- (i)
We propose a new GN algorithm for solving a general class of optimization problems with non-convex constraints. We utilize an exact non-smooth penalty reformulation of the original problem and suggest a GN scheme to solve this penalized problem, where the subproblem in this scheme is a strongly convex program, which can be efficiently solved by several recent and highly efficient third-party convex optimization solvers.
Since our proposed approach preserves convexity in and of (1) in the subproblem, our method can solve a broader class of problems with theoretical guarantee than classical IP or SQP methods. In particular, our method can solve problem instances of (1) with nonsmooth convex objective terms or semidefinite cone constraints, see Subsection 3.4 for a concrete example.
- (ii)
We establish the best-known global convergence rate (i.e., convergence from any starting point in a given sublevel set) for our method to a stationary point, which is a candidate for a local optimal solution. Moreover, under some more restrictive conditions, we also derive a local quadratic convergence for our scheme.
- (iii)
We apply our proposed algorithm to the quadratic formulation of AC-OPF. We show that the newly developed algorithm can be implemented efficiently on AC-OPF problems and test it on several numerical examples from the well-known MATPOWER library [Zimmerman et al., 2010]. We often observe comparable performance to the well-established and widely-used IPOPT solver. Also, since our approach relies on an iterative scheme, using the previous iterate to accelerate the computation of the next iterate can significantly improve performance: this is what we refer to as warm-start. Even though most solvers do not possess warm-start capabilities for this type of problems, we show how warm-start can significantly improve the performances of the proposed method. Section 3 provides details regarding performance and potential improvements.
We emphasize that, as opposed to the classical GN approach, our proposed algorithm relies on the -norm penalty, which typically has a better condition number than the quadratic penalty, as discussed in [Nesterov, 2007b]. More generally, if we replace this -penalty with any other exact and Lipschitz continuous penalty function, then our theoretical results still hold. Unlike certain sophisticated methods such as Interior-Point Methods (IPMs) and Sequential Quadratic Programming (SQP) schemes, our GN method is simple to implement. Its workhorse is the solution of a strongly convex problem. As a result, its efficiency depends on the efficiency of a third-party solver for this problem as well as the benefit of warm-start strategies. Our main motivation is to exploit recent advances in large-scale convex optimization in order to create a flexible algorithm that can reuse this resource.
Content.
The paper is organized as follows. In Section 2, we introduce our Gauss-Newton algorithm and analyze its convergence properties. In Section 3, we present the AC-OPF problem, its quadratic reformulation and test our Gauss-Newton algorithm on several representative MATPOWER test cases.
2 A Gauss-Newton Algorithm for Non-Convex Optimization
In this section, we present the main assumptions for the non-convex optimization problem (1), propose an exact penalty reformulation, and solve it using a Gauss-Newton-type algorithm. We further characterize the global and local convergence rates of our algorithm.
2.1 Exact penalty approach for constrained non-convex programming
For the non-convex optimization problem (1) we assume that the objective function is convex and differentiable and is a compact convex set. Note that our method developed in the sequel can also be extended non-smooth convex function or smooth non-convex function whose gradient is Lipschitz continuous, but we make this assumption for simplicity of presentation. Furthermore, the non-convexity enters into the optimization problem through the non-linear equality constraints defined by . We assume that is differentiable and its Jacobian is Lipschitz continuous, i.e. there exists such that:
[TABLE]
where is the -norm. Further, let denote the normal cone of the convex set :
[TABLE]
Since problem (1) is non-convex, our goal is to search for a stationary point of this optimization problem that is a candidate for a local optimum in the following sense.
Definition 2.1** ([Nocedal and Wright, 2006](Theorem 12.9)).**
A point is said to be a KKT point of (1) if it satisfies the following conditions:
[TABLE]
Here, is called a stationary point of (1), and is the corresponding multiplier. Let denote the set of these stationary points.
Since is compact, and and are continuous, by the well-known Weierstrass theorem, we have:
Proposition 2.1**.**
If , then (1) has global optimal solutions.
2.2 Exact penalized formulation
Associated with (1), we consider its exact penalty form [Nocedal and Wright, 2006, Chapt. 17.3]:
[TABLE]
where is a penalty parameter, and is the -norm. Two reasons for choosing an exact (non-smooth) penalty are as follows. First, for a certain finite choice of the parameter , a single minimization in of (3) can yield an exact solution of the original problem (1). Second, it does not square the condition number of as in the case of quadratic penalty methods, thus making our algorithm presented below more robust to ill-conditioning of the non-convex constraints. Now, we summarize the relationship between stationary points of (1) and of its penalty form (3). For this, let us define the directional derivative:
[TABLE]
where is one subgradient of at , and denotes the subdifferential of , see [Nesterov, 2007a]. Recall that the necessary optimality condition of (3) is
[TABLE]
Then, this condition can be expressed equivalently as
[TABLE]
where is the set of feasible directions to at :
[TABLE]
Any point satisfying (5) is called a stationary point of the penalized problem (3). Stationary points are candidates for local minima, local maxima, and saddle-points. If, in addition, is feasible to (1), then we say that is a feasible stationary point. Otherwise, we say that is an infeasible stationary point. Proposition 2.2 shows the relation between (1) and (3).
Proposition 2.2** ([Nocedal and Wright, 2006], (Theorem 17.4.)).**
Suppose that is a feasible stationary point of (3) for sufficiently large. Then, is also stationary point of the original problem (1).
Proposition 2.2 requires to be feasible for (1). When the feasible set of (1) is nonempty and bounded, according to [Pillo, 1994, Proposition 2], if (1) satisfies the extended Mangarasian-Fromovitz constrained qualification condition (see [Pillo, 1994, Proposition 2] for concrete definition), then there exists such that for any , every global or local solution of the penalized problem (3) is also a global or local optimal solution of (1), respectively. By [Pillo, 1994, Proposition 3], needs to be chosen such that , where is any optimal Lagrange multiplier of (1). We will discuss in detail the choice of in the sections below.
2.3 Global Gauss-Newton method
We first develop our GN algorithm. Then, we investigate its global convergence rate.
2.3.1 The derivation of the Gauss-Newton scheme and the full algorithm
Our GN method aims at solving the penalized problem (3) using the following convex subproblem:
[TABLE]
where is a given point in for linearization, is the Jacobian of , and is a regularization parameter.
Note that our subproblem (7) differs from those used in classical penalty methods [Nocedal and Wright, 2006], since we linearize the constraints and we also add a regularization term. Thus, the objective function of (7) is strongly convex. Hence, if is nonempty and even if the problem is non-differentiable, this problem admits a unique optimal solution, and can be solved efficiently by several convex methods and solvers. For instance, alternating direction methods of multipliers (ADMM) [Boyd et al., 2011] and primal-dual schemes [Chambolle and Pock, 2011] can be efficient for solving (7). Note that the convergence guarantees of ADMM and primal-dual schemes often depends on the distance between the initial point of the algorithm and the exact optimal solution of of (7), see, e.g, [Chambolle and Pock, 2011, Theorem 2]. Hence, if we warm-start at the previous approximate solution obtained at the -th iteration, then the distance is small. This allows the algorithm to converge faster to a desired approximate solution of (7) at the -th iteration.
Let us define
[TABLE]
The necessary and sufficient optimality condition for subproblem (7) becomes
[TABLE]
where . Given , we define the following quantities:
[TABLE]
Then, can be considered as a gradient mapping of in (3) [Nesterov, 2007a], and is a search direction for Algorithm 1. As we will see later in Lemma 2.2, should be chosen such that . Now, using the subproblem (7) as a main component, we describe our GN scheme in Algorithm 1.
The main step of Algorithm 1 is the solution of the convex subproblem (7) at Step 4. As mentioned, this problem is strongly convex, and can be solved by several methods that converge linearly. If we choose , then we do not need to perform a line-search on at Step 4, and only need to solve (7) once per iteration. However, may not be known or if it is known, the global upper bound may be too conservative, i.e. it does not take into account the local structures of non-linear functions in (3). Therefore, following the algorithm in [Nesterov, 2007a], we propose performing a line-search in order to find an appropriate . If we perform a line-search by doubling at each step starting from , (i.e., ), then after line-search steps, we have , and the number of line-search iterations is at most . Note that it is rather straightforward to estimate . For example, we can set for some and . The penalty parameter can be fixed or can be updated gradually using a run-and-inspect strategy, see next section.
2.3.2 Global convergence analysis
We first summarize some properties of Algorithm 1 when the penalty parameter is fixed at a given positive value for all iterations.
Lemma 2.1**.**
Let be defined by (8), and , , and be defined by (10). Then the following statements hold:
If , then is a stationary point of (3).
The norm is nondecreasing in , and is nonincreasing in . Moreover, we have
[TABLE]
If is Lipschitz continuous with the Lipschitz constant , then, for any , we have
[TABLE]
Proof.
(a) Substituting into (9), we again obtain the optimality condition (5). This shows that is a stationary point of (3).
(b) Since the function is convex in two variables and , we have that is still convex. It is easy to show that . Since is convex, is nondecreasing in . This implies that is nonincreasing in . Thus is nondecreasing in and is nonincreasing in . To prove (11), note that the convexity of implies that
[TABLE]
On the other hand, . Substituting this relation into (13), we obtain (11).
(c) Let use define . From the optimality condition (9), for any , we have
[TABLE]
where . Substituting into this condition, we have
[TABLE]
Since is convex, we have:
[TABLE]
By exploiting the convexity of at point , we have:
[TABLE]
Since is Lipschitz continuous, we also have
[TABLE]
Combining these three bounds, we can show that
[TABLE]
which implies
[TABLE]
Since , we obtain the first inequality of (12) from the last inequality. Moreover, from (4) we have
[TABLE]
Using (14), we can show that , which is the second inequality of (12). ∎
The proof of Statement shows that if we can find such that , then is an approximate stationary point of (3) within the accuracy . From statement , we can see that if the line-search condition at Step 4 holds, then . That is, the objective value decreases at least by after the -th iteration. We first claim that Algorithm 1 is well-defined.
Lemma 2.2**.**
Algorithm 1 is well-defined, i.e. step 4 terminates after a finite number of iterations. That is, if , then .
Proof.
Since is -Lipschitz continuous, for any and , we have
[TABLE]
Using the definition of , we obtain
[TABLE]
From this inequality, we can see that if , then . Hence, Step 4 of Algorithm 1 terminates after a finite number of iterations. ∎
Let be the level set of at . Now, we are ready to state the following theorem on global convergence of Algorithm 1.
Theorem 2.1**.**
Let be the sequence generated by Algorithm 1. Then and
[TABLE]
where . Moreover, we also obtain
[TABLE]
and the set of limit points of the sequence is connected. If this sequence is bounded (in particular, if is bounded) then every limit point is a stationary point of (3). Moreover, if the set of limit points is finite, then the sequence converges to a stationary point of (3). If, in addition, is feasible to (1) and is sufficiently large, then is also a stationary point of (1).
Proof.
From Step 5 of Algorithm 1, we have and . Using (12), it is easy to obtain . This shows that , and is a decreasing sequence and bounded. Hence, it has at least a convergent subsequence. Moreover, from (12), we also have
[TABLE]
Summing up the inequality (17) from to and using , we obtain
[TABLE]
This implies
[TABLE]
which leads to (15). Similarly, for any one has
[TABLE]
Note that the sequence has a convergent subsequence, thus passing to the limit as in (18) we obtain the first limit of (16). Since due to Statement (b) of Lemma 2.1, the first limit of (16) also implies the second one. If the sequence is bounded, by passing to the limit through a subsequence and combining with Lemma 2.1, we easily prove that every limit point is a stationary point of (3). If the set of limit points is finite, by applying the result in [Ostrowski, 1966][Chapt. 28], we obtain the proof of the remaining conclusion. ∎
Theorem 2.1 provides a global convergence result for Algorithm 1. Moreover, our algorithm requires solving convex subproblems at each iteration, thus offering a great advantage over classical penalty-type schemes. Since the underlying problem is non-convex, the iterates of our algorithm may get trapped at points that may be infeasible for the original problem. That is, under the stated conditions, the iterate sequence may converge to a local minimum (stationary) point of (3). Since , if , then is also a local minimum (stationary) point of the original problem (1).
We can sometimes overcome this by combining the algorithm with a run-and-inspect procedure [Chen et al., 2019], whereby if violates , then we restart the algorithm at a new starting point. More precisely, we add an inspect phase to our existing algorithm that helps escape from non-feasible stationary points. In the inspection phase, if we sample a point around the current point and increase the parameter . Since we do not know any optimal Lagrange multiplier of (1), we cannot guarantee that . However, since it is expected that the multiplier of the subproblem (7) converges to , we can use to monitor the update of by guaranteeing that . We have seen that such strategy performs well on a set of realistic non-convex AC-OPF problems. Nevertheless, in this paper, we do not have theoretical guarantee for this variant and leave this extension for future work.
2.4 Local convergence analysis
Let us study a special case of (3) where Algorithm 1 has a local quadratic convergence rate. Our result relies on the following assumptions. First, for simplicity of our analysis, we assume that is fixed and is also fixed at for in Algorithm 1. Next, let be a stationary point of (3) such that
[TABLE]
where is a given constant independent of and is a neighborhood of . The condition (19) is rather technical, but it holds in the following case. Let us assume that the Jacobian of at satisfies the following condition
[TABLE]
where is the positive smallest singular value of . This condition is similar to the strong second-order sufficient optimality condition [Nocedal and Wright, 2006], but only limited to the linear objective function. In this case, we have . Therefore, it leads to
[TABLE]
For sufficiently large such that , we have , and the condition (19) holds. Now, we prove a local quadratic convergence of Algorithm 1 under assumption (19).
Note that a fast local convergence rate such as superlinear or quadratic is usually expected in Gauss-Newton methods, see, e.g., [Kelley, 1999, Theorem 2.4.1.]. The following theorem shows that Algorithm 1 can also achieve a fast local quadratic convergence rate under a more restrictive condition (19).
Theorem 2.2**.**
Let be the sequence generated by Algorithm 1 such that it converges to a feasible stationary point of (3). Assume further that satisfies condition (19) for some . Then, if such that , then and
[TABLE]
As a consequence, the sequence locally converges to at a quadratic rate.
Proof.
Let . First, by the Lipschitz continuity of , we have . In this case, from this estimate and (8), for any , we can derive that
[TABLE]
This estimate together with and imply that
[TABLE]
Moreover, by -convexity of , we have . Using these last two estimates and the definition of , we can show that
[TABLE]
Finally, using condition (19), we obtain from the last estimate that
[TABLE]
Rearranging this inequality given that , we obtain (20). From (20), we can see that if , then . Moreover, . Hence, if , then . The last statement is a direct consequence of (20). ∎
3 A Case Study: The Optimal Power Flow Problem
In this section, we present the optimal power flow problem and its reformulation in a form that obeys the structure presented in the previous section. We then perform numerical experiments to validate the GN algorithm and we compare performance with IPOPT.
3.1 Problem settings
Original AC-OPF.
Consider a directed electric power network with a set of nodes and a set of branches . The network consists of a set of generators, with denoting the set of generators at bus . Denote as the system admittance matrix, being the conductance and the susceptance () [Taylor, 2015]. The decision variables of AC-OPF are the voltage magnitudes , phase angles , and the real and reactive power outputs of generators, which are denoted as and . We will consider a fixed real and reactive power demand at every node , which we denote as and , respectively. The constraints of the AC-OPF problem can be described by equations (21)-(25), see [Kocuk et al., 2016].
[TABLE]
Constraints (21) and (22) correspond to the real and reactive power balance equations of node . Constraints (23) and (24) impose complex power flow limits on each line, which we indicate by a parameter matrix . Constraints (25) impose bounds on voltage magnitudes (indicated by parameter vectors and ), bounds on real power magnitudes (indicated by parameter vectors and ), bounds on reactive power magnitudes (indicated by parameter vectors and ), and bounds on voltage phase angles differences (indicated by and for each line ). Note that the power balance equality constraints (21), (22) as well as the inequality constraints (23), (24) are non-convex with respect to and .
We will consider the objective of minimizing real power generation costs. Hence, we consider a convex quadratic objective function :
[TABLE]
where and are given coefficients of the cost function. The AC OPF problem then reads as the following non-convex optimization problem:
[TABLE]
Quadratic reformulation.
The starting point of our proposed GN method for solving problem is the quadratic reformulation of AC-OPF [Expósito and Ramos, 1999]. In this reformulation, we use a new set of variables and for replacing the voltage magnitudes and angles . These new variables are defined for all and as:
[TABLE]
where we will denote the vectors and as the collection of the and variables, respectively.
Assuming and (which is common practice), the mapping from to defined by (26), (27) can be inverted as follows:
[TABLE]
thereby defining a bijection in and .
The set of and that define this bijection is further equivalent to the following set of non-linear non-convex constraints in , see [Kocuk et al., 2016]:
[TABLE]
Now, we will substitute the voltage magnitude variables into the problem , and consider the problem on the variables . This reformulation has been commonly employed in the literature in order to arrive at an SOCP (Second-Order Cone Programming) relaxation of the problem [Expósito and Ramos, 1999, Jabr, 2008]. Through numerical experiments, we demonstrate that this reformulation results in highly effective starting points for our algorithm based on the SOCP relaxation of the AC-OPF. Moreover, the reformulation preserves the power balance constraints in linear form. Thus, the power balance constraints are not penalized in our scheme, which implies that they are respected at every iteration of the algorithm. For all these reasons, we pursue the quadratic reformulation of the present section, despite the fact that it requires the introduction of the new variables and .
Concretely, the AC power balance constraints (21) and (22) are linear in and :
[TABLE]
Similarly, the power flow limit constraints (23) and (24) are convex quadratic in and :
[TABLE]
The box constraints (25) are reformulated as follows:
[TABLE]
As a result, an equivalent formulation for the AC-OPF model is:
[TABLE]
With this reformulation, the decision variables are . Constraints (30)-(34) define a convex set. We also have two non-convex equality constraints:
- •
Constraints (28): . We will refer to them as quadratic constraints.
- •
Constraints (29): . We will refer to them as trigonometric constraints.
Now, AC-OPF is written in the format of (1), where , , and:
[TABLE]
Moreover, since is the collection of (28) and (29), we show in the next lemma that it is differentiable and that its Jacobian is Lipschitz continuous.
Lemma 3.1**.**
For the AC-OPF problem, defined by (28)–(29), is smooth, and its Jacobian is Lipschitz continuous with a Lipschitz constant , i.e. for all , where
[TABLE]
The proof of Lemma 3.1 is shown in Appendix A.
We can now derive AC-OPF subproblems for GN. As a reminder, the subproblems are in the format of (8). Since (30), (31) and (34) are linear constraints and (32) and (33) are second-order cone constraints, , in the AC-OPF case, is a set of second-order cone constraints. By applying the classical substitution of the norm-1 terms in the objective, the objective of (8) is (convex) quadratic. We typically refer to an SOCP as an optimization problem with second-order cone constraints and a linear objective. As a consequence, the AC-OPF subproblem is a Quadratically Constrained Program (QCP). We are now in a position to apply GN to AC-OPF.
3.2 A practical implementation of the GN algorithm for AC-OPF
In this section, the goal is to optimize the settings of the GN method. We will demonstrate that the choice of and is crucial. This will allow us to derive a practical version of the GN algorithm, which we compare to IPOPT.
Stopping criteria.
We terminate Algorithm 1 in two occasions, which have been validated through experimental results:
- •
If the maximum number of iterations has been reached.
- •
If the quadratic and trigonometric constraints are satisfied with a tolerance of , where . Concretely, we stop Algorithm 1 if
[TABLE]
If the difference ( in the numerical experiment), then Algorithm 1 has reached an approximate stationary point of the exact penalized formulation (3) (see Lemma 2.1(a)). In this case, the last iterate might not be feasible for . We then use the run-and-inspect strategy [Chen et al., 2019]: the last iterate becomes the starting point of GN and is doubled.
SOCP relaxation for initialization.
As mentioned previously, the quadratic formulation is also used to derive the SOCP relaxation. In the SOCP relaxation, the angles are not modeled. Consequently, the trigonometric constraints (29) are removed. Moreover, the non-convex constraints (28) are relaxed:
[TABLE]
Then, is defined such that:
[TABLE]
Solving this relaxation will provide a partial initial point . In order to improve this initial point, we also compute angles by solving the following optimization problem, where the goal is to minimize the error on the trigonometric constraints:
[TABLE]
This initialization is used for the experimental results on the MATPOWER benchmark.
Parameter tuning strategies.
The convergence theory presented above does not require tuning the and parameters. In practice, tuning is crucial for improving the performance of algorithms for constrained non-convex optimization, including Algorithm 1.
Several observations allow us to decrease the number of iterations that are required for convergence: (i) according to Proposition 2.2, large values of ensure the equivalence between (1) and (3); (ii) quadratic constraints and trigonometric constraints scale up differently; (iii) a careful updating of influences the number of times that subproblem (8) is solved. These observations guide a detailed experimental investigation concerning the choices of and parameters, which is discussed in Appendix B.
Accelerating the computation of subproblem solutions, and warmstart.
The most challenging constraints in the subproblems of GN are the line limit constraints (32) and (33). All the constraints are linear except for (32) and (33) which are quadratic convex. From experimental observations, it is very uncommon that the entire network is congested and that all line limits are binding. One way to benefit from this observation is the following: assuming is the iterate of the algorithm and is binding for a set , a next guess will first be computed by only enforcing (32) and (33) for . This approach of adding lazy network constraints to the model is applied both in theory [Aravena Solís, 2018] as well as in practice by system operators.
Moreover, we observe that the subproblems (8) are based on the same formulation, and only differ by slight changes of certain parameters along the iterations. This motivates us to warm-start the subproblem (8) with a previous primal-dual iterate. In other words, we initialize the solver for solving the subproblem (8) at the -th iteration at the final solution and its corresponding multiplier obtained from the previous iteration . Warm-starting is indeed a key step in iterative methods, including our GN scheme, and will be further analyzed in Section 3.3.2.
3.3 Numerical experiments
In order to validate the proposed GN algorithm, our numerical experiments are conducted in 2 steps: first, we launch simulations on several test cases of a classical library (MATPOWER) and compare the GN algorithm with a state-of-the-art non-convex solver (IPOPT); second, we show the potential benefit of warm-start for our approach.
3.3.1 Illustration on MATPOWER instances
We use the MATPOWER [Zimmerman et al., 2010] library to have access to a wide range of AC-OPF test systems that have been investigated in the literature. We test our approach on instances whose size ranges between 1,354 and 25,000 nodes (1354pegase has 11,192 variables and 27,911 constraints while ACTIVS25k has 186,021 variables and 431,222 constraints).
We benchmark our approach against IPOPT, a non-linear solver based on the interior-point method. To do so, we make use of PowerModels.jl [Coffrin et al., 2018], a Julia package that can be used to solve AC-OPF instances of different libraries with different formulations. In order to make a fair comparison, we initialize GN and IPOPT using the SOCP solution.
Since the subproblem (7) has a non-smooth convex objective due to the -norm penalty, we reformulate it equivalently to a quadratically constrained program (QCP). We solve this QCP with Gurobi [Gurobi Optimization, 2019]. Note that the time needed to solve the SOCP solution is not reported for two reasons. First, we use the SOCP solution for both GN and IPOPT as a starting point to make a fair comparison, so it will not change the gap between the execution times of the two methods. Second, computing the SOCP solution is negligible compared to the methods tested (always less than 5%).
The results of our analysis are presented in Table 1. In Table 1, from left to right, we report the name of the test case. In the Gauss-Newton part, we report the number of iterations, the number of times run-an-inspect is executed, the objective value, the maximum constraint violation and the execution time (in seconds). In the IPOPT part, we report the objective value, the maximum constraint violation and the execution time (in seconds). The last column provides the gap between the GN solution and the IPOPT solution ().
The first notable observation is that GN finds a stationary point (i.e. feasible) of the original AC-OPF problem for all 23 test cases. The stationary point obtained by GN attains the same objective function value as the one returned by IPOPT for most instances (a difference of 0.05% in the objective value may be attributed to numerical precision) and the proposed method outperforms IPOPT in some instances (e.g. 2737sop).
Our experiments further demonstrate that the GN method consistently requires a small number of iterations (less than 20) in a wide range of instances. This is critically important to further accelerate the performance of our method if we appropriately exploit warm-start strategies and efficient solvers for the strongly convex subproblem.
In terms of computational time, the performance is shared between the two approaches. Nevertheless, some instances reveal limitations of the GN algorithm, compared to IPOPT (6495rte, 9241pegase and ACTIVSg25k for example): when the solution of a subproblem becomes time-consuming because of the size of the subproblem, GN might require a larger execution time. We use Gurobi for solving subproblem (8), because it is one of the most stable QCP solvers that are available.
Unfortunately, Gurobi (and IPMs for QCPs in general) does not support warm-start, which would have significantly decreased the computational time. One alternative is to use an Alternating Direction Method of Multipliers (ADMM) solver that supports warm-start. There is no additional cost in implementing warm-start. Indeed, we simply use the solution of the previous iteration as a starting point for the next one. However, ADMM solvers are not mature enough to test large-scale problems. Implementing an efficient subsolver is out of the scope of this work, however we are able to analyze the effect of warm start on these solvers (and by extension to our proposed algorithm), which is the subject of the next section.
3.3.2 The effect of warm-starting strategy
We consider using OSQP [Stellato et al., 2017] as an ADMM solver. Since OSQP only solves quadratic programs (QPs), and since we are only aiming at illustrating the potential of warm-start, we will drop the line limit constraints (32) and (33), in order to satisfy the requirements of the solver. We consider small test cases (39_epri and 118_ieee) since we observed numerical instability for larger test cases.
The results are presented in Table 2. From left to right, # It reports the number of GN iterations, # ADMM It reports the total number of ADMM iterations performed during the GN execution, and Time reports the sum of OSQP solve times along the iterations in seconds.
For both cases, we observe that warm start divides the total number of ADMM iterations as well as solve time by more than a factor of 2, and almost by a factor of 3 for 118_ieee.
We also examine each GN iteration individually, and highlight the impact of warm-start on the number of ADMM iterations in Figure 1. Note that we warm-start dual and primal variables only after iteration 1. Warm-start decreases substantially the number of ADMM iterations in two cases:
When is updated. Indeed, updating only results in slightly changing the objective function. One expects the previous iterates to provide a good warm-start. This is confirmed in Figure 1, where we observe that the number of ADMM iterations is divided by at least a factor of 2 every time is updated. 2. 2.
When the last iterates are computed. Intuitively, one does not expects iterates to change substantially when approaching the optimal solution. This intuition is confirmed by Figure 1. For the particular case of the last iterate, for 39_epri (resp. 118_ieee), the required number of ADMM iterations is less than 20% (resp. 30%).
This investigation suggests that, with a mature ADMM solver, warm-starting is a promising feature for improving the performance of GN on large test cases.
3.4 Optimization with bilinear matrix inequality constraint
The goal of this subsection is to demonstrate that Algorithm 1 can solve a more general class of problems than classical methods such as IPMs or SQPs. For this purpose, we consider the following optimization problem involving bilinear matrix inequality (BMI) constraints:
[TABLE]
where , and are optimization variables, , , and are given input matrices, and (resp. ) means that is negative semidefinite (resp., positive semidefinite). This problem arises from controller design, and is known as the spectral abscissa problem, see, e.g., [Burke et al., 2002].
If we introduce , , and \Omega:=\big{\{}\boldsymbol{x}:=(S,P,F,t)\mid S=S^{\top},\ P=P^{\top},\ S\succeq 0,\ P\succeq 0\big{\}}, then we can reformulate (40) into (1) with and . Clearly, if we explicitly write in the following form
[TABLE]
then we obtain a nonsmooth problem, where is the smallest eigenvalue of , which is nonsmooth [Burke et al., 2002]. In this case, IPMs and SQPs are not applicable for solving (40). However, Algorithm 1 can still solve (40) and its theoretical guarantees are preserved.
We test our method by using the following data:
[TABLE]
[TABLE]
As starting point, we use:
[TABLE]
and we choose . These choices are motivated by the fact that a large value of ensures convergence, see Proposition 2.2. We apply the same update strategy for as in the AC-OPF implementation.
Our algorithm converges in 14 iterations to a feasible local optimum. The constraints are satisfied with a tolerance of .
4 Conclusion
We propose a novel Gauss-Newton algorithm for solving a general class of optimization problems with non-convex constraints. We utilize an exact non-smooth penalty reformulation of the original problem and suggest an iterative scheme for solving this penalized problem which relies on the non-squared Gauss-Newton method. The subproblems of our proposed scheme are strongly convex programs, which can be efficiently solved by numerous third-party convex optimization solvers. We establish a best-known global convergence rate for our method to a stationary point. Under more restrictive conditions, we derive a local quadratic convergence rate for our scheme.
We apply our proposed approach to solve the optimal power flow problem, which is a fundamental and ubiquitous problem in power systems engineering. We apply our proposed algorithm to a reformulation of the AC-OPF, and we propose numerous strategies for tuning our proposed Gauss-Newton scheme, initializing the algorithm, and warm-starting the resolution of the subproblems that are treated by our proposed method. We perform extensive numerical experiments on a large set of instances from the MATPOWER library, and demonstrate the competitive performance of our method to IPOPT, which is a state of the art non-linear non-convex solver. This analysis validates the theoretical analysis of our proposed GN scheme, and proves its effectiveness in practical applications. Future work aims at broadening the scope of such applications beyond power systems.
Acknowledgments
I. Mezghani and A. Papavasiliou acknowledge the financial support of ENGIE-Electrabel. I. Necoara would like to acknowledge the support from the Executive Agency for Higher Education, Research and Innovation Funding (UEFISCDI), Romania, PNIII-P4-PCE-2016-0731, project ScaleFreeNet, no. 39/2017. The work of Quoc Tran-Dinh was partly supported by NSF (DMS-1619884).
Appendix
This appendix provides the full proof of technical results in the main text, the detailed implementation of our algorithm, and additional numerical experiments.
Appendix A The proof of Lemma 3.1
Proof.
From the definition of , it consists of two parts: quadratic forms in and trigonometric and linear forms in and , respectively. We can write it as . Each function in has the form , as shown by (28), and each function in has the form , as shown in (29). We can show that the second derivative of each component of w.r.t. and of w.r.t. , respectively is
[TABLE]
The second derivative is constant. Hence, the maximum eigenvalue of is
. For any , we can easily estimate that
[TABLE]
Therefore, , where the last inequality follows from (25). Consequently,
[TABLE]
which is (38). ∎
Appendix B Investigation on parameter tuning strategies
In order to explain how the parameters of the GN algorithm should be tuned, we first express the subproblem at iteration for the special case of AC-OPF.
Subproblem (8) in the context of AC-OPF.
We define the following functions in order to keep our notation compact:
[TABLE]
Note that we are facing two types of constraints (quadratic and trigonometric) and that they may attain different relative scales for different instances, as we have observed in our numerical experiments. We will define different penalty terms depending on the type of constraint, which will increase the flexibility of our implementation, while respecting the theoretical assumptions that we present in the main body of the paper. To this end, we denote by (resp. ) the penalty parameter associated with the quadratic (resp. trigonometric) constraints. The trigonometric constraints depend on , and , whereas the quadratic ones only depend on and . Therefore, we define two separate regularization parameters: and (we drop the index from now on for notational simplicity).
The GN subproblem at iteration , corresponding to (7), is convex and has the form:
[TABLE]
The optimal solution of , which we denote by , provides the next iterate .
B.1 Joint values of parameters: and
In our first set of tests, we implement the basic variant of Algorithm 1 by retaining the configuration of parameters from our theoretical results. Since we only aim at validating the algorithm, we focus on the three instances that have less than 2,000 nodes: 1354pegase, 1888rte, and 1951pegase.
For the penalty parameters, we first choose the same value for both quadratic and trigonometric constraints as . We perform tests with three choices of , based on the number of lines in the test case: , or . For the regularization parameter , we also choose the same value . Note that all the experiments are initialized from the SOCP solution. We consider three different strategies when applying Algorithm 1:
- •
Fixed strategy: we fix at the upper bound , which is computed in (38).
- •
Bisection update: At each iteration of Algorithm 1, we choose and initialize . If we do not satisfy the line-search condition , we apply a bisection in the interval until we satisfy this condition.
- •
Geometric- update: At each iteration , we also choose and initialize . We then update for in order to guarantee that . In our experiments, we use .
The results of our first test are presented in Table 3. The three columns for each strategy present the number of iterations, and the maximum violation of the quadratic (resp. trigonometric) constraint for each strategy on the three problem instances. In terms of number of iterations, we can observe that fixing results in a poor performance and tends to increase the number of iterations as well as the final violation of the constraints. Given this poor performance, we do not consider the Fixed strategy for the remainder of the numerical experiments. Also, using produces satisfactory results, and increasing the value of supports convergence, therefore we choose this order of magnitude for the value of .
Bisection and Geometric-2 exhibit a similar behavior: when the algorithm converges to a feasible point, both choices achieve converge in tens of iterations, depending on the test case and the choice of . Nevertheless, when failing, the maximum violation of the quadratic constraint (MVQ) never reaches the desired tolerance of . This behavior might suggest that we do not penalize sufficiently the quadratic constraint. One should notice that the quadratic and trigonometric constraints are linked: once the angles are fixed, and , which are the variables that appear in the quadratic constraints, struggle to move from their current value in order to satisfy the quadratic constraints. This motivates us to consider two different values for and : for the quadratic constraints and for the associated variables and ; for the trigonometric constraints and for the angle variables .
B.2 Adaptation and enhancement of Algorithm 1 for AC-OPF
Adapting Algorithm 1 with individual choices of parameters: , and , .
Based on our observations, we choose different values for and . Since we empirically observe that the quadratic constraints are harder to satisfy than the trigonometric constraints, we consider two alternatives: and . Furthermore, we set and . Note that the choice of different values for these parameters does not affect the theoretical guarantees of our algorithm, as long as they satisfy our given conditions.
Also, since we have different values and , we must adapt the condition under which and are updated. From the theory, is updated (through the Bisection or Geometric strategy) if the following condition is not met:
[TABLE]
where .
We adapt this condition to the specific type of constraint. Concretely:
- •
If
[TABLE]
then update and .
- •
If (41) does not hold and
[TABLE]
then only update .
Refining the updates.
We emphasize that, in Algorithm 1, each time that the values of and/or are updated, the subproblem is resolved. Therefore, an effective strategy for updating these parameters can lead to significant improvements in computational time. We mitigate this heavy computational requirement by introducing resolution techniques that are guided by both our theoretical results and empirical observations. Concretely, we propose the following two improvements to the practical implementation of the algorithm, in order to limit the number of computationally expensive updates:
Keeping the value of from one iteration to another is a better strategy than having at the beginning of each iteration. The natural justification for this is that we expect steps to decrease along the iterations. 2. 2.
Checking conditions (41) and (42) can require a large number of updates. Instead, we propose a verification of whether the violation of the constraints is decreasing before checking (41) and (42), which is a less stringent requirement that still yields satisfactory results in terms of constraint violations. Concretely, at iteration , we compute the and norms of and . If these quantities decrease from to , we move to iteration .
By applying these two approaches, we obtain the results that are presented in Table 4. In Table 4, # It provides the number of GN iterations, # L-It provides the number of additional subproblems solved because of an -update (then # It+# L-It gives the total number of subproblems solved) and Obj records the objective value returned. From Table 4, we observe that this new strategy leads to convergence for all three test cases. Even if Bisection seems to require less iterations, it also provides higher objective values than Geometric-2. Also, we do not observe notable differences between applying a factor of 2 or 5 on , although a factor of decreases slightly the number of iterations. In our implementation, we employ the following settings: Geometric-2, and .
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[Aravena Solís, 2018] Aravena Solís, I. A. (2018). Analysis of renewable energy integration in transmission-constrained electricity markets using parallel computing . Ph D thesis, UCL-Université Catholique de Louvain.
- 2[Bolte and Pauwels, 2016] Bolte, J. and Pauwels, E. (2016). Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Math. Oper. Res. , 41(2):442–465.
- 3[Bolte et al., 2014] Bolte, J., Sabach, S., and Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming , 146(1-2):459–494.
- 4[Boob et al., 2019] Boob, D., Deng, Q., and Lan, G. (2019). Proximal point methods for optimization with nonconvex functional constraints. ar Xiv preprint ar Xiv:1908.02734 .
- 5[Boyd et al., 2011] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning , 3(1):1–122.
- 6[Burke et al., 2002] Burke, J., Lewis, A., and Overton, M. (2002). Two numerical methods for optimizing matrix stability. Linear Algebra and Its Applications , 351-352:117–145.
- 7[Cain et al., 2012] Cain, M. B., O’neill, R. P., and Castillo, A. (2012). History of optimal power flow and formulations. Federal Energy Regulatory Commission , 1:1–36.
- 8[Carpentier, 1962] Carpentier, J. (1962). Contribution a l’etude du dispatching economique. Bulletin de la Societe Francaise des Electriciens , 3(1):431–447.
