A Globally Convergent Penalty-Based Gauss-Newton Algorithm with   Applications

Ilyes Mezghani; Quoc Tran-Dinh; Ion Necoara; Anthony Papavasiliou

arXiv:1905.08588·math.OC·December 8, 2020

A Globally Convergent Penalty-Based Gauss-Newton Algorithm with Applications

Ilyes Mezghani, Quoc Tran-Dinh, Ion Necoara, Anthony Papavasiliou

PDF

Open Access

TL;DR

This paper introduces a globally convergent Gauss-Newton algorithm for non-convex, non-smooth optimization problems, demonstrating its effectiveness on power flow problems and achieving comparable performance to established solvers.

Contribution

The paper develops a novel penalty-based Gauss-Newton algorithm with proven global and local convergence properties for non-smooth optimization problems.

Findings

01

Global convergence to stationary points from any initial point

02

Local quadratic convergence under certain conditions

03

Comparable performance to IPOPT on power flow problems

Abstract

We propose a globally convergent Gauss-Newton algorithm for finding a local optimal solution of a non-convex and possibly non-smooth optimization problem. The algorithm that we present is based on a Gauss-Newton-type iteration for the non-smooth penalized formulation of the original problem. We establish a global convergence rate for this scheme from any initial point to a stationary point of the problem while using an exact penalty formulation. Under some more restrictive conditions we also derive local quadratic convergence for this scheme. We apply our proposed algorithm to solve the Alternating Current optimal power flow problem on meshed electricity networks, which is a fundamental application in power systems engineering. We verify the performance of the proposed method by showing comparable behavior with IPOPT, a well-established solver. We perform our validation on several…

Tables4

Table 1. Table 1: Comparison of the GN algorithm against IPOPT with SOCP initialization.

	Gauss-Newton					IPOPT
Test Case	# It	# RI	Objective	MCV	Time	Objective	MCV	Time	Gap
1354pegase	13	0	$7.407 e^{4}$	$4.0 e^{- 6}$	15.1 s	$7.407 e^{4}$	$1.3 e^{- 7}$	6.00 s	0.00 %
1888rte	14	0	$5.981 e^{4}$	$8.0 e^{- 6}$	25.9 s	$5.980 𝐞^{𝟒}$	$1.4 e^{- 6}$	59.0 s	0.02 %
1951rte	4	0	$8.174 e^{4}$	$6.9 e^{- 6}$	6.94 s	$8.174 e^{4}$	$3.9 e^{- 6}$	7.67 s	0.00 %
ACTIVSg2000	5	0	$1.229 e^{6}$	$7.7 e^{- 6}$	8.39 s	$1.229 e^{6}$	$3.6 e^{- 7}$	14.8 s	0.01 %
2383wp	19	2	$1.868 e^{6}$	$1.7 e^{- 6}$	51.0 s	$1.868 e^{6}$	$1.1 e^{- 9}$	21.6 s	0.02 %
2736sp	4	0	$1.307 𝐞^{𝟔}$	$9.0 e^{- 6}$	10.9 s	$1.308 e^{6}$	$8.5 e^{- 10}$	13.3 s	-0.06 %
2737sop	3	0	$7.767 𝐞^{𝟓}$	$9.2 e^{- 6}$	7.77 s	$7.778 e^{5}$	$6.9 e^{- 10}$	9.98 s	-0.14 %
2746wop	17	2	$1.208 e^{6}$	$6.0 e^{- 6}$	38.1 s	$1.208 e^{6}$	$2.6 e^{- 9}$	12.8 s	0.00 %
2746wp	3	0	$1.631 𝐞^{𝟔}$	$7.7 e^{- 6}$	6.89 s	$1.632 e^{6}$	$5.9 e^{- 9}$	16.0 s	-0.05 %
2848rte	17	2	$5.303 e^{4}$	$7.2 e^{- 6}$	52.0 s	$5.302 𝐞^{𝟒}$	$3.4 e^{- 7}$	74.3 s	0.01 %
2868rte	4	0	$7.980 e^{4}$	$6.2 e^{- 6}$	9.11 s	$7.979 𝐞^{𝟒}$	$3.9 e^{- 6}$	29.3 s	0.00 %
2869pegase	12	0	$1.340 e^{5}$	$9.8 e^{- 6}$	41.9 s	$1.340 e^{5}$	$2.5 e^{- 8}$	15.5 s	0.00 %
3012wp	8	0	$2.593 e^{6}$	$2.2 e^{- 6}$	25.2 s	$2.592 𝐞^{𝟔}$	$9.4 e^{- 10}$	18.8 s	0.03 %
3120sp	13	0	$2.142 𝐞^{𝟔}$	$8.1 e^{- 6}$	41.2 s	$2.143 e^{6}$	$7.3 e^{- 9}$	19.2 s	-0.04 %
3375wp	8	0	$7.413 e^{6}$	$8.7 e^{- 6}$	34.7 s	$7.412 𝐞^{𝟔}$	$3.0 e^{- 8}$	21.3 s	0.02 %
6468rte	19	2	$8.685 e^{4}$	$9.4 e^{- 6}$	187 s	$8.683 𝐞^{𝟒}$	$6.6 e^{- 7}$	103 s	0.02 %
6470rte	11	0	$9.835 e^{4}$	$4.9 e^{- 6}$	89.3 s	$9.835 e^{4}$	$3.8 e^{- 6}$	144 s	0.01 %
6495rte	12	0	$1.063 e^{5}$	$8.4 e^{- 6}$	180 s	$1.063 e^{5}$	$4.6 e^{- 8}$	52.4 s	0.04 %
6515rte	18	0	$1.098 e^{5}$	$3.2 e^{- 6}$	204 s	$1.098 e^{5}$	$5.0 e^{- 7}$	86.9 s	0.03 %
9241pegase	17	0	$3.167 e^{6}$	$9.2 e^{- 6}$	894 s	$3.159 𝐞^{𝟔}$	$1.2 e^{- 6}$	368 s	0.25 %
ACTIVSg10k	6	0	$2.488 e^{6}$	$6.6 e^{- 6}$	117 s	$2.486 𝐞^{𝟔}$	$6.8 e^{- 8}$	93.4 s	0.10 %
13659pegase	19	0	$3.885 e^{5}$	$8.5 e^{- 6}$	137 s	$3.861 𝐞^{𝟓}$	$4.4 e^{- 7}$	685 s	0.62 %
ACTIVSg25k	16	0	$6.033 e^{6}$	$9.9 e^{- 6}$	1,740 s	$6.018 𝐞^{𝟔}$	$7.8 e^{- 8}$	544 s	0.25 %

Table 2. Table 2: Results without and with warm-start on 2 small instances using OSQP solver.

	No warm-start			With warm-start
Test Case	# It	# ADMM It	Time	# It	# ADMM It	Time
39_epri	4	114,735	4.65 s	4	56,988	2.15 s
118_ieee	4	162,251	29.2 s	4	65,832	11.6 s

Table 3. Table 3: Performance results of the Gauss-Newton algorithm with different strategies for β 𝛽 \beta and L 𝐿 L .

		Fixed			Bisection			Geometric-2
Test Case	$β$	# It	MVQ	MVT	# It	MVQ	MVT	# It	MVQ	MVT
1354pegase	$\| ℒ \| / 100$	100	$6 e^{- 3}$	$4 e^{- 6}$	100	$6 e^{- 3}$	$4 e^{- 6}$	100	$7 e^{- 3}$	$4 e^{- 6}$
	$\| ℒ \| / 10$	91	$2 e^{- 6}$	$6 e^{- 9}$	3	$5 e^{- 6}$	$2 e^{- 7}$	3	$5 e^{- 6}$	$4 e^{- 6}$
	$\| ℒ \|$	78	$2 e^{- 6}$	$1 e^{- 8}$	3	$8 e^{- 9}$	$4 e^{- 9}$	3	$6 e^{- 6}$	$5 e^{- 6}$
1888rte	$\| ℒ \| / 100$	100	$1 e^{- 2}$	$8 e^{- 8}$	21	$1 e^{- 2}$	$9 e^{- 8}$	100	$7 e^{- 3}$	$8 e^{- 7}$
	$\| ℒ \| / 10$	100	$8 e^{- 3}$	$5 e^{- 8}$	11	$7 e^{- 3}$	$7 e^{- 8}$	100	$5 e^{- 3}$	$3 e^{- 8}$
	$\| ℒ \|$	100	$8 e^{- 3}$	$5 e^{- 8}$	15	$5 e^{- 3}$	$2 e^{- 8}$	100	$3 e^{- 3}$	$8 e^{- 8}$
1951rte	$\| ℒ \| / 100$	86	$4 e^{- 6}$	$4 e^{- 9}$	10	$2 e^{- 3}$	$5 e^{- 8}$	39	$3 e^{- 4}$	$4 e^{- 8}$
	$\| ℒ \| / 10$	66	$6 e^{- 7}$	$3 e^{- 8}$	4	$2 e^{- 5}$	$6 e^{- 8}$	3	$8 e^{- 6}$	$6 e^{- 6}$
	$\| ℒ \|$	61	$1 e^{- 6}$	$1 e^{- 7}$	4	$5 e^{- 7}$	$1 e^{- 7}$	3	$3 e^{- 6}$	$2 e^{- 6}$

Table 4. Table 4: Performance behavior of the Gauss-Newton algorithm with different values and strategies for β q subscript 𝛽 𝑞 \beta_{q} , β t subscript 𝛽 𝑡 \beta_{t} , L c s subscript 𝐿 𝑐 𝑠 L_{cs} , L θ subscript 𝐿 𝜃 L_{\theta} .

			Bisection			Geometric-2
Test Case	$β_{t}$	$β_{q}$	# It	# L-It	Obj	# It	# L-It	Obj
1354pegase	$\| ℒ \|$	$\times 2$	3	1	$7.408 e 4$	14	6	$7.407 e 4$
	$\| ℒ \|$	$\times 5$	4	1	$7.408 e 4$	13	4	$7.407 e 4$
1888rte	$\| ℒ \|$	$\times 2$	8	1	$5.982 e 4$	17	1	$5.981 e 4$
	$\| ℒ \|$	$\times 5$	11	1	$5.982 e 4$	14	3	$5.981 e 4$
1951rte	$\| ℒ \|$	$\times 2$	4	1	$8.174 e 4$	6	1	$8.174 e 4$
	$\| ℒ \|$	$\times 5$	4	0	$8.174 e 4$	4	0	$8.174 e 4$

Equations159

x \in R^{d} min f (x) s.t. Ψ (x) = 0, x \in Ω.

x \in R^{d} min f (x) s.t. Ψ (x) = 0, x \in Ω.

∥ Ψ^{'} (x) - Ψ^{'} (\hat{x}) ∥ \leq L_{Ψ} ∥ x - \hat{x} ∥ \forall x, \hat{x} \in Ω,

∥ Ψ^{'} (x) - Ψ^{'} (\hat{x}) ∥ \leq L_{Ψ} ∥ x - \hat{x} ∥ \forall x, \hat{x} \in Ω,

N_{Ω} (x) := {{w \in R^{d} ∣ w^{⊤} (y - x) \geq 0, \forall y \in Ω}, if x \in Ω \emptyset, otherwise .

N_{Ω} (x) := {{w \in R^{d} ∣ w^{⊤} (y - x) \geq 0, \forall y \in Ω}, if x \in Ω \emptyset, otherwise .

- \nabla f (x^{*}) - Ψ^{'} (x^{*}) y^{*} \in N_{Ω} (x^{*}), x^{*} \in Ω, and Ψ (x^{*}) = 0.

- \nabla f (x^{*}) - Ψ^{'} (x^{*}) y^{*} \in N_{Ω} (x^{*}), x^{*} \in Ω, and Ψ (x^{*}) = 0.

\min_{\boldsymbol{x}\in\Omega}\Big{\{}F(\boldsymbol{x}):=f(\boldsymbol{x})+\beta|\Psi(\boldsymbol{x})|\Big{\}},

\min_{\boldsymbol{x}\in\Omega}\Big{\{}F(\boldsymbol{x}):=f(\boldsymbol{x})+\beta|\Psi(\boldsymbol{x})|\Big{\}},

D F (x^{*}) [d] := \nabla f (x^{*})^{⊤} d + β ξ (x^{*})^{⊤} Ψ^{'} (x^{*})^{⊤} d,

D F (x^{*}) [d] := \nabla f (x^{*})^{⊤} d + β ξ (x^{*})^{⊤} Ψ^{'} (x^{*})^{⊤} d,

0 \in \nabla f (x^{*}) + β Ψ^{'} (x^{*}) \partial ∣Ψ (x^{*}) ∣ + N_{Ω} (x^{*}) .

0 \in \nabla f (x^{*}) + β Ψ^{'} (x^{*}) \partial ∣Ψ (x^{*}) ∣ + N_{Ω} (x^{*}) .

D F (x^{*}) [d] \geq 0, \forall d \in F_{Ω} (x^{*}),

D F (x^{*}) [d] \geq 0, \forall d \in F_{Ω} (x^{*}),

F_{Ω} (x) := {d \in R^{d} ∣ d = t (y - x), \forall y \in Ω, t \geq 0} .

F_{Ω} (x) := {d \in R^{d} ∣ d = t (y - x), \forall y \in Ω, t \geq 0} .

{\!\!\!\!\!\!\!\!\!\!}\begin{array}[]{ll}&\displaystyle\min_{\boldsymbol{x}\in\Omega}\Big{\{}\mathcal{Q}_{L}(\boldsymbol{x};\boldsymbol{x}^{k}):=f(\boldsymbol{x})+\beta|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{x}-\boldsymbol{x}^{k})|+\tfrac{L}{2}\|\boldsymbol{x}-\boldsymbol{x}^{k}\|^{2}\Big{\}},\end{array}{\!\!\!\!}

{\!\!\!\!\!\!\!\!\!\!}\begin{array}[]{ll}&\displaystyle\min_{\boldsymbol{x}\in\Omega}\Big{\{}\mathcal{Q}_{L}(\boldsymbol{x};\boldsymbol{x}^{k}):=f(\boldsymbol{x})+\beta|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{x}-\boldsymbol{x}^{k})|+\tfrac{L}{2}\|\boldsymbol{x}-\boldsymbol{x}^{k}\|^{2}\Big{\}},\end{array}{\!\!\!\!}

{\!\!\!\!}\begin{array}[]{ll}&\boldsymbol{V}_{L}(\boldsymbol{x}^{k}):=\text{arg}\!\displaystyle\min_{x\in\Omega}\Big{\{}\mathcal{Q}_{L}(\boldsymbol{x};\boldsymbol{x}^{k}):=f(\boldsymbol{x})+\beta|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{x}-\boldsymbol{x}^{k})|+\frac{L}{2}\|\boldsymbol{x}-\boldsymbol{x}^{k}\|^{2}\Big{\}}.{\!\!\!\!}\end{array}

{\!\!\!\!}\begin{array}[]{ll}&\boldsymbol{V}_{L}(\boldsymbol{x}^{k}):=\text{arg}\!\displaystyle\min_{x\in\Omega}\Big{\{}\mathcal{Q}_{L}(\boldsymbol{x};\boldsymbol{x}^{k}):=f(\boldsymbol{x})+\beta|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{x}-\boldsymbol{x}^{k})|+\frac{L}{2}\|\boldsymbol{x}-\boldsymbol{x}^{k}\|^{2}\Big{\}}.{\!\!\!\!}\end{array}

{\!\!\!\!\!\!\!}\begin{array}[]{ll}&\left[\nabla{f}(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))+L(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})+\beta\Psi^{\prime}(\boldsymbol{x}^{k})\xi(\boldsymbol{x}^{k})\right]^{\top}(\hat{\boldsymbol{x}}-\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))\geq 0,~{}~{}\forall\hat{\boldsymbol{x}}\in\Omega,\end{array}{\!\!\!\!}

{\!\!\!\!\!\!\!}\begin{array}[]{ll}&\left[\nabla{f}(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))+L(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})+\beta\Psi^{\prime}(\boldsymbol{x}^{k})\xi(\boldsymbol{x}^{k})\right]^{\top}(\hat{\boldsymbol{x}}-\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))\geq 0,~{}~{}\forall\hat{\boldsymbol{x}}\in\Omega,\end{array}{\!\!\!\!}

G_{L} (x^{k}) := L (x^{k} - V_{L} (x^{k})), d_{L} (x^{k}) := V_{L} (x^{k}) - x^{k}, and r_{L} (x^{k}) := ∥ d_{L} (x^{k}) ∥.

G_{L} (x^{k}) := L (x^{k} - V_{L} (x^{k})), d_{L} (x^{k}) := V_{L} (x^{k}) - x^{k}, and r_{L} (x^{k}) := ∥ d_{L} (x^{k}) ∥.

F (x^{k}) - Q_{L_{k}} (V_{L_{k}} (x^{k}); x^{k}) \geq \frac{L _{k}}{2} r_{L_{k}}^{2} (x^{k}) .

F (x^{k}) - Q_{L_{k}} (V_{L_{k}} (x^{k}); x^{k}) \geq \frac{L _{k}}{2} r_{L_{k}}^{2} (x^{k}) .

{\!\!\!\!\!\!\!\!\!\!\!}\begin{array}[]{ll}&F(\boldsymbol{x}^{k})-F(\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k}))\geq\frac{(2{L_{k}}-\beta L_{\Psi})}{2}r^{2}_{L_{k}}(\boldsymbol{x}^{k})=\frac{(2{L_{k}}-\beta L_{\Psi})}{2L_{k}^{2}}\|\boldsymbol{G}_{L_{k}}(\boldsymbol{x}^{k})\|^{2}.\vspace{1ex}\\ &DF(\boldsymbol{x}^{k})[\boldsymbol{d}_{L_{k}}(\boldsymbol{x}^{k})]\leq-{L_{k}}r^{2}_{L_{k}}(\boldsymbol{x}^{k})=-\frac{1}{{L_{k}}}\|\boldsymbol{G}_{L_{k}}(\boldsymbol{x}^{k})\|^{2}.\end{array}{\!\!\!\!\!\!\!\!}

{\!\!\!\!\!\!\!\!\!\!\!}\begin{array}[]{ll}&F(\boldsymbol{x}^{k})-F(\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k}))\geq\frac{(2{L_{k}}-\beta L_{\Psi})}{2}r^{2}_{L_{k}}(\boldsymbol{x}^{k})=\frac{(2{L_{k}}-\beta L_{\Psi})}{2L_{k}^{2}}\|\boldsymbol{G}_{L_{k}}(\boldsymbol{x}^{k})\|^{2}.\vspace{1ex}\\ &DF(\boldsymbol{x}^{k})[\boldsymbol{d}_{L_{k}}(\boldsymbol{x}^{k})]\leq-{L_{k}}r^{2}_{L_{k}}(\boldsymbol{x}^{k})=-\frac{1}{{L_{k}}}\|\boldsymbol{G}_{L_{k}}(\boldsymbol{x}^{k})\|^{2}.\end{array}{\!\!\!\!\!\!\!\!}

F (x^{k}) = η (0) \geq η (t) + η^{'} (t) (0 - t) = η (t) + \frac{1}{2 t} r_{1/ t}^{2} (x^{k}) .

F (x^{k}) = η (0) \geq η (t) + η^{'} (t) (0 - t) = η (t) + \frac{1}{2 t} r_{1/ t}^{2} (x^{k}) .

{\!\!\!\!}\begin{array}[]{ll}\left[\nabla{f}(\boldsymbol{V}_{k})+{L_{k}}(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})+\beta\Psi^{\prime}(\boldsymbol{x}^{k})\xi(\boldsymbol{x}^{k})\right]^{\top}(\boldsymbol{x}-\boldsymbol{V}_{k})\geq 0,\end{array}{\!\!\!\!}

{\!\!\!\!}\begin{array}[]{ll}\left[\nabla{f}(\boldsymbol{V}_{k})+{L_{k}}(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})+\beta\Psi^{\prime}(\boldsymbol{x}^{k})\xi(\boldsymbol{x}^{k})\right]^{\top}(\boldsymbol{x}-\boldsymbol{V}_{k})\geq 0,\end{array}{\!\!\!\!}

\nabla f (V_{k})^{⊤} (x^{k} - V_{k}) + β ξ (x^{k})^{⊤} Ψ^{'} (x^{k})^{⊤} (x^{k} - V_{k}) \geq L_{k} ∥ V_{k} - x^{k} ∥^{2} .

\nabla f (V_{k})^{⊤} (x^{k} - V_{k}) + β ξ (x^{k})^{⊤} Ψ^{'} (x^{k})^{⊤} (x^{k} - V_{k}) \geq L_{k} ∥ V_{k} - x^{k} ∥^{2} .

\begin{array}[]{ll}f(\boldsymbol{x}^{k})&\geq f(\boldsymbol{V}_{k})+\nabla{f}(\boldsymbol{V}_{k})^{\top}(\boldsymbol{x}^{k}-\boldsymbol{V}_{k}).\vspace{1ex}\end{array}

\begin{array}[]{ll}f(\boldsymbol{x}^{k})&\geq f(\boldsymbol{V}_{k})+\nabla{f}(\boldsymbol{V}_{k})^{\top}(\boldsymbol{x}^{k}-\boldsymbol{V}_{k}).\vspace{1ex}\end{array}

\begin{array}[]{ll}|\Psi(\boldsymbol{x}^{k})|&\geq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|+\xi(\boldsymbol{x}^{k})^{\top}\Psi^{\prime}(\boldsymbol{x}^{k})^{\top}(\boldsymbol{x}^{k}-\boldsymbol{V}_{k}).\end{array}

\begin{array}[]{ll}|\Psi(\boldsymbol{x}^{k})|&\geq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|+\xi(\boldsymbol{x}^{k})^{\top}\Psi^{\prime}(\boldsymbol{x}^{k})^{\top}(\boldsymbol{x}^{k}-\boldsymbol{V}_{k}).\end{array}

{\!\!\!\!\!\!\!\!}\begin{array}[]{ll}|\Psi(\boldsymbol{V}_{k})|&\leq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|+|\Psi(\boldsymbol{V}_{k})-\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|\vspace{1ex}\\ &\leq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|+\frac{L_{\Psi}}{2}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}.\end{array}

{\!\!\!\!\!\!\!\!}\begin{array}[]{ll}|\Psi(\boldsymbol{V}_{k})|&\leq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|+|\Psi(\boldsymbol{V}_{k})-\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|\vspace{1ex}\\ &\leq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|+\frac{L_{\Psi}}{2}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}.\end{array}

{\!\!\!\!\!\!}\begin{array}[]{ll}f(\boldsymbol{x}^{k})+\beta|\Psi(\boldsymbol{x}^{k})|&\geq f(\boldsymbol{V}_{k})+\beta|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|+{L_{k}}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}\vspace{1ex}\\ &\geq f(\boldsymbol{V}_{k})+\beta|\Psi(\boldsymbol{V}_{k})|+{L_{k}}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}-\frac{\beta L_{\Psi}}{2}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2},\end{array}

{\!\!\!\!\!\!}\begin{array}[]{ll}f(\boldsymbol{x}^{k})+\beta|\Psi(\boldsymbol{x}^{k})|&\geq f(\boldsymbol{V}_{k})+\beta|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|+{L_{k}}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}\vspace{1ex}\\ &\geq f(\boldsymbol{V}_{k})+\beta|\Psi(\boldsymbol{V}_{k})|+{L_{k}}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}-\frac{\beta L_{\Psi}}{2}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2},\end{array}

\begin{array}[]{ll}F(\boldsymbol{x}^{k})&\geq F(\boldsymbol{V}_{k})+\frac{(2{L_{k}}-\beta L_{\Psi})}{2}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}.\end{array}

\begin{array}[]{ll}F(\boldsymbol{x}^{k})&\geq F(\boldsymbol{V}_{k})+\frac{(2{L_{k}}-\beta L_{\Psi})}{2}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}.\end{array}

D F (x^{k}) [d_{L_{k}} (x^{k})] = \nabla f (V_{k})^{⊤} (V_{k} - x^{k}) + β ξ (x^{k})^{⊤} Ψ^{'} (x^{k})^{⊤} (V_{k} - x^{k}) .

D F (x^{k}) [d_{L_{k}} (x^{k})] = \nabla f (V_{k})^{⊤} (V_{k} - x^{k}) + β ξ (x^{k})^{⊤} Ψ^{'} (x^{k})^{⊤} (V_{k} - x^{k}) .

{\!\!\!\!\!}\begin{array}[]{ll}|\Psi(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))|{\!\!\!}&\leq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})|+\|\Psi(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))-\Psi(\boldsymbol{x}^{k})-\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})\|\vspace{1ex}\\ &\leq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})|+\frac{L_{\Psi}}{2}\|\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k}\|^{2}.\end{array}

{\!\!\!\!\!}\begin{array}[]{ll}|\Psi(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))|{\!\!\!}&\leq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})|+\|\Psi(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))-\Psi(\boldsymbol{x}^{k})-\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})\|\vspace{1ex}\\ &\leq|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})|+\frac{L_{\Psi}}{2}\|\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k}\|^{2}.\end{array}

\begin{array}[]{ll}{\!\!\!}F(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))\leq\mathcal{Q}_{L}(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})l\boldsymbol{x}^{k})-\frac{L-\beta L_{\Psi}}{2}\|\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k}\|^{2}.\end{array}{\!\!\!}

\begin{array}[]{ll}{\!\!\!}F(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))\leq\mathcal{Q}_{L}(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})l\boldsymbol{x}^{k})-\frac{L-\beta L_{\Psi}}{2}\|\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k}\|^{2}.\end{array}{\!\!\!}

0 \leq k \leq K min ∥ G_{β L_{Ψ}} (x^{k}) ∥^{2} \leq \frac{2 ( β L _{Ψ} ) ^{2}}{L _{m i n} ( K + 1 )} [F (x^{0}) - F^{⋆}],

0 \leq k \leq K min ∥ G_{β L_{Ψ}} (x^{k}) ∥^{2} \leq \frac{2 ( β L _{Ψ} ) ^{2}}{L _{m i n} ( K + 1 )} [F (x^{0}) - F^{⋆}],

k \to \infty lim ∥ x^{k + 1} - x^{k} ∥ = 0, and k \to \infty lim ∥ G_{β L_{Ψ}} (x^{k}) ∥ = 0,

k \to \infty lim ∥ x^{k + 1} - x^{k} ∥ = 0, and k \to \infty lim ∥ G_{β L_{Ψ}} (x^{k}) ∥ = 0,

F (x^{k + 1}) \leq F (x^{k}) - \frac{L _{m i n}}{2} r_{L_{k}}^{2} (x^{k}) \leq F (x^{k}) - \frac{L _{m i n}}{2} r_{β L_{Ψ}}^{2} (x^{k}) .

F (x^{k + 1}) \leq F (x^{k}) - \frac{L _{m i n}}{2} r_{L_{k}}^{2} (x^{k}) \leq F (x^{k}) - \frac{L _{m i n}}{2} r_{β L_{Ψ}}^{2} (x^{k}) .

\frac{L _{m i n}}{2 ( β L _{Ψ} ) ^{2}} k = 0 \sum K ∥ G_{β L_{Ψ}} (x^{k}) ∥^{2} = \frac{L _{m i n}}{2} k = 0 \sum K r_{β L_{Ψ}}^{2} (x^{k}) \leq F (x^{0}) - F (x^{k + 1}) \leq F (x^{0}) - F^{⋆} .

\frac{L _{m i n}}{2 ( β L _{Ψ} ) ^{2}} k = 0 \sum K ∥ G_{β L_{Ψ}} (x^{k}) ∥^{2} = \frac{L _{m i n}}{2} k = 0 \sum K r_{β L_{Ψ}}^{2} (x^{k}) \leq F (x^{0}) - F (x^{k + 1}) \leq F (x^{0}) - F^{⋆} .

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsModel Reduction and Neural Networks · Numerical Methods and Algorithms · Advanced Optimization Algorithms Research

Full text

A Globally Convergent Penalty-Based Gauss-Newton

Algorithm with Applications

Ilyes Mezghani

Corresponding Author. Center for Operations Research and Econometrics, Université Catholique de Louvain. Voie du Roman Pays 34, 1348 Louvain-la-Neuve, Belgium. E-mail: [email protected]

Quoc Tran-Dinh

Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill. 333 Hanes Hall, Chapel Hill, NC 27599, USA. E-mail: [email protected]

Ion Necoara

Politehnica University of Bucharest, Department of Automatic Control and Systems Engineering. Splaiul Independentei nr. 313, sector 6, 060042 Bucharest, Romania. E-mail: [email protected]

Anthony Papavasiliou

Center for Operations Research and Econometrics, Université Catholique de Louvain. Voie du Roman Pays 34, 1348 Louvain-la-Neuve, Belgium. E-mail: [email protected]

Abstract

We propose a globally convergent Gauss-Newton algorithm for finding a local optimal solution of a non-convex and possibly non-smooth optimization problem. The algorithm that we present is based on a Gauss-Newton-type iteration for the non-smooth penalized formulation of the original problem. We establish a global convergence rate for this scheme from any initial point to a stationary point of the problem while using an exact penalty formulation. Under some more restrictive conditions we also derive local quadratic convergence for this scheme. We apply our proposed algorithm to solve the Alternating Current optimal power flow problem on meshed electricity networks, which is a fundamental application in power systems engineering. We verify the performance of the proposed method by showing comparable behavior with IPOPT, a well-established solver. We perform our validation on several representative instances of the optimal power flow problem, which are sourced from the MATPOWER library.

**Keywords: ** Nonlinear programming, Optimization with non-convex constraints, penalty reformulation, Gauss-Newton method, AC optimal power flow.

1 Introduction

Statement of the problem.

In this paper we are interested in solving the following optimization problem with non-convex constraints:

[TABLE]

For this problem we assume that the objective function $f$ is convex and differentiable, $\Omega$ is a compact convex set, and the non-convexity enters in (1) through the non-linear equality constraints $\Psi(\boldsymbol{x})=0$ , defined by $\Psi:\mathbb{R}^{d}\to\mathbb{R}^{n}$ .

Note that in the literature we can find many efficient algorithms that are able to minimize non-convex objective functions, but with convex constraints, see e.g., [Chen et al., 2019, Bolte et al., 2014, Patrascu and Necoara, 2015, Nocedal and Wright, 2006]. In this paper, we treat a more general optimization model, where the non-convexity enters into the optimization problem through the constraints.

It is well-known that optimization problems with non-convex constraints are more difficult to solve than convex constrained problems [Nocedal and Wright, 2006]. For the non-convex problem (1), classical non-convex optimization algorithms such as interior-point, augmented Lagrangian, penalty, Gauss-Newton, and sequential quadratic programming methods can only aim at finding a stationary point (i.e. a point that satisfies the first-order optimality conditions), which is a candidate for a local minimum [Nocedal and Wright, 2006, Cartis et al., 2011]. Nevertheless, convergence guarantees of these methods rely on the twice continuous differentiability of the underlying functionals, including the representation of $\Omega$ . Our method also approximates a stationary point, but it allows $f$ to include nonsmooth convex terms and $\Omega$ to be a general convex set (see Subsection 3.4).

For an iterative method to identify a stationary point that is a local minimum, but not a saddle-point, more sophisticated techniques are required, such as cubic regularization [Nesterov and Polyak, 2006] or random noise gradient [Dauphin et al., 2014]. However, it is still unclear how to efficiently implement these methods for solving large-scale optimization problems with non-convex constraints. One of the most efficient and well-established non-linear solvers for finding stationary points is IPOPT [Wächter and Biegler, 2006], which relies on a primal-dual interior-point method combined with other advanced techniques. We emphasize that this classical method is only guaranteed to converge to a stationary point, and often requires a strategy such as line-search, filter, or trust-region to achieve global convergence (i.e. the method is still convergent from a starting point that is far from the targeted stationary point) under certain restrictive assumptions. These methods often assume that the underlying functionals in (1) are twice continuously differentiable, including $f$ and $\Omega$ . In addition, a linesearch procedure may require that the gradient of the underlying merit function is Lipschitz continuous to guarantee global convergence, see, e.g., [Nocedal and Wright, 2006, Theorem 3.2.].

A Gauss-Newton algorithm for constrained non-convex optimization.

Recently there has been a revived interest in the design and analysis of algorithms for solving optimization problems involving non-convex constraints, in particular in engineering and machine learning [Boob et al., 2019, Bolte and Pauwels, 2016, Cartis et al., 2014, Curtis et al., 2018, Tran-Dinh et al., 2012]. The main trend is in solving large-scale problems by exploiting special structures/properties of the problem model and data towards the design of simple schemes (e.g., solving a tractable convex subproblem at each iteration), while producing reasonable approximate solutions efficiently [Drusvyatskiy and Paquette, 2019, Lewis and Wright, 2016, Nesterov, 2007b].

Following this trend, in this paper we also devise a provable convergent Gauss-Newton (GN)-type algorithm for solving the non-convex optimization problem (1). The idea of the GN method studied in this paper was proposed in [Lewis and Wright, 2016] for minimizing a compositional model of the form $\phi(F(x))$ , where $\phi$ is possibly nonsmooth. This method targets a different problem class compared to standard GN or Levenberg–Marquardt methods for nonlinear least-squares problems. The main idea is to replace the non-Lipschitz continuous least-squares function $\phi(\cdot)=\frac{1}{2}\|\cdot\|_{2}^{2}$ in these methods by a given convex and Lipschitz continuous function $\phi(\cdot)$ (but possibly nonsmooth). Nesterov contributed a thorough investigation on convergence guarantees of this method in [Nesterov, 2007b] when $\phi$ is a given norm. This was extended to a more general model that can cover exact penalty methods in a technical report [Tran-Dinh and Diehl, 2011]. Very recently, [Drusvyatskiy and Paquette, 2019] revisited this method under a Moreau’s envelope perspective, but only on global convergence guarantees.

Our algorithm converges globally to a stationary point of the problem in the sense that, starting from any initial point within a given level set, the algorithm converges to a stationary point. In addition, the proposed approach is different from standard GN methods in the literature [Nocedal and Wright, 2006, Deuflhard, 2006] due to the use of a non-smooth penalty instead of a classical quadratic penalty term. This allows our algorithm to converge globally [Nesterov, 2007a]. Hence, we refer to this algorithm as a global GN scheme.

The main idea of our method is to keep the convex sub-structure of the original problem unchanged and to convexify the non-convex part by exploiting penalty theory and the GN framework. Hence, in contrast to IPOPT (solving a linear system of the barrier problem to obtain a Newton search direction), each iteration of our algorithm requires finding the solution of a strongly convex subproblem, which can be efficiently solved by many existing convex solvers. Under some more restrictive conditions, we also derive a local quadratic convergence rate for our GN method.

The optimal power flow problem.

We apply our proposed optimization algorithm to the alternating current optimal power flow (AC-OPF) problem [Frank et al., 2012], which lies at the heart of short-term power system operations [Cain et al., 2012]. We show that this problem can be posed in the framework of non-convex optimization with the particular structure on the constraints as in (1). The optimal power flow (OPF) problem [Carpentier, 1962] consists in finding an optimal operating point of a power system while minimizing a certain objective (typically power generation cost), subject to the Kirchhoff’s power flow equations and various network and control operating limits.

In recent years, there has been a great body of literature that has focused on convex relaxations of the AC-OPF problem, including semidefinite programming relaxations [Lavaei and Low, 2011, Kocuk et al., 2015], conic relaxations [Jabr, 2008, Gan et al., 2014, Kocuk et al., 2016], and quadratic relaxations [Coffrin et al., 2015]. These works have established conditions under which these relaxations are exact, and understanding cases in which this is not so [Molzahn et al., 2013]. However, when these relaxations are inexact, the resulting dispatch is possibly non-implementable. Therefore, our interest in the present paper is to tackle directly this problem as a non-convex optimization problem with non-linear equality constraints.

Contributions.

The main contributions of the paper are the following:

(i)

We propose a new GN algorithm for solving a general class of optimization problems with non-convex constraints. We utilize an exact non-smooth penalty reformulation of the original problem and suggest a GN scheme to solve this penalized problem, where the subproblem in this scheme is a strongly convex program, which can be efficiently solved by several recent and highly efficient third-party convex optimization solvers.

Since our proposed approach preserves convexity in $f$ and $\Omega$ of (1) in the subproblem, our method can solve a broader class of problems with theoretical guarantee than classical IP or SQP methods. In particular, our method can solve problem instances of (1) with nonsmooth convex objective terms or semidefinite cone constraints, see Subsection 3.4 for a concrete example.

(ii)

We establish the best-known global convergence rate (i.e., convergence from any starting point in a given sublevel set) for our method to a stationary point, which is a candidate for a local optimal solution. Moreover, under some more restrictive conditions, we also derive a local quadratic convergence for our scheme.

(iii)

We apply our proposed algorithm to the quadratic formulation of AC-OPF. We show that the newly developed algorithm can be implemented efficiently on AC-OPF problems and test it on several numerical examples from the well-known MATPOWER library [Zimmerman et al., 2010]. We often observe comparable performance to the well-established and widely-used IPOPT solver. Also, since our approach relies on an iterative scheme, using the previous iterate to accelerate the computation of the next iterate can significantly improve performance: this is what we refer to as warm-start. Even though most solvers do not possess warm-start capabilities for this type of problems, we show how warm-start can significantly improve the performances of the proposed method. Section 3 provides details regarding performance and potential improvements.

We emphasize that, as opposed to the classical GN approach, our proposed algorithm relies on the $\ell_{1}$ -norm penalty, which typically has a better condition number than the quadratic penalty, as discussed in [Nesterov, 2007b]. More generally, if we replace this $\ell_{1}$ -penalty with any other exact and Lipschitz continuous penalty function, then our theoretical results still hold. Unlike certain sophisticated methods such as Interior-Point Methods (IPMs) and Sequential Quadratic Programming (SQP) schemes, our GN method is simple to implement. Its workhorse is the solution of a strongly convex problem. As a result, its efficiency depends on the efficiency of a third-party solver for this problem as well as the benefit of warm-start strategies. Our main motivation is to exploit recent advances in large-scale convex optimization in order to create a flexible algorithm that can reuse this resource.

Content.

The paper is organized as follows. In Section 2, we introduce our Gauss-Newton algorithm and analyze its convergence properties. In Section 3, we present the AC-OPF problem, its quadratic reformulation and test our Gauss-Newton algorithm on several representative MATPOWER test cases.

2 A Gauss-Newton Algorithm for Non-Convex Optimization

In this section, we present the main assumptions for the non-convex optimization problem (1), propose an exact penalty reformulation, and solve it using a Gauss-Newton-type algorithm. We further characterize the global and local convergence rates of our algorithm.

2.1 Exact penalty approach for constrained non-convex programming

For the non-convex optimization problem (1) we assume that the objective function $f$ is convex and differentiable and $\Omega$ is a compact convex set. Note that our method developed in the sequel can also be extended non-smooth convex function $f$ or smooth non-convex function $f$ whose gradient is Lipschitz continuous, but we make this assumption for simplicity of presentation. Furthermore, the non-convexity enters into the optimization problem through the non-linear equality constraints $\Psi(\boldsymbol{x})=0$ defined by $\Psi:\mathbb{R}^{d}\to\mathbb{R}^{n}$ . We assume that $\Psi$ is differentiable and its Jacobian $\Psi^{\prime}$ is Lipschitz continuous, i.e. there exists $L_{\Psi}>0$ such that:

[TABLE]

where $\|\cdot\|$ is the $\ell_{2}$ -norm. Further, let $\mathcal{N}_{\Omega}$ denote the normal cone of the convex set $\Omega$ :

[TABLE]

Since problem (1) is non-convex, our goal is to search for a stationary point of this optimization problem that is a candidate for a local optimum in the following sense.

Definition 2.1 ([Nocedal and Wright, 2006](Theorem 12.9)).

A point $(\boldsymbol{x}^{*},\boldsymbol{y}^{*})$ is said to be a KKT point of (1) if it satisfies the following conditions:

[TABLE]

Here, $\boldsymbol{x}^{*}$ is called a stationary point of (1), and $\boldsymbol{y}^{*}$ is the corresponding multiplier. Let $\mathcal{S}^{*}$ denote the set of these stationary points.

Since $\Omega$ is compact, and $\Psi$ and $f$ are continuous, by the well-known Weierstrass theorem, we have:

Proposition 2.1.

If $\Omega\cap\left\{\boldsymbol{x}\mid\Psi(\boldsymbol{x})=0\right\}\neq\emptyset$ , then (1) has global optimal solutions.

2.2 Exact penalized formulation

Associated with (1), we consider its exact penalty form [Nocedal and Wright, 2006, Chapt. 17.3]:

[TABLE]

where $\beta>0$ is a penalty parameter, and $|\cdot|$ is the $\ell_{1}$ -norm. Two reasons for choosing an exact (non-smooth) penalty are as follows. First, for a certain finite choice of the parameter $\beta$ , a single minimization in $x$ of (3) can yield an exact solution of the original problem (1). Second, it does not square the condition number of $\Psi$ as in the case of quadratic penalty methods, thus making our algorithm presented below more robust to ill-conditioning of the non-convex constraints. Now, we summarize the relationship between stationary points of (1) and of its penalty form (3). For this, let us define the directional derivative:

[TABLE]

where $\xi(\boldsymbol{x}^{*})\in\partial|\Psi(\boldsymbol{x}^{*})|$ is one subgradient of $|\cdot|$ at $\Psi(\boldsymbol{x}^{*})$ , and $\partial{|\cdot|}$ denotes the subdifferential of $|\cdot|$ , see [Nesterov, 2007a]. Recall that the necessary optimality condition of (3) is

[TABLE]

Then, this condition can be expressed equivalently as

[TABLE]

where $\mathcal{F}_{\Omega}(\boldsymbol{x})$ is the set of feasible directions to $\Omega$ at $\boldsymbol{x}$ :

[TABLE]

Any point $\boldsymbol{x}^{*}$ satisfying (5) is called a stationary point of the penalized problem (3). Stationary points are candidates for local minima, local maxima, and saddle-points. If, in addition, $\boldsymbol{x}^{*}$ is feasible to (1), then we say that $\boldsymbol{x}^{*}$ is a feasible stationary point. Otherwise, we say that $\boldsymbol{x}^{*}$ is an infeasible stationary point. Proposition 2.2 shows the relation between (1) and (3).

Proposition 2.2 ([Nocedal and Wright, 2006], (Theorem 17.4.)).

Suppose that $\boldsymbol{x}^{*}$ is a feasible stationary point of (3) for $\beta$ sufficiently large. Then, $\boldsymbol{x}^{*}$ is also stationary point of the original problem (1).

Proposition 2.2 requires $\boldsymbol{x}^{*}$ to be feasible for (1). When the feasible set $\Omega\cap\left\{\boldsymbol{x}\mid\Psi(\boldsymbol{x})=0\right\}\neq\emptyset$ of (1) is nonempty and bounded, according to [Pillo, 1994, Proposition 2], if (1) satisfies the extended Mangarasian-Fromovitz constrained qualification condition (see [Pillo, 1994, Proposition 2] for concrete definition), then there exists $\beta_{*}>0$ such that for any $\beta>\beta_{*}$ , every global or local solution of the penalized problem (3) is also a global or local optimal solution of (1), respectively. By [Pillo, 1994, Proposition 3], $\beta$ needs to be chosen such that $\beta>\beta_{*}:=\|\boldsymbol{y}^{*}\|_{\infty}$ , where $\boldsymbol{y}^{*}$ is any optimal Lagrange multiplier of (1). We will discuss in detail the choice of $\beta$ in the sections below.

2.3 Global Gauss-Newton method

We first develop our GN algorithm. Then, we investigate its global convergence rate.

2.3.1 The derivation of the Gauss-Newton scheme and the full algorithm

Our GN method aims at solving the penalized problem (3) using the following convex subproblem:

[TABLE]

where $\boldsymbol{x}^{k}$ is a given point in $\Omega$ for linearization, $\Psi^{\prime}(\cdot)$ is the Jacobian of $\Psi$ , and $L>0$ is a regularization parameter.

Note that our subproblem (7) differs from those used in classical penalty methods [Nocedal and Wright, 2006], since we linearize the constraints and we also add a regularization term. Thus, the objective function of (7) is strongly convex. Hence, if $\Omega$ is nonempty and even if the problem is non-differentiable, this problem admits a unique optimal solution, and can be solved efficiently by several convex methods and solvers. For instance, alternating direction methods of multipliers (ADMM) [Boyd et al., 2011] and primal-dual schemes [Chambolle and Pock, 2011] can be efficient for solving (7). Note that the convergence guarantees of ADMM and primal-dual schemes often depends on the distance between the initial point $\boldsymbol{x}^{k,0}$ of the algorithm and the exact optimal solution of $\bar{\boldsymbol{x}}^{k+1}$ of (7), see, e.g, [Chambolle and Pock, 2011, Theorem 2]. Hence, if we warm-start $\boldsymbol{x}^{k,0}$ at the previous approximate solution $\boldsymbol{x}^{k}$ obtained at the $(k-1)$ -th iteration, then the distance $\|\boldsymbol{x}^{0}-\bar{\boldsymbol{x}}^{k+1}\|$ is small. This allows the algorithm to converge faster to a desired approximate solution $\boldsymbol{x}^{k+1}$ of (7) at the $k$ -th iteration.

Let us define

[TABLE]

The necessary and sufficient optimality condition for subproblem (7) becomes

[TABLE]

where $\xi(\boldsymbol{x}^{k})\in\partial|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{L}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k})|$ . Given $\boldsymbol{V}_{L}(\boldsymbol{x}^{k})$ , we define the following quantities:

[TABLE]

Then, $\boldsymbol{G}_{L}(\cdot)$ can be considered as a gradient mapping of $F$ in (3) [Nesterov, 2007a], and $\boldsymbol{d}_{L}(\boldsymbol{x}^{k})$ is a search direction for Algorithm 1. As we will see later in Lemma 2.2, $L$ should be chosen such that $0<L\leq\beta L_{\Psi}$ . Now, using the subproblem (7) as a main component, we describe our GN scheme in Algorithm 1.

The main step of Algorithm 1 is the solution of the convex subproblem (7) at Step 4. As mentioned, this problem is strongly convex, and can be solved by several methods that converge linearly. If we choose $L_{k}\equiv L\geq\beta L_{\Psi}$ , then we do not need to perform a line-search on $L$ at Step 4, and only need to solve (7) once per iteration. However, $L_{\Psi}$ may not be known or if it is known, the global upper bound $\beta L_{\Psi}$ may be too conservative, i.e. it does not take into account the local structures of non-linear functions in (3). Therefore, following the algorithm in [Nesterov, 2007a], we propose performing a line-search in order to find an appropriate $L_{k}$ . If we perform a line-search by doubling $L_{k}$ at each step starting from $L_{\min}$ , (i.e., $L_{k}\rightarrow 2L_{k}$ ), then after $i_{k}$ line-search steps, we have $L_{k}=2^{i_{k}}L_{\min}$ , and the number of line-search iterations $i_{k}$ is at most $\lfloor\log_{2}(\beta L_{\Psi}/L_{\min})\rfloor+1$ . Note that it is rather straightforward to estimate $L_{\min}$ . For example, we can set $L_{\min}:=\frac{c\beta\|\Psi^{\prime}(\hat{\boldsymbol{x}}^{0})-\Psi^{\prime}(\boldsymbol{x}^{0})\|}{\|\hat{\boldsymbol{x}}^{0}-\boldsymbol{x}^{0}\|}\leq\beta L_{\Psi}$ for some $\hat{\boldsymbol{x}}^{0}\neq\boldsymbol{x}^{0}$ and $c\in(0,1]$ . The penalty parameter $\beta$ can be fixed or can be updated gradually using a run-and-inspect strategy, see next section.

2.3.2 Global convergence analysis

We first summarize some properties of Algorithm 1 when the penalty parameter $\beta$ is fixed at a given positive value for all iterations.

Lemma 2.1.

Let $\boldsymbol{V}_{L}$ be defined by (8), and $\boldsymbol{G}_{L}$ , $\boldsymbol{d}_{L}$ , and $r_{L}$ be defined by (10). Then the following statements hold:

$\mathrm{(a)}$

If $\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k})=\boldsymbol{x}^{k}$ , then $\boldsymbol{x}^{k}$ is a stationary point of (3).

$\mathrm{(b)}$

The norm $\|\boldsymbol{G}_{L_{k}}(\boldsymbol{x}^{k})\|$ is nondecreasing in ${L_{k}}$ , and $r_{L_{k}}(\boldsymbol{x}^{k})$ is nonincreasing in ${L_{k}}$ . Moreover, we have

[TABLE]

$\mathrm{(c)}$

If $\Psi^{\prime}(\cdot)$ is Lipschitz continuous with the Lipschitz constant $L_{\Psi}$ , then, for any $x\in\Omega$ , we have

[TABLE]

Proof.

(a) Substituting $\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k})=\boldsymbol{x}^{k}$ into (9), we again obtain the optimality condition (5). This shows that $\boldsymbol{x}^{k}$ is a stationary point of (3).

(b) Since the function $q(t,\boldsymbol{x}):=f(\boldsymbol{x})+\beta|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{x}-\boldsymbol{x}^{k})|+\frac{1}{2t}\|\boldsymbol{x}-\boldsymbol{x}^{k}\|^{2}$ is convex in two variables $\boldsymbol{x}$ and $t$ , we have that $\eta(t):=\min_{\boldsymbol{x}\in\Omega}q(t,\boldsymbol{x})$ is still convex. It is easy to show that $\eta^{\prime}(t)=-\frac{1}{2t^{2}}\|\boldsymbol{V}_{1/t}(\boldsymbol{x}^{k})-\boldsymbol{x}^{k}\|^{2}=-\frac{1}{2t^{2}}\|\boldsymbol{d}_{1/t}(\boldsymbol{x}^{k})\|^{2}=\frac{1}{2}\|\boldsymbol{G}_{1/t}(\boldsymbol{x})\|^{2}$ . Since $\eta(t)$ is convex, $\eta^{\prime}(t)$ is nondecreasing in $t$ . This implies that $\|\boldsymbol{G}_{1/t}(\boldsymbol{x}^{k})\|$ is nonincreasing in $t$ . Thus $\|\boldsymbol{G}_{L}(\boldsymbol{x}^{k})\|$ is nondecreasing in $L$ and $r_{L}(\boldsymbol{x}^{k}):=\|\boldsymbol{d}_{L}(\boldsymbol{x}^{k})\|$ is nonincreasing in $L$ . To prove (11), note that the convexity of $\eta$ implies that

[TABLE]

On the other hand, $\mathcal{Q}_{L}(\boldsymbol{V}_{L}(\boldsymbol{x}^{k});\boldsymbol{x}^{k})=\eta(1/L)$ . Substituting this relation into (13), we obtain (11).

(c) Let use define $\boldsymbol{V}_{k}:=\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k})$ . From the optimality condition (9), for any $\boldsymbol{x}\in\Omega$ , we have

[TABLE]

where $\xi(\boldsymbol{x}^{k})\in\partial|\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})|$ . Substituting $\boldsymbol{x}=\boldsymbol{x}^{k}$ into this condition, we have

[TABLE]

Since $f$ is convex, we have:

[TABLE]

By exploiting the convexity of $|\cdot|$ at point $\Psi(\boldsymbol{x}^{k})+\Psi^{\prime}(\boldsymbol{x}^{k})(\boldsymbol{V}_{k}-\boldsymbol{x}^{k})$ , we have:

[TABLE]

Since $\Psi^{\prime}$ is Lipschitz continuous, we also have

[TABLE]

Combining these three bounds, we can show that

[TABLE]

which implies

[TABLE]

Since $r^{2}_{L_{k}}(\boldsymbol{x}^{k})=\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}=\frac{1}{L_{k}^{2}}\|\boldsymbol{G}_{L_{k}}(\boldsymbol{x}^{k})\|^{2}$ , we obtain the first inequality of (12) from the last inequality. Moreover, from (4) we have

[TABLE]

Using (14), we can show that $DF(\boldsymbol{x}^{k})[\boldsymbol{d}_{L_{k}}(\boldsymbol{x}^{k})]\leq-{L_{k}}\|\boldsymbol{V}_{k}-\boldsymbol{x}^{k}\|^{2}$ , which is the second inequality of (12). ∎

The proof of Statement $\mathrm{(a)}$ shows that if we can find $\boldsymbol{x}^{k}$ such that $\|\boldsymbol{G}_{L_{k}}(\boldsymbol{x}^{k})\|\leq\varepsilon$ , then $\boldsymbol{x}^{k}$ is an approximate stationary point of (3) within the accuracy $\varepsilon$ . From statement $\mathrm{(b)}$ , we can see that if the line-search condition $F(\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k}))\leq\mathcal{Q}_{L_{k}}(\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k});\boldsymbol{x}^{k})$ at Step 4 holds, then $F(\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k}))\leq F(\boldsymbol{x}^{k})-\frac{{L_{k}}}{2}r^{2}_{L_{k}}(\boldsymbol{x}^{k})$ . That is, the objective value $F(\boldsymbol{x}^{k})$ decreases at least by $\frac{{L_{k}}}{2}r^{2}_{L_{k}}(\boldsymbol{x}^{k})$ after the $k$ -th iteration. We first claim that Algorithm 1 is well-defined.

Lemma 2.2.

Algorithm 1 is well-defined, i.e. step 4 terminates after a finite number of iterations. That is, if $L\geq\beta L_{\Psi}$ , then $F(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))\leq\mathcal{Q}_{L}(\boldsymbol{V}_{L}(\boldsymbol{x}^{k});\boldsymbol{x}^{k})$ .

Proof.

Since $\Psi^{\prime}$ is $L_{\Psi}$ -Lipschitz continuous, for any $\boldsymbol{x}^{k}$ and $\boldsymbol{V}_{L}(\boldsymbol{x}^{k})$ , we have

[TABLE]

Using the definition of $\mathcal{Q}_{L}(\boldsymbol{V};\boldsymbol{x})$ , we obtain

[TABLE]

From this inequality, we can see that if $L\geq\beta L_{\Psi}$ , then $F(\boldsymbol{V}_{L}(\boldsymbol{x}^{k}))\leq\mathcal{Q}_{L}(\boldsymbol{V}_{L}(\boldsymbol{x}^{k});\boldsymbol{x}^{k})$ . Hence, Step 4 of Algorithm 1 terminates after a finite number of iterations. ∎

Let $\mathcal{L}_{F}(\alpha)=\{x\in\Omega~{}\mid~{}F(x)\leq\alpha\}$ be the level set of $F$ at $\alpha$ . Now, we are ready to state the following theorem on global convergence of Algorithm 1.

Theorem 2.1.

Let $\left\{\boldsymbol{x}^{k}\right\}$ be the sequence generated by Algorithm 1. Then $\left\{\boldsymbol{x}^{k}\right\}\subset\mathcal{L}_{F}(F(\boldsymbol{x}^{0}))$ and

[TABLE]

where $F^{\star}:=\inf_{\boldsymbol{x}\in\Omega}F(\boldsymbol{x})>-\infty$ . Moreover, we also obtain

[TABLE]

and the set of limit points $\hat{\mathcal{S}}^{*}$ of the sequence $\{\boldsymbol{x}^{k}\}_{k\geq 0}$ is connected. If this sequence is bounded (in particular, if $\mathcal{L}_{F}(F(\boldsymbol{x}^{0}))$ is bounded) then every limit point is a stationary point of (3). Moreover, if the set of limit points $\hat{\mathcal{S}}^{*}$ is finite, then the sequence $\left\{\boldsymbol{x}^{k}\right\}$ converges to a stationary point $\boldsymbol{x}^{*}\in\mathcal{S}^{*}$ of (3). If, in addition, $\boldsymbol{x}^{\ast}$ is feasible to (1) and $\beta$ is sufficiently large, then $\boldsymbol{x}^{*}$ is also a stationary point of (1).

Proof.

From Step 5 of Algorithm 1, we have $\boldsymbol{x}^{k+1}:=\boldsymbol{V}_{L_{k}}(\boldsymbol{x}^{k})$ and $\left\{\boldsymbol{x}^{k}\right\}\subset\Omega$ . Using (12), it is easy to obtain $-\infty<F^{\star}\leq F(\boldsymbol{x}^{k+1})\leq F(\boldsymbol{x}^{k})\leq\cdots\leq F(\boldsymbol{x}^{0})$ . This shows that $\left\{\boldsymbol{x}^{k}\right\}\subset\mathcal{L}_{F}(F(\boldsymbol{x}^{0}))$ , and $\left\{F(\boldsymbol{x}^{k})\right\}$ is a decreasing sequence and bounded. Hence, it has at least a convergent subsequence. Moreover, from (12), we also have

[TABLE]

Summing up the inequality (17) from $k=0$ to $k=K$ and using $F(\boldsymbol{x}^{k+1})\geq F^{\star}$ , we obtain

[TABLE]

This implies

[TABLE]

which leads to (15). Similarly, for any $N\geq 0$ one has

[TABLE]

Note that the sequence $\left\{F(\boldsymbol{x}^{k})\right\}_{k\geq 0}$ has a convergent subsequence, thus passing to the limit as $k\to\infty$ in (18) we obtain the first limit of (16). Since $\|\boldsymbol{x}^{k+1}-\boldsymbol{x}^{k}\|=r_{L_{k}}(\boldsymbol{x}^{k})\geq r_{\beta L_{\Psi}}(\boldsymbol{x}^{k})=\frac{1}{\beta L_{\Psi}}\|\boldsymbol{G}_{\beta L_{\Psi}}(\boldsymbol{x}^{k})\|$ due to Statement (b) of Lemma 2.1, the first limit of (16) also implies the second one. If the sequence $\left\{\boldsymbol{x}^{k}\right\}_{k\geq 0}$ is bounded, by passing to the limit through a subsequence and combining with Lemma 2.1, we easily prove that every limit point is a stationary point of (3). If the set of limit points $\hat{\mathcal{S}}^{*}$ is finite, by applying the result in [Ostrowski, 1966][Chapt. 28], we obtain the proof of the remaining conclusion. ∎

Theorem 2.1 provides a global convergence result for Algorithm 1. Moreover, our algorithm requires solving convex subproblems at each iteration, thus offering a great advantage over classical penalty-type schemes. Since the underlying problem is non-convex, the iterates of our algorithm may get trapped at points that may be infeasible for the original problem. That is, under the stated conditions, the iterate sequence $\left\{\boldsymbol{x}^{k}\right\}$ may converge to a local minimum (stationary) point $\boldsymbol{x}^{*}$ of (3). Since $\boldsymbol{x}^{*}\in\Omega$ , if $\Psi(\boldsymbol{x}^{*})=0$ , then $\boldsymbol{x}^{*}$ is also a local minimum (stationary) point of the original problem (1).

We can sometimes overcome this by combining the algorithm with a run-and-inspect procedure [Chen et al., 2019], whereby if $\boldsymbol{x}^{*}$ violates $\Psi(\boldsymbol{x})=0$ , then we restart the algorithm at a new starting point. More precisely, we add an inspect phase to our existing algorithm that helps escape from non-feasible stationary points. In the inspection phase, if $\Psi(\boldsymbol{x}^{*})\not=0$ we sample a point around the current point and increase the parameter $\beta$ . Since we do not know any optimal Lagrange multiplier $\boldsymbol{y}^{*}$ of (1), we cannot guarantee that $\beta>\|\boldsymbol{y}^{*}\|_{\infty}$ . However, since it is expected that the multiplier $\boldsymbol{y}_{k}$ of the subproblem (7) converges to $\boldsymbol{y}^{*}$ , we can use $\boldsymbol{y}_{k}$ to monitor the update of $\beta$ by guaranteeing that $\beta>|\boldsymbol{y}_{k}|_{\infty}$ . We have seen that such strategy performs well on a set of realistic non-convex AC-OPF problems. Nevertheless, in this paper, we do not have theoretical guarantee for this variant and leave this extension for future work.

2.4 Local convergence analysis

Let us study a special case of (3) where Algorithm 1 has a local quadratic convergence rate. Our result relies on the following assumptions. First, for simplicity of our analysis, we assume that $\beta>0$ is fixed and $L_{k}$ is also fixed at $L_{k}:=L>0$ for $k\geq 0$ in Algorithm 1. Next, let $\boldsymbol{x}^{\ast}$ be a stationary point of (3) such that

[TABLE]

where ${\omega_{\min}}>0$ is a given constant independent of $x$ and $\mathcal{N}(\boldsymbol{x}^{\ast})$ is a neighborhood of $\boldsymbol{x}^{\ast}$ . The condition (19) is rather technical, but it holds in the following case. Let us assume that the Jacobian $\Psi^{\prime}(\boldsymbol{x}^{\ast})$ of $\Psi$ at $\boldsymbol{x}^{\ast}$ satisfies the following condition

[TABLE]

where $\sigma_{\min}(\Psi^{\prime}(\boldsymbol{x}^{\ast})$ is the positive smallest singular value of $\Psi^{\prime}(\boldsymbol{x}^{*})$ . This condition is similar to the strong second-order sufficient optimality condition [Nocedal and Wright, 2006], but only limited to the linear objective function. In this case, we have $|\Psi^{\prime}(\boldsymbol{x}^{\ast})(\boldsymbol{x}-\boldsymbol{x}^{\ast})|\geq\|\Psi^{\prime}(\boldsymbol{x}^{\ast})(\boldsymbol{x}-\boldsymbol{x}^{\ast})\|\geq\sigma_{\min}(\Psi^{\prime}(\boldsymbol{x}^{\ast}))\|\boldsymbol{x}-\boldsymbol{x}^{\ast}\|$ . Therefore, it leads to

[TABLE]

For $\beta>0$ sufficiently large such that $\beta>\frac{\|\nabla{f}(\boldsymbol{x}^{\ast})\|}{\sigma_{\min}(\Psi^{\prime}(\boldsymbol{x}^{\ast}))}$ , we have ${\omega_{\min}}:=\beta\sigma_{\min}(\Psi^{\prime}(\boldsymbol{x}^{\ast}))-\|\nabla{f}(\boldsymbol{x}^{\ast})\|>0$ , and the condition (19) holds. Now, we prove a local quadratic convergence of Algorithm 1 under assumption (19).

Note that a fast local convergence rate such as superlinear or quadratic is usually expected in Gauss-Newton methods, see, e.g., [Kelley, 1999, Theorem 2.4.1.]. The following theorem shows that Algorithm 1 can also achieve a fast local quadratic convergence rate under a more restrictive condition (19).

Theorem 2.2.

Let $\left\{\boldsymbol{x}_{k}\right\}$ be the sequence generated by Algorithm 1 such that it converges to a feasible stationary point $\boldsymbol{x}^{\ast}$ of (3). Assume further that $\boldsymbol{x}^{\ast}$ satisfies condition (19) for some ${\omega_{\min}}>0$ . Then, if $\boldsymbol{x}_{k}\in\mathcal{L}_{F}(F(\boldsymbol{x}_{0}))$ such that $\|\boldsymbol{x}_{k}-\boldsymbol{x}^{\ast}\|\leq\frac{2{\omega_{\min}}}{L+(\sqrt{n}+5)\beta L_{\Psi}}$ , then $\boldsymbol{x}_{k+1}\in\mathcal{L}_{F}(F(\boldsymbol{x}_{0}))$ and

[TABLE]

As a consequence, the sequence $\left\{\boldsymbol{x}_{k}\right\}$ locally converges to $\boldsymbol{x}^{\ast}$ at a quadratic rate.

Proof.

Let $d(\boldsymbol{x},\boldsymbol{y}):=\Psi(\boldsymbol{y})-\Psi(\boldsymbol{x})-\Psi^{\prime}(\boldsymbol{x})(\boldsymbol{y}-\boldsymbol{x})$ . First, by the Lipschitz continuity of $\Psi^{\prime}(\cdot)$ , we have $|d(\boldsymbol{x},\boldsymbol{y})|\leq\sqrt{n}\|d(\boldsymbol{x},\boldsymbol{y})\|\leq\frac{\sqrt{n}L_{\Psi}}{2}\|\boldsymbol{y}-\boldsymbol{x}\|^{2}$ . In this case, from this estimate and (8), for any $\boldsymbol{x},\boldsymbol{y}\in\mathcal{L}_{F}(F(\boldsymbol{x}_{0}))$ , we can derive that

[TABLE]

This estimate together with $\Psi(\boldsymbol{x}^{\ast})=0$ and $\boldsymbol{x}=\boldsymbol{x}_{k}$ imply that

[TABLE]

Moreover, by $\mu_{f}$ -convexity of $f$ , we have $f(\boldsymbol{x}_{k+1})-f^{\star}\geq\langle{\nabla{f}(\boldsymbol{x}^{\ast}),\boldsymbol{x}_{k+1}-\boldsymbol{x}^{\ast}}\rangle+\frac{\mu_{f}}{2}\|\boldsymbol{x}_{k+1}-\boldsymbol{x}^{\ast}\|^{2}$ . Using these last two estimates and the definition of $\mathcal{Q}_{L}$ , we can show that

[TABLE]

Finally, using condition (19), we obtain from the last estimate that

[TABLE]

Rearranging this inequality given that $\|\boldsymbol{x}_{k}-\boldsymbol{x}^{\ast}\|<\frac{{\omega_{\min}}}{\beta L_{\Psi}}$ , we obtain (20). From (20), we can see that if $\|\boldsymbol{x}_{k}-\boldsymbol{x}^{\ast}\|\leq\frac{2{\omega_{\min}}}{L+(\sqrt{n}+5)\beta L_{\Psi}}$ , then $\|\boldsymbol{x}_{k}-\boldsymbol{x}^{\ast}\|<\frac{{\omega_{\min}}}{\beta L_{\Psi}}$ . Moreover, $\|\boldsymbol{x}_{k+1}-\boldsymbol{x}^{\ast}\|\leq\|\boldsymbol{x}_{k}-\boldsymbol{x}^{\ast}\|$ . Hence, if $\boldsymbol{x}_{k}\in\mathcal{L}_{F}(F(\boldsymbol{x}_{0}))$ , then $\boldsymbol{x}_{k+1}\in\mathcal{L}_{F}(F(\boldsymbol{x}_{0}))$ . The last statement is a direct consequence of (20). ∎

3 A Case Study: The Optimal Power Flow Problem

In this section, we present the optimal power flow problem and its reformulation in a form that obeys the structure presented in the previous section. We then perform numerical experiments to validate the GN algorithm and we compare performance with IPOPT.

3.1 Problem settings

Original AC-OPF.

Consider a directed electric power network with a set of nodes ${\cal B}$ and a set of branches $\mathcal{L}$ . The network consists of a set $\cal G$ of generators, with $\mathcal{G}_{i}$ denoting the set of generators at bus $i$ . Denote $Y=G+jB$ as the system admittance matrix, $G$ being the conductance and $B$ the susceptance ( $j^{2}=-1$ ) [Taylor, 2015]. The decision variables of AC-OPF are the voltage magnitudes $\boldsymbol{v}\in\mathbb{R}^{|{\cal B}|}$ , phase angles $\boldsymbol{\theta}\in\mathbb{R}^{|{\cal B}|}$ , and the real and reactive power outputs of generators, which are denoted as $\boldsymbol{p}\in\mathbb{R}^{|{\cal G}|}$ and $\boldsymbol{q}\in\mathbb{R}^{|{\cal G}|}$ . We will consider a fixed real and reactive power demand at every node $i$ , which we denote as $P_{i}^{d}$ and $Q_{i}^{d}$ , respectively. The constraints of the AC-OPF problem can be described by equations (21)-(25), see [Kocuk et al., 2016].

[TABLE]

Constraints (21) and (22) correspond to the real and reactive power balance equations of node $i$ . Constraints (23) and (24) impose complex power flow limits on each line, which we indicate by a parameter matrix $\boldsymbol{S}$ . Constraints (25) impose bounds on voltage magnitudes (indicated by parameter vectors $\boldsymbol{\underline{V}}$ and $\boldsymbol{\overline{V}}$ ), bounds on real power magnitudes (indicated by parameter vectors $\boldsymbol{\underline{P}}$ and $\boldsymbol{\overline{P}}$ ), bounds on reactive power magnitudes (indicated by parameter vectors $\boldsymbol{\underline{Q}}$ and $\boldsymbol{\overline{Q}}$ ), and bounds on voltage phase angles differences (indicated by ${\underline{\theta}_{ij}}$ and ${\overline{\theta}_{ij}}$ for each line $(i,j)\in\mathcal{L}$ ). Note that the power balance equality constraints (21), (22) as well as the inequality constraints (23), (24) are non-convex with respect to $\boldsymbol{v}$ and $\boldsymbol{\theta}$ .

We will consider the objective of minimizing real power generation costs. Hence, we consider a convex quadratic objective function $f$ :

[TABLE]

where $\boldsymbol{C_{2}}\geq 0$ and $\boldsymbol{C_{1}}$ are given coefficients of the cost function. The AC OPF problem then reads as the following non-convex optimization problem:

[TABLE]

Quadratic reformulation.

The starting point of our proposed GN method for solving problem ${\cal P}_{opt}$ is the quadratic reformulation of AC-OPF [Expósito and Ramos, 1999]. In this reformulation, we use a new set of variables $c_{ij}$ and $s_{ij}$ for replacing the voltage magnitudes and angles $v_{i}\angle\theta_{i}=v_{i}e^{j\theta_{i}}=v_{i}(\cos(\theta_{i})+j\sin(\theta_{i}))$ . These new variables are defined for all $i\in{\cal B}$ and $(i,j)\in\mathcal{L}$ as:

[TABLE]

where we will denote the vectors $\boldsymbol{c}$ and $\boldsymbol{s}$ as the collection of the $c_{ij}$ and $s_{ij}$ variables, respectively.

Assuming $\underline{{\theta}}_{ij}>-\frac{\pi}{2}$ and $\overline{{\theta}}_{ij}<\frac{\pi}{2},\ \forall(i,j)\in\mathcal{L}$ (which is common practice), the mapping from $(\boldsymbol{v},\boldsymbol{\theta})$ to $(\boldsymbol{c},\boldsymbol{s})$ defined by (26), (27) can be inverted as follows:

[TABLE]

thereby defining a bijection in $(\boldsymbol{c},\boldsymbol{s})$ and $(\boldsymbol{v},\boldsymbol{\theta})$ .

The set of $(\boldsymbol{c},\boldsymbol{s})$ and $(\boldsymbol{v},\boldsymbol{\theta})$ that define this bijection is further equivalent to the following set of non-linear non-convex constraints in $(\boldsymbol{c},\boldsymbol{s},\boldsymbol{v},\boldsymbol{\theta})$ , see [Kocuk et al., 2016]:

[TABLE]

Now, we will substitute the voltage magnitude variables into the problem ${\cal P}_{opt}$ , and consider the problem on the variables $(\boldsymbol{c},\boldsymbol{s},\boldsymbol{\theta})$ . This reformulation has been commonly employed in the literature in order to arrive at an SOCP (Second-Order Cone Programming) relaxation of the problem [Expósito and Ramos, 1999, Jabr, 2008]. Through numerical experiments, we demonstrate that this reformulation results in highly effective starting points for our algorithm based on the SOCP relaxation of the AC-OPF. Moreover, the reformulation preserves the power balance constraints in linear form. Thus, the power balance constraints are not penalized in our scheme, which implies that they are respected at every iteration of the algorithm. For all these reasons, we pursue the quadratic reformulation of the present section, despite the fact that it requires the introduction of the new variables $\boldsymbol{c}$ and $\boldsymbol{s}$ .

Concretely, the AC power balance constraints (21) and (22) are linear in $c_{ij}$ and $s_{ij}$ :

[TABLE]

Similarly, the power flow limit constraints (23) and (24) are convex quadratic in $c_{ij}$ and $s_{ij}$ :

[TABLE]

The box constraints (25) are reformulated as follows:

[TABLE]

As a result, an equivalent formulation for the AC-OPF model ${\cal P}_{opt}$ is:

[TABLE]

With this reformulation, the decision variables are $\boldsymbol{x}=(\boldsymbol{p},\boldsymbol{q},\boldsymbol{c},\boldsymbol{s},\boldsymbol{\theta})$ . Constraints (30)-(34) define a convex set. We also have two non-convex equality constraints:

•

Constraints (28): $\Psi^{ij}_{q}(\boldsymbol{c},\boldsymbol{s}):=c_{ij}^{2}+s_{ij}^{2}-c_{ii}c_{jj}=0,\ \forall(i,j)\in\mathcal{L}$ . We will refer to them as quadratic constraints.

•

Constraints (29): $\Psi^{ij}_{t}(\boldsymbol{c},\boldsymbol{s},\boldsymbol{\theta}):=\sin(\theta_{i}-\theta_{j})c_{ij}+\cos(\theta_{i}-\theta_{j})s_{ij}=0,\ \forall(i,j)\in\mathcal{L}$ . We will refer to them as trigonometric constraints.

Now, AC-OPF is written in the format of (1), where $\boldsymbol{x}:=(\boldsymbol{p},\boldsymbol{q},\boldsymbol{c},\boldsymbol{s},\boldsymbol{\theta})$ , $f(\boldsymbol{x)}:=f(\boldsymbol{p})$ , $\Psi(\boldsymbol{x}):=(\Psi^{ij}_{q}(\boldsymbol{c},\boldsymbol{s}),\Psi^{ij}_{t}(\boldsymbol{c},\boldsymbol{s},\boldsymbol{\theta}))$ and:

[TABLE]

Moreover, since $\Psi$ is the collection of (28) and (29), we show in the next lemma that it is differentiable and that its Jacobian is Lipschitz continuous.

Lemma 3.1.

For the AC-OPF problem, $\Psi$ defined by (28)–(29), is smooth, and its Jacobian $\Psi^{\prime}$ is Lipschitz continuous with a Lipschitz constant $L_{\Psi}$ , i.e. $\|\Psi^{\prime}(\boldsymbol{x})-\Psi^{\prime}(\hat{\boldsymbol{x}})\|\leq L_{\Psi}\|\boldsymbol{x}-\hat{\boldsymbol{x}}\|$ for all $\boldsymbol{x},\hat{\boldsymbol{x}}\in\Omega$ , where

[TABLE]

The proof of Lemma 3.1 is shown in Appendix A.

We can now derive AC-OPF subproblems for GN. As a reminder, the subproblems are in the format of (8). Since (30), (31) and (34) are linear constraints and (32) and (33) are second-order cone constraints, $\Omega$ , in the AC-OPF case, is a set of second-order cone constraints. By applying the classical substitution of the norm-1 terms in the objective, the objective of (8) is (convex) quadratic. We typically refer to an SOCP as an optimization problem with second-order cone constraints and a linear objective. As a consequence, the AC-OPF subproblem is a Quadratically Constrained Program (QCP). We are now in a position to apply GN to AC-OPF.

3.2 A practical implementation of the GN algorithm for AC-OPF

In this section, the goal is to optimize the settings of the GN method. We will demonstrate that the choice of $\beta$ and $L$ is crucial. This will allow us to derive a practical version of the GN algorithm, which we compare to IPOPT.

Stopping criteria.

We terminate Algorithm 1 in two occasions, which have been validated through experimental results:

•

If the maximum number of iterations $k_{\max}:=100$ has been reached.

•

If the quadratic and trigonometric constraints are satisfied with a tolerance of $\epsilon_{2}$ , where $\epsilon_{2}:=1e^{-5}$ . Concretely, we stop Algorithm 1 if

[TABLE]

If the difference $\|x^{k+1}-x^{k}\|_{\infty}<\epsilon_{1}$ ( $\epsilon_{1}:=1e^{-6}$ in the numerical experiment), then Algorithm 1 has reached an approximate stationary point of the exact penalized formulation (3) (see Lemma 2.1(a)). In this case, the last iterate might not be feasible for $\mathcal{P}_{opt}^{cs\theta}$ . We then use the run-and-inspect strategy [Chen et al., 2019]: the last iterate becomes the starting point of GN and $\beta$ is doubled.

SOCP relaxation for initialization.

As mentioned previously, the quadratic formulation is also used to derive the SOCP relaxation. In the SOCP relaxation, the angles $\boldsymbol{\theta}$ are not modeled. Consequently, the trigonometric constraints (29) are removed. Moreover, the non-convex constraints (28) are relaxed:

[TABLE]

Then, $\mathcal{P}_{socp}$ is defined such that:

[TABLE]

Solving this relaxation will provide a partial initial point $(\boldsymbol{p}^{0},\boldsymbol{q}^{0},\boldsymbol{c}^{0},\boldsymbol{s}^{0})$ . In order to improve this initial point, we also compute angles $\boldsymbol{\theta}^{0}$ by solving the following optimization problem, where the goal is to minimize the error on the trigonometric constraints:

[TABLE]

This initialization is used for the experimental results on the MATPOWER benchmark.

Parameter tuning strategies.

The convergence theory presented above does not require tuning the $\beta$ and $L$ parameters. In practice, tuning is crucial for improving the performance of algorithms for constrained non-convex optimization, including Algorithm 1.

Several observations allow us to decrease the number of iterations that are required for convergence: (i) according to Proposition 2.2, large values of $\beta$ ensure the equivalence between (1) and (3); (ii) quadratic constraints and trigonometric constraints scale up differently; (iii) a careful updating of $L$ influences the number of times that subproblem (8) is solved. These observations guide a detailed experimental investigation concerning the choices of $\beta$ and $L$ parameters, which is discussed in Appendix B.

Accelerating the computation of subproblem solutions, and warmstart.

The most challenging constraints in the subproblems of GN are the line limit constraints (32) and (33). All the constraints are linear except for (32) and (33) which are quadratic convex. From experimental observations, it is very uncommon that the entire network is congested and that all line limits are binding. One way to benefit from this observation is the following: assuming $x^{k}$ is the $k^{\text{th}}$ iterate of the algorithm and $x^{k}$ is binding for a set $\mathcal{L}^{k}\subseteq\mathcal{L}$ , a next guess $\hat{x}^{k+1}$ will first be computed by only enforcing (32) and (33) for $(i,j)\in\mathcal{L}^{k}$ . This approach of adding lazy network constraints to the model is applied both in theory [Aravena Solís, 2018] as well as in practice by system operators.

Moreover, we observe that the subproblems (8) are based on the same formulation, and only differ by slight changes of certain parameters along the iterations. This motivates us to warm-start the subproblem (8) with a previous primal-dual iterate. In other words, we initialize the solver for solving the subproblem (8) at the $k$ -th iteration at the final solution $\boldsymbol{x}_{k-1}$ and its corresponding multiplier $\boldsymbol{y}_{k-1}$ obtained from the previous iteration $k-1$ . Warm-starting is indeed a key step in iterative methods, including our GN scheme, and will be further analyzed in Section 3.3.2.

3.3 Numerical experiments

In order to validate the proposed GN algorithm, our numerical experiments are conducted in 2 steps: first, we launch simulations on several test cases of a classical library (MATPOWER) and compare the GN algorithm with a state-of-the-art non-convex solver (IPOPT); second, we show the potential benefit of warm-start for our approach.

3.3.1 Illustration on MATPOWER instances

We use the MATPOWER [Zimmerman et al., 2010] library to have access to a wide range of AC-OPF test systems that have been investigated in the literature. We test our approach on instances whose size ranges between 1,354 and 25,000 nodes (1354pegase has 11,192 variables and 27,911 constraints while ACTIVS25k has 186,021 variables and 431,222 constraints).

We benchmark our approach against IPOPT, a non-linear solver based on the interior-point method. To do so, we make use of PowerModels.jl [Coffrin et al., 2018], a Julia package that can be used to solve AC-OPF instances of different libraries with different formulations. In order to make a fair comparison, we initialize GN and IPOPT using the SOCP solution.

Since the subproblem (7) has a non-smooth convex objective due to the $\ell_{1}$ -norm penalty, we reformulate it equivalently to a quadratically constrained program (QCP). We solve this QCP with Gurobi [Gurobi Optimization, 2019]. Note that the time needed to solve the SOCP solution is not reported for two reasons. First, we use the SOCP solution for both GN and IPOPT as a starting point to make a fair comparison, so it will not change the gap between the execution times of the two methods. Second, computing the SOCP solution is negligible compared to the methods tested (always less than 5%).

The results of our analysis are presented in Table 1. In Table 1, from left to right, we report the name of the test case. In the Gauss-Newton part, we report the number of iterations, the number of times run-an-inspect is executed, the objective value, the maximum constraint violation and the execution time (in seconds). In the IPOPT part, we report the objective value, the maximum constraint violation and the execution time (in seconds). The last column provides the gap between the GN solution and the IPOPT solution ( $\text{Gap}=\frac{\text{Obj}_{\text{IPOPT}}-\text{Obj}_{\text{GN}}}{\text{Obj}_{\text{IPOPT}}}$ ).

The first notable observation is that GN finds a stationary point (i.e. feasible) of the original AC-OPF problem for all 23 test cases. The stationary point obtained by GN attains the same objective function value as the one returned by IPOPT for most instances (a difference of 0.05% in the objective value may be attributed to numerical precision) and the proposed method outperforms IPOPT in some instances (e.g. 2737sop).

Our experiments further demonstrate that the GN method consistently requires a small number of iterations (less than 20) in a wide range of instances. This is critically important to further accelerate the performance of our method if we appropriately exploit warm-start strategies and efficient solvers for the strongly convex subproblem.

In terms of computational time, the performance is shared between the two approaches. Nevertheless, some instances reveal limitations of the GN algorithm, compared to IPOPT (6495rte, 9241pegase and ACTIVSg25k for example): when the solution of a subproblem becomes time-consuming because of the size of the subproblem, GN might require a larger execution time. We use Gurobi for solving subproblem (8), because it is one of the most stable QCP solvers that are available.

Unfortunately, Gurobi (and IPMs for QCPs in general) does not support warm-start, which would have significantly decreased the computational time. One alternative is to use an Alternating Direction Method of Multipliers (ADMM) solver that supports warm-start. There is no additional cost in implementing warm-start. Indeed, we simply use the solution of the previous iteration as a starting point for the next one. However, ADMM solvers are not mature enough to test large-scale problems. Implementing an efficient subsolver is out of the scope of this work, however we are able to analyze the effect of warm start on these solvers (and by extension to our proposed algorithm), which is the subject of the next section.

3.3.2 The effect of warm-starting strategy

We consider using OSQP [Stellato et al., 2017] as an ADMM solver. Since OSQP only solves quadratic programs (QPs), and since we are only aiming at illustrating the potential of warm-start, we will drop the line limit constraints (32) and (33), in order to satisfy the requirements of the solver. We consider small test cases (39_epri and 118_ieee) since we observed numerical instability for larger test cases.

The results are presented in Table 2. From left to right, # It reports the number of GN iterations, # ADMM It reports the total number of ADMM iterations performed during the GN execution, and Time reports the sum of OSQP solve times along the iterations in seconds.

For both cases, we observe that warm start divides the total number of ADMM iterations as well as solve time by more than a factor of 2, and almost by a factor of 3 for 118_ieee.

We also examine each GN iteration individually, and highlight the impact of warm-start on the number of ADMM iterations in Figure 1. Note that we warm-start dual and primal variables only after iteration 1. Warm-start decreases substantially the number of ADMM iterations in two cases:

When $L$ is updated. Indeed, updating $L$ only results in slightly changing the objective function. One expects the previous iterates to provide a good warm-start. This is confirmed in Figure 1, where we observe that the number of ADMM iterations is divided by at least a factor of 2 every time $L$ is updated. 2. 2.

When the last iterates are computed. Intuitively, one does not expects iterates to change substantially when approaching the optimal solution. This intuition is confirmed by Figure 1. For the particular case of the last iterate, for 39_epri (resp. 118_ieee), the required number of ADMM iterations is less than 20% (resp. 30%).

This investigation suggests that, with a mature ADMM solver, warm-starting is a promising feature for improving the performance of GN on large test cases.

3.4 Optimization with bilinear matrix inequality constraint

The goal of this subsection is to demonstrate that Algorithm 1 can solve a more general class of problems than classical methods such as IPMs or SQPs. For this purpose, we consider the following optimization problem involving bilinear matrix inequality (BMI) constraints:

[TABLE]

where $t\in\mathbb{R}$ , $F\in\mathbb{R}^{n_{u}\times n_{y}}$ and $P\in\mathbb{R}^{n\times n}$ are optimization variables, $A\in\mathbb{R}^{n\times n}$ , $B\in\mathbb{R}^{n\times n_{u}}$ , and $C\in\mathbb{R}^{n_{y}\times n}$ are given input matrices, and $(\cdot)\preceq 0$ (resp. $\succeq 0$ ) means that $(\cdot)$ is negative semidefinite (resp., positive semidefinite). This problem arises from controller design, and is known as the spectral abscissa problem, see, e.g., [Burke et al., 2002].

If we introduce $S=-(A+BFC)^{\top}P+P(A+BFC)-2tP$ , $\boldsymbol{x}=(S,P,F,t)$ , and $\Omega:=\big{\{}\boldsymbol{x}:=(S,P,F,t)\mid S=S^{\top},\ P=P^{\top},\ S\succeq 0,\ P\succeq 0\big{\}}$ , then we can reformulate (40) into (1) with $f(\boldsymbol{x}):=t$ and $g(\boldsymbol{x}):=(A+BFC)^{\top}P+P(A+BFC)+2tP+S=0$ . Clearly, if we explicitly write $\Omega$ in the following form

[TABLE]

then we obtain a nonsmooth problem, where $\lambda_{\min}(\cdot)$ is the smallest eigenvalue of $(\cdot)$ , which is nonsmooth [Burke et al., 2002]. In this case, IPMs and SQPs are not applicable for solving (40). However, Algorithm 1 can still solve (40) and its theoretical guarantees are preserved.

We test our method by using the following data:

[TABLE]

As starting point, we use:

[TABLE]

and we choose $\beta=1,000,L_{0}=1$ . These choices are motivated by the fact that a large value of $\beta$ ensures convergence, see Proposition 2.2. We apply the same update strategy for $L$ as in the AC-OPF implementation.

Our algorithm converges in 14 iterations to a feasible local optimum. The constraints are satisfied with a tolerance of $1e^{-6}$ .

4 Conclusion

We propose a novel Gauss-Newton algorithm for solving a general class of optimization problems with non-convex constraints. We utilize an exact non-smooth penalty reformulation of the original problem and suggest an iterative scheme for solving this penalized problem which relies on the non-squared Gauss-Newton method. The subproblems of our proposed scheme are strongly convex programs, which can be efficiently solved by numerous third-party convex optimization solvers. We establish a best-known global convergence rate for our method to a stationary point. Under more restrictive conditions, we derive a local quadratic convergence rate for our scheme.

We apply our proposed approach to solve the optimal power flow problem, which is a fundamental and ubiquitous problem in power systems engineering. We apply our proposed algorithm to a reformulation of the AC-OPF, and we propose numerous strategies for tuning our proposed Gauss-Newton scheme, initializing the algorithm, and warm-starting the resolution of the subproblems that are treated by our proposed method. We perform extensive numerical experiments on a large set of instances from the MATPOWER library, and demonstrate the competitive performance of our method to IPOPT, which is a state of the art non-linear non-convex solver. This analysis validates the theoretical analysis of our proposed GN scheme, and proves its effectiveness in practical applications. Future work aims at broadening the scope of such applications beyond power systems.

Acknowledgments

I. Mezghani and A. Papavasiliou acknowledge the financial support of ENGIE-Electrabel. I. Necoara would like to acknowledge the support from the Executive Agency for Higher Education, Research and Innovation Funding (UEFISCDI), Romania, PNIII-P4-PCE-2016-0731, project ScaleFreeNet, no. 39/2017. The work of Quoc Tran-Dinh was partly supported by NSF (DMS-1619884).

Appendix

This appendix provides the full proof of technical results in the main text, the detailed implementation of our algorithm, and additional numerical experiments.

Appendix A The proof of Lemma 3.1

Proof.

From the definition of $\Psi$ , it consists of two parts: quadratic forms in $(c_{ij},s_{ij})$ and trigonometric and linear forms in $\theta_{ij}:=\theta_{i}-\theta_{j}$ and $(c_{ij},s_{ij})$ , respectively. We can write it as $\Psi=[\Psi^{q},\Psi^{t}]$ . Each function in $\Psi^{q}$ has the form $c_{ij}^{2}+s_{ij}^{2}-c_{ii}c_{jj}$ , as shown by (28), and each function in $\Psi^{t}$ has the form $\sin(\theta_{ij})c_{ij}+\cos(\theta_{ij})s_{ij}$ , as shown in (29). We can show that the second derivative of each component of $\Psi^{q}$ w.r.t. $(c_{ii},c_{jj},c_{ij},s_{ij})$ and of $\Psi^{t}$ w.r.t. $(c_{ij},s_{ij},\theta_{ij})$ , respectively is

[TABLE]

The second derivative $\nabla^{2}\Psi^{q}(\boldsymbol{x})$ is constant. Hence, the maximum eigenvalue of $\nabla^{2}\Psi^{q}(\boldsymbol{x})$ is

$\lambda_{\max}(\nabla^{2}\Psi^{q}(\boldsymbol{x}))=2$ . For any $\boldsymbol{u}:=[u_{1},u_{2},u_{3}]\in\mathbb{R}^{3}$ , we can easily estimate that

[TABLE]

Therefore, $\lambda_{\max}\left(\nabla^{2}\Psi^{t}(\boldsymbol{x})\right)=1+|c_{ij}|+|s_{ij}|\leq 1+2\max\left\{V^{2}_{i}\mid i\in\mathcal{B}\right\}$ , where the last inequality follows from (25). Consequently,

[TABLE]

which is (38). ∎

Appendix B Investigation on parameter tuning strategies

In order to explain how the parameters of the GN algorithm should be tuned, we first express the subproblem at iteration $k$ for the special case of AC-OPF.

Subproblem (8) in the context of AC-OPF.

We define the following functions in order to keep our notation compact:

[TABLE]

Note that we are facing two types of constraints (quadratic and trigonometric) and that they may attain different relative scales for different instances, as we have observed in our numerical experiments. We will define different penalty terms depending on the type of constraint, which will increase the flexibility of our implementation, while respecting the theoretical assumptions that we present in the main body of the paper. To this end, we denote by $\beta_{q}$ (resp. $\beta_{t}$ ) the $\beta$ penalty parameter associated with the quadratic (resp. trigonometric) constraints. The trigonometric constraints depend on $\boldsymbol{c}$ , $\boldsymbol{s}$ and $\boldsymbol{\theta}$ , whereas the quadratic ones only depend on $\boldsymbol{c}$ and $\boldsymbol{s}$ . Therefore, we define two separate regularization parameters: $L_{cs}$ and $L_{\theta}$ (we drop the $k$ index from now on for notational simplicity).

The GN subproblem at iteration $k$ , corresponding to (7), is convex and has the form:

[TABLE]

The optimal solution of $\mathcal{P}^{k}_{sub}$ , which we denote by $(\boldsymbol{y}^{\star},\boldsymbol{d}^{\star})$ , provides the next iterate $\boldsymbol{x}^{k+1}=(\boldsymbol{p}^{\star},\boldsymbol{q}^{\star},\boldsymbol{c}^{k}+\boldsymbol{dc}^{\star},\boldsymbol{s}^{k}+\boldsymbol{ds}^{\star},\boldsymbol{\theta}^{k}+\boldsymbol{d\theta}^{\star})$ .

B.1 Joint values of parameters: $\beta_{q}=\beta_{t}$ and $L_{cs}=L_{\theta}$

In our first set of tests, we implement the basic variant of Algorithm 1 by retaining the configuration of parameters from our theoretical results. Since we only aim at validating the algorithm, we focus on the three instances that have less than 2,000 nodes: 1354pegase, 1888rte, and 1951pegase.

For the penalty parameters, we first choose the same value $\beta$ for both quadratic and trigonometric constraints as $\beta=\beta^{q}=\beta^{t}$ . We perform tests with three choices of $\beta$ , based on the number of lines in the test case: $\beta=|\mathcal{L}|/100,|\mathcal{L}|/10$ , or $|\mathcal{L}|$ . For the regularization parameter $L$ , we also choose the same value $L=L_{cs}=L_{\theta}$ . Note that all the experiments are initialized from the SOCP solution. We consider three different strategies when applying Algorithm 1:

•

Fixed strategy: we fix $L$ at the upper bound $L_{\psi}$ , which is computed in (38).

•

Bisection update: At each iteration $k$ of Algorithm 1, we choose $L_{\min}=1$ and initialize $L:=1$ . If we do not satisfy the line-search condition $F(V_{L}(\boldsymbol{x}^{k}))\leq\mathcal{Q}_{L}(V_{L}(\boldsymbol{x}^{k});\boldsymbol{x}^{k})$ , we apply a bisection in the interval $[L_{\min},\beta L_{\psi}]$ until we satisfy this condition.

•

Geometric- $\mu$ update: At each iteration $k$ , we also choose $L_{\min}=1$ and initialize $L:=1$ . We then update $L\leftarrow\mu\cdot L$ for $\mu>1$ in order to guarantee that $F(V_{L}(\boldsymbol{x}^{k}))\leq\mathcal{Q}_{L}(V_{L}(\boldsymbol{x}^{k});\boldsymbol{x}^{k})$ . In our experiments, we use $\mu:=2$ .

The results of our first test are presented in Table 3. The three columns for each strategy present the number of iterations, and the maximum violation of the quadratic (resp. trigonometric) constraint for each strategy on the three problem instances. In terms of number of iterations, we can observe that fixing $L$ results in a poor performance and tends to increase the number of iterations as well as the final violation of the constraints. Given this poor performance, we do not consider the Fixed strategy for the remainder of the numerical experiments. Also, using $\beta=|\mathcal{L}|$ produces satisfactory results, and increasing the value of $\beta$ supports convergence, therefore we choose this order of magnitude for the value of $\beta$ .

Bisection and Geometric-2 exhibit a similar behavior: when the algorithm converges to a feasible point, both choices achieve converge in tens of iterations, depending on the test case and the choice of $\beta$ . Nevertheless, when failing, the maximum violation of the quadratic constraint (MVQ) never reaches the desired tolerance of $1e^{-5}$ . This behavior might suggest that we do not penalize sufficiently the quadratic constraint. One should notice that the quadratic and trigonometric constraints are linked: once the angles are fixed, $\boldsymbol{c}$ and $\boldsymbol{s}$ , which are the variables that appear in the quadratic constraints, struggle to move from their current value in order to satisfy the quadratic constraints. This motivates us to consider two different values for $\beta$ and $L$ : $\beta_{q}$ for the quadratic constraints and $L_{cs}$ for the associated variables $\boldsymbol{c}$ and $\boldsymbol{s}$ ; $\beta_{t}$ for the trigonometric constraints and $L_{\theta}$ for the angle variables $\boldsymbol{\theta}$ .

B.2 Adaptation and enhancement of Algorithm 1 for AC-OPF

Adapting Algorithm 1 with individual choices of parameters: $\beta_{q}$ , $\beta_{t}$ and $L_{cs}$ , $L_{\theta}$ .

Based on our observations, we choose different values for $\beta_{q}$ and $\beta_{t}$ . Since we empirically observe that the quadratic constraints are harder to satisfy than the trigonometric constraints, we consider two alternatives: $\beta_{q}=2\beta_{t}$ and $\beta_{q}=5\beta_{t}$ . Furthermore, we set $L_{cs,\min}=\beta_{q}/\beta_{t}$ and $L_{\theta,\min}=1$ . Note that the choice of different values for these parameters does not affect the theoretical guarantees of our algorithm, as long as they satisfy our given conditions.

Also, since we have different values $L_{cs}$ and $L_{\theta}$ , we must adapt the condition under which $L_{cs}$ and $L_{\theta}$ are updated. From the theory, $L$ is updated (through the Bisection or Geometric strategy) if the following condition is not met:

[TABLE]

where $\boldsymbol{x}^{k+1}=\boldsymbol{x}^{k}+\boldsymbol{d}^{*}$ .

We adapt this condition to the specific type of constraint. Concretely:

•

If

[TABLE]

then update $L_{cs}$ and $L_{\theta}$ .

•

If (41) does not hold and

[TABLE]

then only update $L_{cs}$ .

Refining the $L$ updates.

We emphasize that, in Algorithm 1, each time that the values of $L_{cs}$ and/or $L_{\theta}$ are updated, the subproblem $\mathcal{P}_{sub}^{k}$ is resolved. Therefore, an effective strategy for updating these parameters can lead to significant improvements in computational time. We mitigate this heavy computational requirement by introducing resolution techniques that are guided by both our theoretical results and empirical observations. Concretely, we propose the following two improvements to the practical implementation of the algorithm, in order to limit the number of computationally expensive $L$ updates:

Keeping the value of $L$ from one iteration to another is a better strategy than having $L:=L_{\min}$ at the beginning of each iteration. The natural justification for this is that we expect steps to decrease along the iterations. 2. 2.

Checking conditions (41) and (42) can require a large number of $L$ updates. Instead, we propose a verification of whether the violation of the constraints is decreasing before checking (41) and (42), which is a less stringent requirement that still yields satisfactory results in terms of constraint violations. Concretely, at iteration $k$ , we compute the $\ell_{1}$ and $\ell_{\infty}$ norms of $\Psi_{q}(\boldsymbol{c}^{k},\boldsymbol{s}^{k})$ and $\Psi_{t}(\boldsymbol{c}^{k},\boldsymbol{s}^{k},\boldsymbol{\theta}^{k})$ . If these quantities decrease from $k-1$ to $k$ , we move to iteration $k+1$ .

By applying these two approaches, we obtain the results that are presented in Table 4. In Table 4, # It provides the number of GN iterations, # L-It provides the number of additional subproblems solved because of an $L$ -update (then # It+# L-It gives the total number of subproblems solved) and Obj records the objective value returned. From Table 4, we observe that this new strategy leads to convergence for all three test cases. Even if Bisection seems to require less iterations, it also provides higher objective values than Geometric-2. Also, we do not observe notable differences between applying a factor of 2 or 5 on $\beta$ , although a factor of $5$ decreases slightly the number of iterations. In our implementation, we employ the following settings: Geometric-2, $\beta_{t}=|\mathcal{L}|$ and $\beta_{q}=5\beta_{t}$ .

Bibliography42

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[Aravena Solís, 2018] Aravena Solís, I. A. (2018). Analysis of renewable energy integration in transmission-constrained electricity markets using parallel computing . Ph D thesis, UCL-Université Catholique de Louvain.
2[Bolte and Pauwels, 2016] Bolte, J. and Pauwels, E. (2016). Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Math. Oper. Res. , 41(2):442–465.
3[Bolte et al., 2014] Bolte, J., Sabach, S., and Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming , 146(1-2):459–494.
4[Boob et al., 2019] Boob, D., Deng, Q., and Lan, G. (2019). Proximal point methods for optimization with nonconvex functional constraints. ar Xiv preprint ar Xiv:1908.02734 .
5[Boyd et al., 2011] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning , 3(1):1–122.
6[Burke et al., 2002] Burke, J., Lewis, A., and Overton, M. (2002). Two numerical methods for optimizing matrix stability. Linear Algebra and Its Applications , 351-352:117–145.
7[Cain et al., 2012] Cain, M. B., O’neill, R. P., and Castillo, A. (2012). History of optimal power flow and formulations. Federal Energy Regulatory Commission , 1:1–36.
8[Carpentier, 1962] Carpentier, J. (1962). Contribution a l’etude du dispatching economique. Bulletin de la Societe Francaise des Electriciens , 3(1):431–447.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

A Globally Convergent Penalty-Based Gauss-Newton

Abstract

1 Introduction

Statement of the problem.

A Gauss-Newton algorithm for constrained non-convex optimization.

The optimal power flow problem.

Contributions.

Content.

2 A Gauss-Newton Algorithm for Non-Convex Optimization

2.1 Exact penalty approach for constrained non-convex programming

Definition 2.1** ([Nocedal and Wright, 2006](Theorem 12.9)).**

Proposition 2.1**.**

2.2 Exact penalized formulation

Proposition 2.2** ([Nocedal and Wright, 2006], (Theorem 17.4.)).**

2.3 Global Gauss-Newton method

2.3.1 The derivation of the Gauss-Newton scheme and the full algorithm

2.3.2 Global convergence analysis

Lemma 2.1**.**

Proof.

Lemma 2.2**.**

Proof.

Theorem 2.1**.**

Proof.

2.4 Local convergence analysis

Theorem 2.2**.**

Proof.

3 A Case Study: The Optimal Power Flow Problem

3.1 Problem settings

Original AC-OPF.

Quadratic reformulation.

Lemma 3.1**.**

3.2 A practical implementation of the GN algorithm for AC-OPF

Stopping criteria.

SOCP relaxation for initialization.

Parameter tuning strategies.

Accelerating the computation of subproblem solutions, and warmstart.

3.3 Numerical experiments

3.3.1 Illustration on MATPOWER instances

3.3.2 The effect of warm-starting strategy

3.4 Optimization with bilinear matrix inequality constraint

4 Conclusion

Acknowledgments

Appendix

Appendix A The proof of Lemma 3.1

Proof.

Appendix B Investigation on parameter tuning strategies

Subproblem (8) in the context of AC-OPF.

B.1 Joint values of parameters: βq=βt\beta_{q}=\beta_{t}βq​=βt​ and Lcs=LθL_{cs}=L_{\theta}Lcs​=Lθ​

B.2 Adaptation and enhancement of Algorithm 1 for AC-OPF

Adapting Algorithm 1 with individual choices of parameters: βq\beta_{q}βq​, βt\beta_{t}βt​ and LcsL_{cs}Lcs​, LθL_{\theta}Lθ​.

Refining the LLL updates.

Definition 2.1 ([Nocedal and Wright, 2006](Theorem 12.9)).

Proposition 2.1.

Proposition 2.2 ([Nocedal and Wright, 2006], (Theorem 17.4.)).

Lemma 2.1.

Lemma 2.2.

Theorem 2.1.

Theorem 2.2.

Lemma 3.1.

B.1 Joint values of parameters: $\beta_{q}=\beta_{t}$ and $L_{cs}=L_{\theta}$

Adapting Algorithm 1 with individual choices of parameters: $\beta_{q}$ , $\beta_{t}$ and $L_{cs}$ , $L_{\theta}$ .

Refining the $L$ updates.