A new alternating direction trust region method based on conic model for solving unconstrained optimization

Honglan Zhu; Qin Ni; Chuangyin Dang

arXiv:1812.01935·math.OC·December 6, 2018

TL;DR

This paper introduces a novel alternating direction trust region method based on a conic model for unconstrained optimization, improving solvability and efficiency, especially for large-scale problems.

Contribution

It proposes a new conic model trust region subproblem solved via an alternating direction approach, overcoming previous difficulties and establishing global convergence.

Findings

(1) Outperforms the dogleg method in numerical experiments

(2) Effective for large-scale unconstrained optimization problems

(3) Demonstrates better solvability of the subproblem

Abstract

This paper presents a new alternating direction trust region method based on a conic model for solving unconstrained optimization problems. Using the alternating direction method, the conic model trust region subproblem is solved in two steps along two orthogonal directions. This approach overcomes the main drawback of the conic model subproblem, namely that it is difficult to solve. Global convergence of the method is then established under some reasonable conditions. Numerical experiments show that this method may solve the subproblem better than the dogleg method, especially for large-scale problems.

Tables

Table 1: Test functions.

No.	Problem	No.	Problem
1	Cube	2	Penalty-I
3	Beale	4	Conic
5	Extended Powell	6	Variably Dimensioned
7	Rosenbrock	8	Extended Trigonometric
9	Tridiagonal Exponential	10	Brent
11	Troesch	12	Cragg and Levy
13	Broyden Tridiagonal	14	Brown
15	Discrete Boundary Value	16	Extended Trigonometric

Table 2: Results of ADCTR.

No.	$n$	Iter	$nf/ng$	$f_{k}$	$\|g\|$	CPU (s)
1	2	52	53/43	1.0377e-15	1.9477e-06	0.064681
2	2	10	11/11	9.0831e-06	8.9419e-06	0.048493
3	2	18	19/18	9.0379e-15	9.3925e-07	0.053525
4	2	16	17/13	1.1407e-12	2.1360e-06	0.050445
5	4	41	42/34	4.8648e-09	4.5887e-06	0.062011
6	4	32	33/29	2.3856e-14	3.0965e-07	0.066287
7	2	50	51/49	1.5486e-14	5.4101e-06	0.064227
8	4	47	48/34	7.9158e-15	4.1153e-07	0.076865
9	4	7	8/8	8.1577e-12	4.5905e-06	0.058505
10	4	81	82/58	5.8024e-18	4.6604e-07	0.089702
11	4	59	60/51	1.0955e-13	2.7230e-06	0.077290
12	4	48	49/43	1.1247e-08	5.2578e-06	0.068215
13	4	35	36/19	1.4498e-11	5.0442e-06	0.063276
14	2	91	92/52	0.1998e-06	2.5916e-07	0.089294
15	4	23	24/15	2.0042e-12	8.2898e-06	0.061544
16	4	14	15/15	3.0282e-04	4.9068e-06	0.048488

Table 3: Numerical results of DCTR and ADCTR.

Solver		DCTR				ADCTR
No.	$n$	Iter	$nf/ng$	$\|g\|$	CPU (s)	Iter	$nf/ng$	$\|g\|$	CPU (s)
1	20	746	747/517	9.2593e-07	0.116584	100	101/92	2.1475e-06	0.079679
	200	2387	2388/2023	2.6984e-08	2.377265	82	83/59	4.6090e-06	0.308822
	1000	*	/	*	*	74	75/56	1.4596e-06	19.36706
2	200	76	77/53	3.1715e-06	0.138719	79	80/53	4.1422e-06	0.137110
	500	96	97/62	6.5292e-06	1.118043	78	79/54	4.6576e-06	1.270450
	1000	82	83/57	5.5181e-06	6.454441	86	87/57	8.8824e-06	8.026574
3	2	20	21/19	4.3711e-07	0.042443	18	19/18	9.3925e-07	0.053525
	20	24	25/19	2.4476e-07	0.042636	24	25/25	4.9198e-06	0.056501
	200	27	28/24	3.6411e-08	0.077384	26	27/27	2.0468e-07	0.145411
	2000	29	30/25	6.1805e-06	15.70090	35	36/28	8.8657e-06	54.47877
4	20	15	16/14	5.7802e-07	0.039310	16	17/13	4.8378e-09	0.055520
	200	16	17/12	3.7354e-06	0.051864	19	20/18	3.4444e-07	0.094543
	2000	18	19/19	3.9029e-07	13.50780	19	20/17	2.5377e-06	17.65469
5	40	121	122/104	8.7449e-06	0.064345	48	49/43	6.0181e-06	0.076145
	1000	121	122/116	2.3550e-06	14.01567	92	93/77	8.1189e-06	18.098231
	2000	121	122/116	2.6445e-06	106.0106	69	70/60	6.2005e-06	97.24934
6	40	120	121/75	6.0183e-06	0.125199	145	146/116	7.2779e-06	0.115281
6	400	*	/	*	*	1124	1125/774	3.2877e-06	11.33673
7	20	90	91/69	1.3138e-06	0.106181	83	84/54	1.0093e-06	0.082832
	200	517	518/392	4.5464e-06	0.926231	61	62/52	2.0797e-06	0.242663
	2000	326	327/294	2.2237e-06	218.2702	71	72/54	7.8519e-07	112.6556
8	4	46	47/38	9.2635e-06	0.054172	47	48/34	4.1153e-07	0.076865
8	40	*	/	*	*	354	355/265	9.1963e-06	0.147167
9	40	6	7/7	1.1958e-06	0.054117	6	7/7	1.8900e-06	0.060258
	400	6	7/7	1.6354e-07	0.111561	6	7/7	2.3374e-07	0.133484
	4000	11	12/12	8.4467e-07	40.15594	11	12/12	8.9545e-07	47.93941
10	4	377	378/298	8.2689e-06	0.175454	81	82/58	4.6604e-07	0.089702
10	40	*	/	*	*	1260	1261/910	5.7484e-06	0.391677
11	4	70	71/37	2.9831e-06	0.073789	59	60/51	2.7230e-06	0.077290
	40	192	193/133	4.2981e-06	0.108448	132	133/122	3.1390e-06	0.116485
	500	*	/	*	*	1119	1120/1023	9.3082e-06	21.02191
12	4	43	44/41	4.4263e-06	0.062761	48	49/43	5.2578e-06	0.068215
	40	1977	1978/1315	8.1245e-06	0.369235	190	191/146	9.5513e-06	0.129097
	400	*	/	*	*	351	352/252	8.4470e-06	4.848008
13	4	35	36/16	8.9785e-06	0.053999	35	36/19	5.0442e-06	0.063276
	40	359	360/263	9.3216e-06	0.140429	47	48/29	7.5584e-06	0.084719
	400	1996	1997/1400	9.7095e-06	14.38582	55	56/34	9.2260e-06	0.746511
	1000	*	/	*	*	52	53/36	9.5547e-06	10.46032
14	2	98	99/59	5.6830e-06	0.058219	91	92/52	2.5916e-07	0.089294
	20	164	165/87	6.2362e-06	0.076377	125	126/98	9.9306e-06	0.094535
	200	*	/	*	*	209	210/161	9.3905e-06	0.656179
15	4	27	28/16	4.2390e-07	0.063120	23	24/15	8.2898e-06	0.061544
	400	33	34/11	8.4956e-06	0.162408	35	36/15	7.7218e-06	0.202917
	1000	21	22/2	9.0840e-06	0.101637	21	22/2	9.0840e-06	0.160027
	4000	25	26/2	5.6751e-07	0.970886	25	26/2	5.6751e-07	2.331821
16	4	19	20/16	1.4241e-06	0.051494	14	15/15	4.9068e-06	0.048488
	40	518	519/329	7.5993e-06	0.117250	63	64/42	6.2298e-06	0.057708
	400	*	/	*	*	60	61/48	4.6296e-06	0.571713

Equations

$$\min_{x\in R^{n}}\ f(x),$$

$$\min_{s\in R^{n}}\ \varrho_{k}(s)=g_{k}^{T}s+\frac{1}{2}s^{T}B_{k}s,$$

$$\mbox{s.t.}\ \ \|s\|\leq\Delta_{k},$$

$$d=\|s_{k}^{\textrm{N}}-s_{k}^{c}\|^{2},\quad e=(s_{k}^{\textrm{N}}-s_{k}^{c})^{T}s_{k}^{c},\quad f=\|s_{k}^{c}\|^{2}-\Delta_{k}^{2},$$

$$\min_{s\in R^{n}}\ \phi_{k}(s)=\frac{g_{k}^{T}s}{1-a_{k}^{T}s}+\frac{s^{T}B_{k}s}{2(1-a_{k}^{T}s)^{2}},$$

$$\mbox{s.t.}\ \ \|s\|\leq\Delta_{k},\quad 1-a_{k}^{T}s>0,$$

$$\min_{s\in R^{n}}\ \phi_{k}(s)=\frac{g_{k}^{T}s}{1-a_{k}^{T}s}+\frac{s^{T}B_{k}s}{2(1-a_{k}^{T}s)^{2}},$$

$$\mbox{s.t.}\ \ \|s\|\leq\Delta_{k},\quad |1-a_{k}^{T}s|\geq\varepsilon_{0},$$

$$s_{k}^{\textrm{N}}=\frac{-B_{k}^{-1}g_{k}}{1-a_{k}^{T}B_{k}^{-1}g_{k}},$$

$$s_{k}^{c}=\frac{-g_{k}^{T}g_{k}}{g_{k}^{T}B_{k}g_{k}-a_{k}^{T}g_{k}\,g_{k}^{T}g_{k}}\,g_{k}.$$

$$s=\tau a+y,$$

$$\min\ \rho(\tau)=\frac{\tau a^{T}g}{1-\tau a^{T}a}+\frac{\tau^{2}a^{T}Ba}{2(1-\tau a^{T}a)^{2}},$$

$$\mbox{s.t.}\ \ \tau\in\Omega,$$

$$\tau_{\Delta}=\frac{\Delta}{\|a\|},\quad \tau_{d}=\frac{1-\varepsilon_{0}}{\|a\|^{2}},\quad \tau_{m}=\frac{1}{\|a\|^{2}},\quad \tau_{u}=\frac{1+\varepsilon_{0}}{\|a\|^{2}}.$$

$$\Omega=\{\tau\mid|\tau|\leq\tau_{\Delta}\}\cap\{\tau\mid\tau\leq\tau_{d}\ \mbox{or}\ \tau\geq\tau_{u}\}.$$

$$(\mbox{P1})\quad\left\{\begin{array}{l}\min\ \rho(\tau),\\ \mbox{s.t.}\ \ \tau\in[-\tau_{\Delta},\tau_{\Delta}].\end{array}\right.$$

$$(\mbox{P2})\quad\left\{\begin{array}{l}\min\ \rho(\tau),\\ \mbox{s.t.}\ \ \tau\in[-\tau_{\Delta},\tau_{d}].\end{array}\right.$$

$$(\mbox{P3})\quad\left\{\begin{array}{l}\min\ \rho(\tau),\\ \mbox{s.t.}\ \ \tau\in[-\tau_{\Delta},\tau_{d}]\cup[\tau_{u},\tau_{\Delta}].\end{array}\right.$$

$$\rho'(\tau)=\frac{a_{\tau}\tau+a^{T}g}{-\|a\|^{6}(\tau-\tau_{m})^{3}},$$

$$a_{\tau}=a^{T}Ba-a^{T}a\,a^{T}g.$$

$$\tau_{cp}=\frac{-a^{T}g}{a_{\tau}}.$$

$$\tau_{cp}-\tau_{m}=\frac{a^{T}Ba}{-a_{\tau}\|a\|^{2}}.$$

$$\tau_{\ast}=0.$$

$$\rho(\tau)=\frac{\tau^{2}a^{T}Ba}{2(1-\tau a^{T}a)^{2}}\geq 0.$$

$$\tau_{\ast}=\left\{\begin{array}{ll}-\tau_{\Delta}, & \mbox{if}\ a_{\tau}\leq 0,\\ \max\{-\tau_{\Delta},\tau_{cp}\}, & \mbox{if}\ a_{\tau}>0,\ a^{T}g>0,\\ \min\{\tau_{cp},\tau_{\Delta}\}, & \mbox{if}\ a_{\tau}>0,\ a^{T}g<0.\end{array}\right.$$

$$\tau_{\ast}=\left\{\begin{array}{ll}-\tau_{\Delta}, & \mbox{if}\ a_{\tau}\leq 0,\\ \max\{-\tau_{\Delta},\tau_{cp}\}, & \mbox{if}\ a_{\tau}>0,\ a^{T}g>0,\\ \min\{\tau_{cp},\tau_{d}\}, & \mbox{if}\ a_{\tau}>0,\ a^{T}g<0.\end{array}\right.$$

$$\tau_{\ast}=\left\{\begin{array}{ll}\tau_{u}, & \mbox{if}\ \tau_{m}<\tau_{cp}\leq\tau_{u},\\ \tau_{cp}, & \mbox{if}\ \tau_{u}<\tau_{cp}<\tau_{\Delta},\\ \tilde{\tau}_{\Delta}, & \mbox{if}\ \tau_{cp}\geq\tau_{\Delta},\end{array}\right.$$

$$\tilde{\tau}_{\Delta}=\arg\min\{\rho(-\tau_{\Delta}),\rho(\tau_{\Delta})\}.$$

$$\Omega=[-\tau_{\Delta},\tau_{d}]\cup[\tau_{u},\tau_{\Delta}],$$

$$\tau_{\ast}=\left\{\begin{array}{ll}\arg\min\{\rho(-\tau_{\Delta}),\rho(\tau_{u})\}, & \mbox{if}\ \tau_{m}<\tau_{cp}\leq\tau_{u},\\ \arg\min\{\rho(-\tau_{\Delta}),\rho(\tau_{cp})\}, & \mbox{if}\ \tau_{u}<\tau_{cp}<\tau_{\Delta},\\ \arg\min\{\rho(-\tau_{\Delta}),\rho(\tau_{\Delta})\}, & \mbox{if}\ \tau_{cp}\geq\tau_{\Delta}.\end{array}\right.$$

$$\rho(\tau_{u})-\rho(-\tau_{\Delta})=\frac{\Delta^{2}\|a\|^{2}a_{\Delta}+2\Delta\|a\|b_{\Delta}+c_{\Delta}}{2\varepsilon_{0}^{2}\|a\|^{4}(1+\Delta\|a\|)^{2}},$$

$$a_{\Delta}=(1+2\varepsilon_{0})a^{T}Ba-2\varepsilon_{0}\|a\|^{2}a^{T}g,$$

$$b_{\Delta}=(1+\varepsilon_{0})^{2}a^{T}Ba-(2+\varepsilon_{0})\varepsilon_{0}\|a\|^{2}a^{T}g,$$

Taxonomy

Topics: Advanced Optimization Algorithms Research · Sparse and Compressive Sensing Techniques · Optimization and Variational Analysis

Full text

Honglan Zhu

Qin Ni

Chuangyin Dang

Department of Mathematics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, People’s Republic of China.

Business School, Huaiyin Institute of Technology, Huaian 223003, People’s Republic of China.

Department of Systems Engineering & Engineering Management, City University of Hong Kong, Kowloon, Hong Kong SAR.

Keywords: unconstrained optimization, conic model, trust region method, alternating direction method, global convergence

1 Introduction

In this paper, we consider the unconstrained optimization problem

$$\min_{x\in R^{n}}\ f(x), \tag{1.1}$$

where $f(x)$ is continuously differentiable. Problem (1.1) has been studied by many researchers, including Han [1], Powell [2], Yuan and Sun [3], and Powell and Yuan [4]. Among the many methods for solving problem (1.1), the trust region method is a very effective one (see [4, 5, 6, 7, 8, 9]); the book of Conn, Gould and Toint [10] gives an excellent and comprehensive treatment of trust region methods. Most optimization theory is based on the quadratic model, which is used to approximate $f(x)$. That is, at the $k$th iteration, the subproblem

$$\min_{s\in R^{n}}\ \varrho_{k}(s)=g_{k}^{T}s+\frac{1}{2}s^{T}B_{k}s, \tag{1.2}$$

$$\mbox{s.t.}\ \ \|s\|\leq\Delta_{k}, \tag{1.3}$$

is solved to obtain a search direction $s_{k}$, where $x_{k}$ is the current iterate, $g_{k}=\nabla f(x_{k})$, $B_{k}$ is a symmetric approximation to the Hessian of $f(x)$, $\|\cdot\|$ denotes the Euclidean norm, and $\Delta_{k}$ is the trust region radius at the $k$th iteration.

Many methods can be used to solve the subproblem (1.2)-(1.3). Simple, low-cost and effective choices are the dogleg methods, such as Powell's single dogleg method [11] and Dennis and Mei's double dogleg method [12]; other authors have also studied the dogleg method [13, 14, 15]. We now recall the simple dogleg algorithm for solving the trust region subproblem with the quadratic model.

Algorithm 1.1

Step 0. Input the data of the $k$th iteration, i.e., $g_{k}$, $B_{k}$ and $\Delta_{k}$.

Step 1. Compute $s_{k}^{\textrm{N}}=-B_{k}^{-1}g_{k}$ . If $\|s_{k}^{\textrm{N}}\|\leq{\Delta}_{k}$ , then $s_{\ast}=s_{k}^{\textrm{N}}$ , and stop.

Step 2. Compute $s_{k}^{c}=-\frac{g_{k}^{T}g_{k}}{g_{k}^{T}B_{k}g_{k}}g_{k}$ . If $\|s_{k}^{c}\|\geq{\Delta}_{k}$ , then $s_{\ast}=-\frac{{\Delta}_{k}g_{k}}{\|g_{k}\|}$ , and stop. Otherwise, go to Step 3.

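Step 3. Compute

$$d=\|s_{k}^{\textrm{N}}-s_{k}^{c}\|^{2},\quad e=(s_{k}^{\textrm{N}}-s_{k}^{c})^{T}s_{k}^{c},\quad f=\|s_{k}^{c}\|^{2}-\Delta_{k}^{2}, \tag{1.4}$$

then $s_{\ast}=s_{k}^{c}+\lambda(s_{k}^{\textrm{N}}-s_{k}^{c})$, where $\lambda=\frac{-e+\sqrt{e^{2}-df}}{d}$.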
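The three steps of Algorithm 1.1 can be sketched in Python as follows. This is a minimal illustration only, assuming NumPy and a symmetric positive definite $B_{k}$ so that both the Newton point and the Cauchy point are well defined; the function name `dogleg_step` and its argument names are hypothetical and do not come from the paper.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Single dogleg step for min g^T s + 0.5 s^T B s subject to ||s|| <= delta.

    Assumption: B is symmetric positive definite.
    """
    # Step 1: Newton point; accept it if it lies inside the trust region.
    s_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(s_newton) <= delta:
        return s_newton
    # Step 2: Cauchy point; if it is outside, step to the boundary along -g.
    s_cauchy = -(g @ g) / (g @ B @ g) * g
    if np.linalg.norm(s_cauchy) >= delta:
        return -delta * g / np.linalg.norm(g)
    # Step 3: intersect the segment from s_cauchy to s_newton with the boundary,
    # i.e. solve d*lam^2 + 2*e*lam + f = 0 for the positive root.
    diff = s_newton - s_cauchy
    d = diff @ diff
    e = diff @ s_cauchy
    f = s_cauchy @ s_cauchy - delta**2
    lam = (-e + np.sqrt(e * e - d * f)) / d
    return s_cauchy + lam * diff
```

In Step 3 the quantity $f$ is negative because the Cauchy point lies strictly inside the trust region, so the square root is real and $\lambda$ is the positive root of the quadratic.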

We note that the solution obtained by dogleg methods is only an approximate solution of (1.2)-(1.3). Moreover, practical experience shows that the quadratic model is not always effective. If the objective function is highly nonlinear and the iterate is far from the minimizer, the quadratic model may approximate the original problem poorly, which can make the iteration proceed slowly.

In 1980, Davidon [16] proposed the conic model for unconstrained optimization as an alternative to the quadratic model. It has since attracted wide attention from many authors in various areas [17, 18, 19, 20, 21, 22, 23]. A typical trust region subproblem with the conic model was first proposed by Di and Sun in [24] as follows:

$$\min_{s\in R^{n}}\ \phi_{k}(s)=\frac{g_{k}^{T}s}{1-a_{k}^{T}s}+\frac{s^{T}B_{k}s}{2(1-a_{k}^{T}s)^{2}}, \tag{1.5}$$

$$\mbox{s.t.}\ \ \|s\|\leq\Delta_{k},\quad 1-a_{k}^{T}s>0, \tag{1.6}$$

where $a_{k}\in R^{n}$ is the horizon vector, and $B_{k}$ is symmetric and positive semidefinite. In [25], Ni proposed a new trust region subproblem and gave the optimality conditions for the trust region subproblems of a conic model. That is, at the $k$th iteration, the trial step $s_{k}$ is computed by solving the conic model trust region subproblem

$$\min_{s\in R^{n}}\ \phi_{k}(s)=\frac{g_{k}^{T}s}{1-a_{k}^{T}s}+\frac{s^{T}B_{k}s}{2(1-a_{k}^{T}s)^{2}}, \tag{1.7}$$

$$\mbox{s.t.}\ \ \|s\|\leq\Delta_{k},\quad |1-a_{k}^{T}s|\geq\varepsilon_{0}, \tag{1.8}$$

where $\varepsilon_{0}$ ($0<\varepsilon_{0}<1$) is a sufficiently small positive number. The subproblem (1.7)-(1.8) is more comprehensive than (1.5)-(1.6) and will not miss the solution of the original problem (1.1).

Research has shown that the conic model is superior to the quadratic model to some extent, in particular for objective functions that oscillate strongly; in addition, the conic model supplies enough freedom to make the best use of both gradient and function value information at the iterates. In view of these good properties of the conic model, we continue to study it.

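It is noteworthy that the simple dogleg algorithm for solving the trust region subproblem based on the conic model (DCTR) is similar to Algorithm 1.1 above, where

$$s_{k}^{\textrm{N}}=\frac{-B_{k}^{-1}g_{k}}{1-a_{k}^{T}B_{k}^{-1}g_{k}},\qquad s_{k}^{c}=\frac{-g_{k}^{T}g_{k}}{g_{k}^{T}B_{k}g_{k}-a_{k}^{T}g_{k}\,g_{k}^{T}g_{k}}\,g_{k}.$$

However, the calculation of DCTR is much more complicated (see [26, 27, 28]).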
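The two points used by DCTR can be understood through the change of variable $w=s/(1-a_{k}^{T}s)$, under which the conic model becomes the quadratic $g_{k}^{T}w+\frac{1}{2}w^{T}B_{k}w$. The sketch below evaluates the conic model and the conic Newton and Cauchy points with NumPy. It is an illustration only, assuming a symmetric positive definite $B_{k}$ and nonzero denominators; the function names are hypothetical.

```python
import numpy as np

def conic_model(s, g, B, a):
    """phi(s) = g^T s / (1 - a^T s) + s^T B s / (2 (1 - a^T s)^2)."""
    gamma = 1.0 - a @ s
    return (g @ s) / gamma + (s @ B @ s) / (2.0 * gamma**2)

def conic_newton_point(g, B, a):
    """s^N = -B^{-1} g / (1 - a^T B^{-1} g)."""
    w = -np.linalg.solve(B, g)      # minimizer of the quadratic in w
    return w / (1.0 + a @ w)        # 1 + a^T w equals 1 - a^T B^{-1} g

def conic_cauchy_point(g, B, a):
    """s^c = -(g^T g) / (g^T B g - (a^T g)(g^T g)) * g."""
    gg = g @ g
    return -gg / (g @ B @ g - (a @ g) * gg) * g
```

Setting `a` to the zero vector recovers the quadratic-model points used in Algorithm 1.1.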

In order to find a simpler method that is better suited to the particular structure of the conic model, we consider using the alternating direction method to solve the conic model subproblem. The alternating direction method (ADM) dates back to [29] and has been well studied for linearly constrained convex programming problems. Because of its efficiency and ease of implementation, ADM has attracted wide attention from many authors in various areas; see [30, 31, 32, 33, 34, 35].

In this paper, we combine the subproblem (1.7)-(1.8) with an alternating direction search to propose a new method for solving the conic trust region subproblem. The rest of this paper is organized as follows. In the next section, the motivation and description of the simple alternating direction search algorithm are presented. In Section 3, we give the quasi-Newton method based on the conic model for solving unconstrained optimization problems and prove its global convergence. The numerical results in Section 4 indicate that the algorithm is efficient and robust.

2 A simple alternating direction search method

The conic model $\phi_{k}(s)$ in the subproblem (1.7)-(1.8) has one more parameter, $a_{k}$, than $\varrho_{k}(s)$, so $\phi_{k}(s)$ has more freedom and can take into account function value information from the previous iteration, which is useful for algorithms. Furthermore, the conic model possesses richer interpolation information and can satisfy four interpolation conditions on the function values and the gradient values at the current and the previous points. Using this richer interpolation information may improve the performance of the algorithms. Generally, the parameter $a_{k}$ is chosen as a descent direction, such as $g(x_{k-1})$, $g(x_{k})$ or $s_{k-1}$ (see [16, 17, 18, 27, 26]).

In view of the particular importance of the parameter $a_{k}$, we consider the following alternating direction search method to solve the subproblem (1.7)-(1.8). The new method is divided into two steps. First, we search along the direction parallel to $a_{k}$. Then we search along a direction $y_{k}$ that is perpendicular to $a_{k}$. For convenience, we omit the index $k$ of $a_{k}$, $g_{k}$ and $B_{k}$ in this section.

In this paper, we assume that $a\neq 0$ and that $B$ is positive definite (abbreviated as $B\succ 0$).

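Let

$$s=\tau a+y, \tag{2.1}$$

where $\tau\in R$, $y\in R^{n}$ and $a^{T}y=0$. Then, the solving process of subproblem (1.7)-(1.8) is divided into the following two stages.

In the first stage, we set $y=0$ and then $s=\tau a$. Substituting it into (1.7)-(1.8), we have

$$\min\ \rho(\tau)=\frac{\tau a^{T}g}{1-\tau a^{T}a}+\frac{\tau^{2}a^{T}Ba}{2(1-\tau a^{T}a)^{2}}, \tag{2.2}$$

$$\mbox{s.t.}\ \ \tau\in\Omega, \tag{2.3}$$

where $\Omega=\{\tau\ |\ |\tau|\|a\|\leq\Delta,\ |1-\tau\|a\|^{2}|\geq\varepsilon_{0}\}$.

For the purpose of clarity, we denote

$$\tau_{\Delta}=\frac{\Delta}{\|a\|},\quad \tau_{d}=\frac{1-\varepsilon_{0}}{\|a\|^{2}},\quad \tau_{m}=\frac{1}{\|a\|^{2}},\quad \tau_{u}=\frac{1+\varepsilon_{0}}{\|a\|^{2}}. \tag{2.4}$$

Then,

$$\Omega=\{\tau\mid|\tau|\leq\tau_{\Delta}\}\cap\{\tau\mid\tau\leq\tau_{d}\ \mbox{or}\ \tau\geq\tau_{u}\}. \tag{2.5}$$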

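In the following, we consider three different cases of (2.2)-(2.3):

(1) If $\Delta\|a\|\leq 1-\varepsilon_{0}$, then $\tau_{\Delta}\leq\tau_{d}$ and (2.2)-(2.3) becomes

$$(\mbox{P1})\quad\left\{\begin{array}{l}\min\ \rho(\tau),\\ \mbox{s.t.}\ \ \tau\in[-\tau_{\Delta},\tau_{\Delta}].\end{array}\right.$$

(2) If $|1-\Delta\|a\|\,|<\varepsilon_{0}$, then $\tau_{d}<\tau_{\Delta}<\tau_{u}$ and (2.2)-(2.3) becomes

$$(\mbox{P2})\quad\left\{\begin{array}{l}\min\ \rho(\tau),\\ \mbox{s.t.}\ \ \tau\in[-\tau_{\Delta},\tau_{d}].\end{array}\right.$$

(3) If $\Delta\|a\|\geq 1+\varepsilon_{0}$, then $\tau_{u}\leq\tau_{\Delta}$ and (2.2)-(2.3) becomes

$$(\mbox{P3})\quad\left\{\begin{array}{l}\min\ \rho(\tau),\\ \mbox{s.t.}\ \ \tau\in[-\tau_{\Delta},\tau_{d}]\cup[\tau_{u},\tau_{\Delta}].\end{array}\right.$$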
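The case distinction above depends only on $\|a\|$, $\Delta$ and $\varepsilon_{0}$, and can be sketched as a small Python helper. This is an illustrative sketch under the definitions in (2.4); the function names and the representation of $\Omega$ as a list of closed intervals are assumptions made for the example.

```python
def stage_one_thresholds(a_norm, delta, eps0):
    """Return (tau_Delta, tau_d, tau_m, tau_u) as defined in (2.4)."""
    tau_delta = delta / a_norm
    tau_d = (1.0 - eps0) / a_norm**2
    tau_m = 1.0 / a_norm**2
    tau_u = (1.0 + eps0) / a_norm**2
    return tau_delta, tau_d, tau_m, tau_u

def stage_one_feasible_set(a_norm, delta, eps0):
    """Classify the first-stage problem as P1, P2 or P3 and return its intervals."""
    tau_delta, tau_d, _, tau_u = stage_one_thresholds(a_norm, delta, eps0)
    t = delta * a_norm
    if t <= 1.0 - eps0:                      # case (1)
        return "P1", [(-tau_delta, tau_delta)]
    if t < 1.0 + eps0:                       # case (2): |1 - t| < eps0
        return "P2", [(-tau_delta, tau_d)]
    # case (3): t >= 1 + eps0, the feasible set splits around tau_m
    return "P3", [(-tau_delta, tau_d), (tau_u, tau_delta)]
```

For example, with $\|a\|=2$ and $\varepsilon_{0}=0.1$, the radii $\Delta=0.4$, $0.5$ and $1$ fall into cases (1), (2) and (3), respectively.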

Now, we discuss the stationary points of $\rho(\tau)$. By direct computation, the derivative of $\rho(\tau)$ is

$$\rho'(\tau)=\frac{a_{\tau}\tau+a^{T}g}{-\|a\|^{6}(\tau-\tau_{m})^{3}}, \tag{2.15}$$

where

$$a_{\tau}=a^{T}Ba-a^{T}a\,a^{T}g. \tag{2.16}$$

From (2.4), we know that $0<\tau_{d}<\tau_{m}<\tau_{u}$, and then from (2.5) $\tau_{m}\not\in\Omega$. Therefore, if $a_{\tau}\neq 0$, then $\rho(\tau)$ has only one stationary point

$$\tau_{cp}=\frac{-a^{T}g}{a_{\tau}}. \tag{2.17}$$
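The derivative formula can be checked numerically. The following sketch implements $\rho(\tau)$ from (2.2), the closed-form derivative, and the stationary point $\tau_{cp}$, so that the derivative can be compared against a finite difference. It assumes NumPy; the function names are hypothetical and the test data are arbitrary.

```python
import numpy as np

def rho(tau, a, g, B):
    """Conic model restricted to the line s = tau * a, as in (2.2)."""
    gamma = 1.0 - tau * (a @ a)
    return tau * (a @ g) / gamma + tau**2 * (a @ B @ a) / (2.0 * gamma**2)

def rho_prime(tau, a, g, B):
    """Closed-form derivative of rho, with a_tau = a^T B a - (a^T a)(a^T g)."""
    n2 = a @ a                                # ||a||^2, so ||a||^6 = n2**3
    a_tau = a @ B @ a - n2 * (a @ g)
    tau_m = 1.0 / n2
    return (a_tau * tau + a @ g) / (-(n2**3) * (tau - tau_m) ** 3)

def stationary_point(a, g, B):
    """tau_cp = -a^T g / a_tau; requires a_tau != 0."""
    a_tau = a @ B @ a - (a @ a) * (a @ g)
    return -(a @ g) / a_tau
```

The closed form follows from writing $\rho$ as a quadratic in $w=\tau/(1-\tau\|a\|^{2})$ and using $dw/d\tau=1/(1-\tau\|a\|^{2})^{2}$.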

Lemma 2.1.

(1) If $a_{\tau}<0$, then $\tau_{m}<\tau_{cp}$ and $\rho(\tau)$ is monotonically decreasing on the interval $(\tau_{m},\tau_{cp})$; $\rho(\tau)$ is monotonically increasing for $\tau<\tau_{m}$ and for $\tau>\tau_{cp}$.

(2) If $a_{\tau}=0$, then $a^{T}g>0$ and $\rho(\tau)$ is monotonically increasing for $\tau<\tau_{m}$; $\rho(\tau)$ is monotonically decreasing for $\tau>\tau_{m}$.

(3) If $a_{\tau}>0$, then $\tau_{cp}<\tau_{m}$ and $\rho(\tau)$ is monotonically increasing on the interval $(\tau_{cp},\tau_{m})$; $\rho(\tau)$ is monotonically decreasing for $\tau<\tau_{cp}$ and for $\tau>\tau_{m}$.

Proof.

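From (2.4) and (2.17), we know that if $a_{\tau}\neq 0$ then

$$\tau_{cp}-\tau_{m}=\frac{a^{T}Ba}{-a_{\tau}\|a\|^{2}}. \tag{2.18}$$

Since $B\succ 0$, the sign of $\tau_{cp}-\tau_{m}$ is opposite to that of $a_{\tau}$. Combining this with the sign of $\rho'(\tau)$ given by (2.15), we obtain the lemma. ∎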

Theorem 2.1.

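If $a^{T}g=0$, then the optimal solution of each of the subproblems (P1), (P2) and (P3) is

$$\tau_{\ast}=0. \tag{2.19}$$

Proof.

If $a^{T}g=0$, then from (2.2) we have

$$\rho(\tau)=\frac{\tau^{2}a^{T}Ba}{2(1-\tau a^{T}a)^{2}}\geq 0. \tag{2.20}$$

Since $\rho(0)=0$ and $\tau=0$ is feasible for all three subproblems, the theorem holds. ∎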

Theorem 2.2.

If $a^{T}g\neq 0$, then the optimal solution of the subproblem (P1) is

$$\tau_{\ast}=\left\{\begin{array}{ll}-\tau_{\Delta}, & \mbox{if}\ a_{\tau}\leq 0,\\ \max\{-\tau_{\Delta},\tau_{cp}\}, & \mbox{if}\ a_{\tau}>0,\ a^{T}g>0,\\ \min\{\tau_{cp},\tau_{\Delta}\}, & \mbox{if}\ a_{\tau}>0,\ a^{T}g<0.\end{array}\right. \tag{2.21}$$

Proof.

For the subproblem (P1), we know that $\Omega=[-\tau_{\Delta},\tau_{\Delta}]$, where $\tau_{\Delta}\leq\tau_{d}<\tau_{m}$.

(1) If $a_{\tau}\leq 0$, then from Lemma 2.1 (1)(2) we can easily obtain $\tau_{\ast}=-\tau_{\Delta}$.

(2) If $a_{\tau}>0$ and $a^{T}g>0$, then $\tau_{cp}<0$. From Lemma 2.1 (3), we obtain that if $\tau_{cp}\leq-\tau_{\Delta}$ then $\tau_{\ast}=-\tau_{\Delta}$; if $-\tau_{\Delta}<\tau_{cp}<0$, then $\tau_{\ast}=\tau_{cp}$. Therefore, $\tau_{\ast}=\max\{-\tau_{\Delta},\tau_{cp}\}$.

(3) If $a_{\tau}>0$ and $a^{T}g<0$, then $\tau_{cp}>0$. From Lemma 2.1 (3), we obtain that if $0<\tau_{cp}\leq\tau_{\Delta}$ then $\tau_{\ast}=\tau_{cp}$; if $\tau_{\Delta}<\tau_{cp}<\tau_{m}$, then $\tau_{\ast}=\tau_{\Delta}$. Therefore, $\tau_{\ast}=\min\{\tau_{cp},\tau_{\Delta}\}$. ∎
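The closed-form solution of Theorem 2.2 is easy to evaluate and to compare against a brute-force search over $[-\tau_{\Delta},\tau_{\Delta}]$. The sketch below does this with NumPy. It assumes the hypotheses of the theorem ($a^{T}g\neq 0$, $B\succ 0$, and $\Delta\|a\|\leq 1-\varepsilon_{0}$ so that the problem is of type (P1)); the function names `rho_along_a` and `tau_star_p1` are hypothetical.

```python
import numpy as np

def rho_along_a(tau, a, g, B):
    """Conic model restricted to s = tau * a, as in (2.2)."""
    gamma = 1.0 - tau * (a @ a)
    return tau * (a @ g) / gamma + tau**2 * (a @ B @ a) / (2.0 * gamma**2)

def tau_star_p1(a, g, B, delta):
    """Minimizer of rho over [-tau_Delta, tau_Delta] following Theorem 2.2.

    Assumes a^T g != 0, B positive definite and delta * ||a|| <= 1 - eps0.
    """
    ag = a @ g
    n2 = a @ a
    a_tau = a @ B @ a - n2 * ag
    tau_delta = delta / np.sqrt(n2)
    if a_tau <= 0:
        return -tau_delta                    # rho is increasing on the interval
    tau_cp = -ag / a_tau                     # unique stationary point
    if ag > 0:
        return max(-tau_delta, tau_cp)       # tau_cp < 0
    return min(tau_cp, tau_delta)            # tau_cp > 0
```

In the last two branches the result is simply $\tau_{cp}$ clamped to the feasible interval, which reflects that $\rho$ decreases to the left of $\tau_{cp}$ and increases between $\tau_{cp}$ and $\tau_{m}$.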

Theorem 2.3.

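If $a^{T}g\neq 0$, then the optimal solution of the subproblem (P2) is

$$\tau_{\ast}=\left\{\begin{array}{ll}-\tau_{\Delta}, & \mbox{if}\ a_{\tau}\leq 0,\\ \max\{-\tau_{\Delta},\tau_{cp}\}, & \mbox{if}\ a_{\tau}>0,\ a^{T}g>0,\\ \min\{\tau_{cp},\tau_{d}\}, & \mbox{if}\ a_{\tau}>0,\ a^{T}g<0.\end{array}\right. \tag{2.22}$$

Proof.

The proof is similar to that of Theorem 2.2, with $\tau_{d}$ in place of $\tau_{\Delta}$ as the right endpoint, so we omit it. ∎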

Theorem 2.4.

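If $a_{\tau}<0$, then $a^{T}g>0$, $\tau_{m}<\tau_{cp}$ and the optimal solution of the subproblem (P3) is

$$\tau_{\ast}=\left\{\begin{array}{ll}\tau_{u}, & \mbox{if}\ \tau_{m}<\tau_{cp}\leq\tau_{u},\\ \tau_{cp}, & \mbox{if}\ \tau_{u}<\tau_{cp}<\tau_{\Delta},\\ \tilde{\tau}_{\Delta}, & \mbox{if}\ \tau_{cp}\geq\tau_{\Delta},\end{array}\right. \tag{2.23}$$

where

$$\tilde{\tau}_{\Delta}=\arg\min\{\rho(-\tau_{\Delta}),\rho(\tau_{\Delta})\}. \tag{2.24}$$

Proof.

For the subproblem (P3), we know that

$$\Omega=[-\tau_{\Delta},\tau_{d}]\cup[\tau_{u},\tau_{\Delta}], \tag{2.25}$$

where $\tau_{d}<\tau_{m}<\tau_{u}$. If $a_{\tau}<0$, then $a^{T}g>0$, because $a_{\tau}=a^{T}Ba-\|a\|^{2}a^{T}g$ and $a^{T}Ba>0$. From Lemma 2.1 (1) we can easily obtain that $\tau_{m}<\tau_{cp}$ and

$$\tau_{\ast}=\left\{\begin{array}{ll}\arg\min\{\rho(-\tau_{\Delta}),\rho(\tau_{u})\}, & \mbox{if}\ \tau_{m}<\tau_{cp}\leq\tau_{u},\\ \arg\min\{\rho(-\tau_{\Delta}),\rho(\tau_{cp})\}, & \mbox{if}\ \tau_{u}<\tau_{cp}<\tau_{\Delta},\\ \arg\min\{\rho(-\tau_{\Delta}),\rho(\tau_{\Delta})\}, & \mbox{if}\ \tau_{cp}\geq\tau_{\Delta}.\end{array}\right. \tag{2.26}$$

(1) If $\tau_{m}<\tau_{cp}\leq\tau_{u}$, then from (2.2) we have

$$\rho(\tau_{u})-\rho(-\tau_{\Delta})=\frac{\Delta^{2}\|a\|^{2}a_{\Delta}+2\Delta\|a\|b_{\Delta}+c_{\Delta}}{2\varepsilon_{0}^{2}\|a\|^{4}(1+\Delta\|a\|)^{2}}, \tag{2.27}$$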

where

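$$a_{\Delta}=(1+2\varepsilon_{0})a^{T}Ba-2\varepsilon_{0}\|a\|^{2}a^{T}g,$$

$$b_{\Delta}=(1+\varepsilon_{0})^{2}a^{T}Ba-(2+\varepsilon_{0})\varepsilon_{0}\|a\|^{2}a^{T}g,$$

$$c_{\Delta}=(1+\varepsilon_{0})^{2}a^{T}Ba-2\varepsilon_{0}(1+\varepsilon_{0})\|a\|^{2}a^{T}g.$$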

Because $\tau_{cp}\leq\tau_{u}$ , then from (2.4) and (2.17) we have

[TABLE]

And then

[TABLE]

Combining with (2.27), then

[TABLE]

Hence, $\tau_{\ast}=\tau_{u}$ .

(2) If $\tau_{u}<\tau_{cp}<\tau_{\Delta}$ , then from (2.2) we have

[TABLE]

Since $a_{\tau}<0$ and $a^{T}g>0$, we have

[TABLE]

Therefore, $\tau_{\ast}=\tau_{cp}$ . The theorem is proved. ∎

Theorem 2.5.

If $a_{\tau}\geq 0$ and $a^{T}g\neq 0$ , then the optimal solution of the subproblem (P3) is

[TABLE]

Proof.

(1) If $a_{\tau}=0$ then $a^{T}g>0$ . Combining (2.25) and Lemma 2.1 (2), we know that

[TABLE]

However, by calculation we have

[TABLE]

Since $a_{\tau}=0$ and $a^{T}g>0$, we have

[TABLE]

Hence, $\tau_{\ast}=-\tau_{\Delta}$ and (2.36) holds.

(2) If $a_{\tau}>0,a^{T}g>0$ then $\tau_{cp}<0$ . Combining (2.25) and Lemma 2.1 (3), we know that the optimal solution of the subproblem (P3) is

[TABLE]

For $a_{\tau}>0,a^{T}g>0$ , then from (2.39) we note that

[TABLE]

If $-\tau_{\Delta}<\tau_{cp}<0$ , then from Lemma 2.1 (3) we know that

[TABLE]

Thus,

[TABLE]

Then, (2.36) holds.

(3) If $a_{\tau}>0$ , $a^{T}g<0$ , then from (2.17) and (2.18) we can get $0<\tau_{cp}<\tau_{m}$ . Combining (2.25) and Lemma 2.1 (3), we know that the optimal solution of the subproblem (P3) is

[TABLE]

For the subproblem (P3), we note that $1-\Delta\|a\|\leq-\varepsilon_{0}$. Since $a^{T}g<0$, we have

[TABLE]

However, from $\rho(0)=0$ and Lemma 2.1 (3) we obtain that if $0<\tau_{cp}<\tau_{d}$ then $\rho(\tau_{cp})<0$; if $\tau_{d}\leq\tau_{cp}<\tau_{m}$, then $\rho(\tau_{d})<0$ holds too. Therefore, it follows that

[TABLE]

Then, (2.36) holds too and the theorem is proved. ∎

If $\tau_{\ast}=\pm\tau_{\Delta}$, then from (2.4) we know that $\|\tau_{\ast}a\|=\Delta$. In this case we set $s_{\ast}=\tau_{\ast}a$ and terminate the subproblem computation. Otherwise, $\tau_{\ast}a$ lies inside the trust region, and we carry out the second stage below.

We set $s=\tau_{\ast}a+y$ and substitute it into $\phi_{k}(s)$. The subproblem (1.7)-(1.8) then becomes

[TABLE]

where

[TABLE]

In order to remove the equality constraint in (2.47), we use the null space technique. That is, since $a\neq 0$, there exist $n-1$ mutually orthogonal unit vectors $q_{1},q_{2},\cdots,q_{n-1}$ orthogonal to the parameter vector $a$. Set $Q=[q_{1},q_{2},\cdots,q_{n-1}]$ and $y=Qu$, where $u\in R^{n-1}$. Then (2.46)-(2.47) can be simplified to the following subproblem

[TABLE]

where

[TABLE]

Set $g_{k}=\tilde{g}$ , $B_{k}=\tilde{B}$ and $\Delta_{k}=\tilde{\Delta}$ . By Algorithm 1.1, we can obtain the solution $u_{\ast}$ of the subproblem (2.49)-(2.50). Then $y_{\ast}=Qu_{\ast}$ and $s_{\ast}=\tau_{\ast}a+y_{\ast}$ . Thus, the subproblem (1.7)-(1.8) is solved approximately.
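One way to build the null space matrix $Q$ used above is a complete QR factorization of the column vector $a$. The sketch below is an assumption about implementation, not the construction used in the paper: the first column of the orthogonal factor is parallel to $a$, so the remaining $n-1$ columns form an orthonormal basis of the subspace orthogonal to $a$. The sketch stops at constructing $Q$ and does not attempt the reduced quantities $\tilde{g}$, $\tilde{B}$ and $\tilde{\Delta}$; the function name is hypothetical.

```python
import numpy as np

def null_space_basis(a):
    """Return Q of shape (n, n-1) with orthonormal columns orthogonal to a.

    Assumption: a is a nonzero 1-D array of length n >= 2.
    """
    n = a.shape[0]
    # mode="complete" returns a full n x n orthogonal factor.
    q_full, _ = np.linalg.qr(a.reshape(n, 1), mode="complete")
    # Column 0 spans a; the remaining columns span its orthogonal complement.
    return q_full[:, 1:]
```

With such a $Q$, any $y=Qu$ automatically satisfies $a^{T}y=0$, and $\|y\|=\|u\|$ because the columns of $Q$ are orthonormal.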

We can now state the alternating direction search method for solving the conic trust region subproblem (1.7)-(1.8) as follows.

Algorithm 2.1

Given $\varepsilon_{0},a,g,B$ and $\Delta$ .

Step 1. If $a^{T}g=0$ , then $\tau_{\ast}=0$ . Set $a=0$ and use Algorithm 1.1 to get $s_{k}$ , stop.

Step 2. Compute $\tau_{cp},\tau_{d},\tau_{u},\tau_{\Delta}$ and $a_{\tau}$ by (2.4), (2.16) and (2.17).

Step 3. Compute $1-\Delta\|a\|$ .

Step 4. Solve the subproblem (2.2)-(2.3).

Step 4.1. If $1-\Delta\|a\|\geq\varepsilon_{0}$, then calculate $\tau_{\ast}$ by (2.21); if $|1-\Delta\|a\|\,|<\varepsilon_{0}$, then calculate $\tau_{\ast}$ by (2.22); otherwise, go to Step 4.2.

Step 4.2. If $a_{\tau}<0$, then calculate $\tau_{\ast}$ by (2.23); if $a_{\tau}\geq 0$, then calculate $\tau_{\ast}$ by (2.36).

Step 5. If $\tau_{\ast}=\pm\tau_{\Delta}$ , then $s_{k}=\pm\tau_{\Delta}a$ , and stop. Otherwise, compute $Q$ , $\tilde{\Delta}$ , $\tilde{g}$ and $\tilde{B}$ by (2.48) and (2.51).

Step 6. Set $g_{k}=\tilde{g}$ , $B_{k}=\tilde{B}$ and $\Delta_{k}=\tilde{\Delta}$ . Then solve the subproblem (2.49)-(2.50) by Algorithm 1.1 to get $u_{\ast}$ .

Step 7. Set $y_{\ast}=Qu_{\ast}$ and $s_{k}=\tau_{\ast}a+y_{\ast}$ , and stop.

In order to discuss the lower bound of predicted reduction in each iteration, we define the following predicted reduction.

[TABLE]

We now prove the following theorem, which is used to guarantee the global convergence of the algorithm proposed in the next section.

Theorem 2.6.

Under the same conditions as Lemma 2.1, if $s_{k}=\pm\tau_{\Delta}a$ is obtained by Step 5 of Algorithm 2.1, then there exists a positive constant $c_{1}$ such that

[TABLE]

Proof.

(1) If $s_{k}=\tau_{\Delta}a$ , then we know that $\tau_{\ast}=\tau_{\Delta}$ . By computation, we have

[TABLE]

where $\tau_{\Delta}$ is generated in two cases as defined in (2.21) and (2.23). In both cases, we can find $\tau_{\Delta}\leq\tau_{cp}$ and

[TABLE]

Then

[TABLE]

(1a) For $1-\Delta\|a\|\geq\varepsilon_{0}$ , then from (2.21) we know that $a_{\tau}>0$ and $a^{T}g<0$ . Combining with (2.55) and (2.56) , we have

[TABLE]

where

[TABLE]

(1b) For $1-\Delta\|a\|\leq-\varepsilon_{0}$ , then from (2.23) we know that $a_{\tau}<0$ and $a^{T}g>0$ . Because of $1-\Delta\|a\|<0$ and $a^{T}g>0$ , then from (2.56) we also have (2.57) holds.

(2) If $s_{k}=-\tau_{\Delta}a$ , then

[TABLE]

where $-\tau_{\Delta}$ is generated in the following three cases as defined in (2.21)-(2.23) and (2.36).

(2a) For $1-\Delta\|a\|\geq\varepsilon_{0}$ , then $1\leq 1+\Delta\|a\|\leq 2-\varepsilon_{0}$ .

From (2.21), we know that if $a_{\tau}\leq 0$ then $a^{T}g>0$ . Thus,

[TABLE]

And then, from (2) we know

[TABLE]

On the other hand, if $a_{\tau}>0,a^{T}g>0$ then $-\tau_{\Delta}\geq\tau_{cp}$ . Then from (2.4) and (2.17) we have (2.60) holds too. It follows that (2) holds.

(2b) For $|1-\Delta\|a\|\,|<\varepsilon_{0}$ , then $2-\varepsilon_{0}<1+\Delta\|a\|<2+\varepsilon_{0}$ .

Combining with (2.22), we can prove that (2.60) holds by the same way and

[TABLE]

(2c) For $1-\Delta\|a\|\,\leq-\varepsilon_{0}$ , then $1+\Delta\|a\|\geq 2+\varepsilon_{0}$ .

From (2.23), we know that if $a_{\tau}<0$ , then

[TABLE]

By the definition of $\mbox{pred}_{1}(\tau)$ in the (2.52), we get

[TABLE]

Combining with the proof of the above case (1a) in this theorem, we have

[TABLE]

Therefore, the theorem follows from (2.57) and (2)-(2) with

[TABLE]

∎

Theorem 2.7.

Under the same conditions as Lemma 2.1, if $s_{k}$ is obtained from Algorithm 2.1, then there exists a positive constant $c_{4}$ such that

[TABLE]

Proof.

(1) If $s_{k}$ is obtained by Algorithm 1.1, then from Nocedal and Wright [36] we have

[TABLE]

where $c_{2}\in(0,1]$ .

(2) If $s_{k}=\pm\tau_{\Delta}a$ , then (2.54) holds.

(3) If $s_{k}=\tau_{\ast}a+Qu_{\ast}$ with $\tau_{\ast}\neq\pm\tau_{\Delta}$, then combining (2.52) and (2.53) we have

[TABLE]

Because $u_{\ast}$ is obtained by Algorithm 1.1, it follows from [36] that

[TABLE]

where $c_{3}\in(0,1]$ , $\tilde{\Delta}$ , $\tilde{g}$ and $\tilde{B}$ as defined by (2.48) and (2.51). Thus,

[TABLE]

where $\tau_{\ast}$ can be $\tau_{cp},\tau_{d}$ or $\tau_{u}$ .

(3a) If $\tau_{\ast}=\tau_{cp}$ , then from (2.68) we have

[TABLE]

where the second equality is from (2.17) and the last equality is from (2.58).

(3b) If $\tau_{\ast}=\tau_{d}$ , then

[TABLE]

From (2.21)-(2.23) and (2.36), we know that $\tau_{d}\leq\tau_{cp}$ and $a^{T}g<0$ . For $\tau_{d}\leq\tau_{cp}$ , then we have

[TABLE]

and

[TABLE]

where $0<\varepsilon_{0}<1$ .

(3c) If $\tau_{\ast}=\tau_{u}$ , then

[TABLE]

From (2.21)-(2.23) and (2.36), we know that $\tau_{cp}\leq\tau_{u}$ , $a_{\tau}<0$ and $a^{T}g>0$ . For $\tau_{cp}\leq\tau_{u}$ , then we have

[TABLE]

and

[TABLE]

Therefore, the theorem follows from (2.54), (2.66) and (2)-(2) with

[TABLE]

∎

3 The algorithm and its convergence

In this section, we propose a quasi-Newton method with a conic model for unconstrained minimization and prove its convergence under some reasonable conditions. In order to solve the problem (1.1), we approximate $f(x)$ with a conic model of the form

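$$\phi_{k}(s)=f_{k}+\frac{g_{k}^{T}s}{1-a_{k}^{T}s}+\frac{1}{2}\,\frac{s^{T}B_{k}s}{(1-a_{k}^{T}s)^{2}},\qquad(3.1)$$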

where $f_{k}=f(x_{k})$, $g_{k}=\nabla f(x_{k})$, and $B_{k}\in R^{n\times n}$ and $a_{k}\in R^{n}$ are a parameter matrix and a parameter vector, respectively.

The parameters $a_{k}$ and $B_{k}$ in (3.1) can be chosen as in [16, 17, 18, 26, 27] and [37, 38], respectively. We set

[TABLE]

If $\beta>0$ , then

[TABLE]

otherwise, $\beta_{k}=1$ . In the updating process, we compute

[TABLE]

where

[TABLE]

and $y_{k}=g_{k+1}-g_{k}$ .
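The update formulas themselves are not legible in this text, so the following Python sketch is not the authors' update. It only illustrates the kind of damped quasi-Newton update discussed in [37], namely Powell's damped BFGS formula, applied to a plain pair $(s, y)$; any conic rescaling by $\beta_{k}$ is not modeled, and the function name `damped_bfgs_update` and the threshold `sigma=0.2` are assumptions.

```python
import numpy as np

def damped_bfgs_update(B, s, y, sigma=0.2):
    """Powell-style damped BFGS update of a symmetric positive definite B.

    When the curvature s^T y is too small relative to s^T B s, y is replaced
    by r = theta*y + (1-theta)*B s so that s^T r = sigma * s^T B s > 0,
    which keeps the updated matrix positive definite.
    """
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy >= sigma * sBs:
        theta = 1.0                                  # ordinary BFGS step
    else:
        theta = (1.0 - sigma) * sBs / (sBs - sy)     # damping factor in (0, 1)
    r = theta * y + (1.0 - theta) * Bs
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (s @ r)

# Example with negative curvature s^T y < 0: the damped update stays positive definite.
B_new = damped_bfgs_update(np.eye(2), np.array([1.0, 0.0]), np.array([-1.0, 0.5]))
```

The point of the damping is that symmetry and positive definiteness of $B_{k}$, as required later in Assumption 3.1, are preserved even when $s_{k}^{T}y_{k}\leq 0$.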

Let $s_{k}$ be the solution of the subproblem (1.7)-(1.8) obtained by Algorithm 2.1. Then either $x_{k}+s_{k}$ is accepted as the new iterate or the trust region radius is reduced, according to a comparison between the actual reduction of the objective function

$$\mbox{ared}(s_{k})=f(x_{k})-f(x_{k}+s_{k})$$

and the reduction predicted by the conic model

$$\mbox{pred}(s_{k})=\phi_{k}(0)-\phi_{k}(s_{k}).$$

That is, if the reduction in the objective function is satisfactory, then we finish the current iteration by taking

$$x_{k+1}=x_{k}+s_{k}$$

and adjusting the trust-region radius; otherwise the iteration is repeated at point $x_{k}$ with a reduced trust-region radius.
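The acceptance test described above can be sketched in Python. This is a minimal sketch, assuming the usual conic form $\phi(s)=f+g^{T}s/(1-a^{T}s)+\frac{1}{2}s^{T}Bs/(1-a^{T}s)^{2}$ and the ratio of actual to predicted reduction; the function names `conic_model` and `reduction_ratio` and the sign check on the denominator are assumptions made for illustration.

```python
import numpy as np

def conic_model(s, f, g, B, a):
    """Value of the assumed conic model at step s."""
    d = 1.0 - a @ s
    if d <= 0.0:
        # Assumption: only steps with 1 - a^T s > 0 are considered valid.
        raise ValueError("step leaves the region where 1 - a^T s > 0")
    return f + (g @ s) / d + 0.5 * (s @ B @ s) / d**2

def reduction_ratio(fun, x, s, g, B, a):
    """Ratio of actual reduction to the reduction predicted by the conic model."""
    f = fun(x)
    ared = f - fun(x + s)                       # actual reduction
    pred = f - conic_model(s, f, g, B, a)       # predicted reduction, phi(0) = f
    return ared / pred

# With a = 0 and B equal to the true Hessian of a quadratic, the model is exact.
fun = lambda x: 0.5 * x @ x
x = np.array([1.0, 1.0])
g = x.copy()
ratio = reduction_ratio(fun, x, -0.5 * g, g, np.eye(2), np.zeros(2))
```

A ratio close to 1 indicates that the model predicts the objective well on the current trust region; a small or negative ratio signals that the radius should be reduced.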

Now we give the alternating direction trust-region algorithm based on the conic model (3.1).

**Algorithm 3.1** (ADCTR).

Step 0. Choose parameters $\epsilon,\varepsilon,\varepsilon_{0}\in(0,1)$ , $0<\eta_{1}<\eta_{2}<1$ , $0<\delta_{1}<1<\delta_{2}$ and $\bar{\Delta}>0$ ; give a starting point $x_{0}\in R^{n}$ , $B_{0}\in R^{n\times n}$ , $a_{0}\in R^{n}$ and an initial trust region radius $\Delta_{0}\in(0,\bar{\Delta}]$ ; set $k=0$ .

Step 1. Compute $f_{k}$ and $g_{k}$ . If $\|g_{k}\|<\varepsilon$ , then stop with $x_{k}$ as the approximate optimal solution; otherwise go to Step 2.

Step 2. Set $a=a_{k}$, $g=g_{k}$, $B=B_{k}$ and $\Delta=\Delta_{k}$. Then solve the subproblem (1.7)-(1.8) by Algorithm 2.1 to obtain an approximate solution $s_{k}$.

Step 3. Compute $\mbox{ared}(s_{k})$ , $\mbox{pred}(s_{k})$ and

$$r_{k}=\frac{\mbox{ared}(s_{k})}{\mbox{pred}(s_{k})}.$$

If $r_{k}\leq\eta_{1}$ , then set $\Delta_{k}=\delta_{1}\Delta_{k}$ , and go to Step 2. If $r_{k}>\eta_{1}$ , then set $x_{k+1}=x_{k}+s_{k}$ and

[TABLE]

Step 4. Generate $a_{k+1}$ and $B_{k+1}$ ; set $k=k+1$ , and go to Step 1.

In this algorithm, the procedure "Step 2-Step 3-Step 2" is called the inner cycle. The following theorem guarantees that the ADCTR algorithm does not cycle infinitely in the inner cycle.
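The control flow of Algorithm 3.1, including the inner cycle, can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the subproblem solver of Algorithm 2.1 is replaced by a Cauchy step on the quadratic model (that is, $a_{k}=0$), the radius rule after an accepted step (enlarge by $\delta_{2}$ up to $\bar{\Delta}$ when $r_{k}\geq\eta_{2}$) and the plain BFGS update in Step 4 are assumed stand-ins, and all function names and default parameter values are hypothetical.

```python
import numpy as np

def cauchy_step(g, B, delta):
    """Stand-in for Algorithm 2.1: Cauchy point of the quadratic model."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0.0 else min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g

def trust_region_loop(fun, grad, x0, eps=1e-5, eta1=0.1, eta2=0.75,
                      delta1=0.5, delta2=2.0, delta0=1.0, delta_max=10.0,
                      max_iter=5000):
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)
    delta = delta0
    for k in range(max_iter):
        f, g = fun(x), grad(x)
        if np.linalg.norm(g) < eps:            # Step 1: stopping test
            return x, k
        while True:                            # inner cycle: Step 2 - Step 3 - Step 2
            s = cauchy_step(g, B, delta)       # Step 2 (stand-in solver)
            pred = -(g @ s + 0.5 * s @ B @ s)  # predicted reduction (a_k = 0)
            ared = f - fun(x + s)              # actual reduction
            r = ared / pred                    # Step 3: reduction ratio
            if r > eta1:
                break
            delta *= delta1                    # reject the step, shrink the radius
        x_new = x + s
        if r >= eta2:                          # assumed rule for enlarging the radius
            delta = min(delta2 * delta, delta_max)
        y = grad(x_new) - g                    # Step 4 (stand-in): BFGS update,
        if s @ y > 1e-12:                      # skipped if curvature is not positive
            Bs = B @ s
            B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
        x = x_new
    return x, max_iter

# Example: minimize f(x) = 0.5 * ||x||^2 from the starting point (3, 4).
x_star, iters = trust_region_loop(lambda x: 0.5 * x @ x, lambda x: x, [3.0, 4.0])
```

The `while True` loop corresponds to the inner cycle: it only exits once $r_{k}>\eta_{1}$, so its finite termination is what the theorem below has to establish.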

**Assumption 3.1.** The level set

[TABLE]

and the sequences $\{\|a_{k}\|\}$, $\{\|g_{k}\|\}$ and $\{\|B_{k}\|\}$ are all uniformly bounded, each $B_{k}$ is symmetric and positive definite, and $f$ is twice continuously differentiable in $L(x_{0})$.

From (3.10) and Theorem 2.2, we have

[TABLE]

where $c_{1}$ is as defined by (2.72).

Theorem 3.1.

Suppose that Assumption 3.1 holds and that $s_{k}$ is the solution of the conic trust-region subproblem (1.7)-(1.8). If the process does not terminate at $x_{k}$, then $r_{k}>\eta_{1}$ holds after a finite number of inner iterations.

Proof.

Assume that the algorithm does not terminate at $x_{k}$; then there is $\varepsilon_{1}>0$ such that

[TABLE]

From Assumption 3.1 we have

[TABLE]

For simplicity, let the superscript denote the index of the inner iteration at $x_{k}$; then

[TABLE]

Assume that $s^{j}_{k}$ is a solution of subproblem (1.7)-(1.8) with trust-region radius $\Delta^{j}_{k}$; then it is easy to see that

[TABLE]

From (3.14), (3.15) and (3.17), we can obtain that there exist an integer $j_{1}$ and a constant $\eta_{3}>0$ such that

[TABLE]

It follows from (3.16) that

[TABLE]

On the other hand, from (3.17) and (3.15) we can get

[TABLE]

And then, from (3.15)-(3.20) we have

[TABLE]

where $\vartheta_{k}\in(0,1)$ and $Q=M_{1}+\bar{B}+2\bar{a}\bar{g}$ . Combining with (3.18) and (3.22), we can get that

[TABLE]

holds for all $j\geq j_{1}$ . By (3.17) and (3.23),

[TABLE]

holds for all sufficiently large $j$ , which contradicts (3.19). This completes the proof. ∎

In the following we give the global convergence property of Algorithm 3.1.

Theorem 3.2.

Suppose that Assumption 3.1 holds. Then for any $\varepsilon>0$, Algorithm 3.1 terminates in a finite number of iterations, that is,

[TABLE]

Proof.

We give the proof by contradiction. Suppose that there is $\varepsilon_{2}>0$ such that

[TABLE]

Combining with (3.13), (3.15) and (3.25), we have

[TABLE]

where the first inequality of (3.26) follows from

[TABLE]

and the second inequality is from $\Delta_{k}\leq\bar{\Delta}$ and

[TABLE]

From Step 3 of Algorithm 3.1 and (3.26), we obtain that for all $k$

[TABLE]

Since $f(x)$ is bounded from below and $f_{k+1}<f_{k}$ , we have

[TABLE]

Combining with Theorem 3.1, we know that

[TABLE]

which implies that

[TABLE]

On the other hand, similar to the proof of (3.20)-(3.24) we can obtain

[TABLE]

where $K$ is sufficiently large. From Step 3 of Algorithm 3.1, it follows that

[TABLE]

which is a contradiction to (3.30). The theorem is proved. ∎

4 Numerical Tests

In this section, algorithm ADCTR is tested on some standard test problems from [26, 40]. Since the purpose of this paper is to propose a new method, the alternating direction method, for solving the conic trust region subproblem, we ran ADCTR on a limited number of test problems. The names of the 16 test problems are listed in Table 1.

All computations are carried out in Matlab R2015b on a microcomputer in double precision arithmetic. All tests use the same stopping criterion $\|g_{k}\|\leq 10^{-5}$. The columns in the tables have the following meanings: No. denotes the number of the test problem; $n$ is the dimension of the test problem; Iter is the number of iterations; $nf$ is the number of function evaluations performed; $ng$ is the number of gradient evaluations; $f_{k}$ is the final objective function value; $\|g\|$ is the Euclidean norm of the final gradient; CPU(s) denotes the total iteration time of the algorithm in seconds. The sign * means that the algorithm fails to stop within 5000 iterations. The parameters in these algorithms are

[TABLE]

The numerical results of algorithm ADCTR for the 16 unconstrained optimization problems are listed in Table 2. We note that the optimal value of each of these test problems is $f_{*}=0$. From Table 2, we can see that our algorithm reaches the minimum value of the function after a finite number of iterations, and the corresponding minimizer is a stationary point, which is also the optimal solution. Therefore, ADCTR is feasible and effective.

In order to analyze the effectiveness of our new algorithm, we compare ADCTR with the conic quasi-Newton trust region algorithm in which the subproblems are solved by the dogleg method (DCTR); see Zhu [26] and Lu [27]. As the dimension of the test problems ranges from 2 to 4000, we actually carried out 48 numerical comparison experiments; the numerical results are listed in Table 3. Analyzing the numerical results, we draw the following conclusions: for the 16 problems, ADCTR is better than DCTR on 12 tests, somewhat worse on 2 tests, and the two algorithms are equally efficient on the other 2 tests; our algorithm, in which the subproblems are solved by the alternating direction method, is competitive with algorithm DCTR in [26]. Especially for large-scale problems, our new algorithm shows strong numerical stability.

5 Conclusions

In this paper, we propose an alternating direction trust region method based on the conic model for unconstrained optimization and investigate its convergence. Conic models offer more flexibility in approximating objective functions and have stronger modeling properties. The alternating direction method (ADM) has been well studied in the context of linearly constrained convex programming problems. It is because of the efficiency and easy implementation of ADM that we apply it to the trust region subproblem based on the conic model. Initial numerical results show that our new method is competitive, and that it is also effective and robust for large-scale problems. The numerical and theoretical results lead us to believe that the method is worthy of further study.

In addition, the main purpose of this paper is to explore a new method for solving the conic model subproblem, so many aspects remain open to further improvement and research. For example, one can consider weaker convergence assumptions under which the Hessian approximations $B_{k}$ are symmetric and only positive semidefinite. The rate of convergence has also not been studied.

Acknowledgements

We are grateful to the editors and referees for their suggestions and comments. This work was supported by National Natural Science Foundation of China (11771210) and the Natural Science Foundation of Jiangsu Province (BK20141409).

References

[1] S. P. Han. A globally convergent method for nonlinear programming, Journal of Optimization Theory and Applications, 1977, 22(3):297–309.

[2] M. J. D. Powell. Variable Metric Methods for Constrained Optimization, Springer Berlin Heidelberg, 1983.

[3] Y. X. Yuan, W. Y. Sun. Conic Methods for Unconstrained Minimization and Tensor Methods for Nonlinear Equations, Science Press, Beijing, China, 1997.

[4] M. J. D. Powell, Y. X. Yuan. A trust region algorithm for equality constrained optimization, Mathematical Programming, 1990, 49(1):189–211.

[5] A. Vardi. A trust region algorithm for equality constrained minimization: Convergence properties and implementation, SIAM Journal on Numerical Analysis, 1981, 22(3):575–591.

[6] P. T. Boggs, R. H. Byrd, R. B. Schnabel. A stable and efficient algorithm for nonlinear orthogonal distance regression, SIAM Journal on Scientific and Statistical Computing, 1987, 8(6):1052–1078.

[7] P. L. Toint. Global convergence of a class of trust region methods for nonconvex minimization in Hilbert space, IMA Journal of Numerical Analysis, 1988, 8(2):231–252.

[8] J. Z. Zhang, D. T. Zhu. Projected quasi-Newton algorithm with trust region for constrained optimization, Journal of Optimization Theory and Applications, 1990, 67(2):369–393.

[9] M. El-Alem. A robust trust-region algorithm with a nonmonotonic penalty parameter scheme for constrained optimization, SIAM Journal on Optimization, 1995, 5(2):348–378.

[10] A. R. Conn, N. I. M. Gould, P. L. Toint. Trust-region methods, Society for Industrial and Applied Mathematics, 2000.

[11] M. J. D. Powell. A hybrid method for nonlinear equations, in: P. Rabinowitz, ed., Numerical Methods for Nonlinear Algebraic Equations, Gordon and Breach, 1970, 87–114.

[12] J. E. Dennis, H. H. W. Mei. Two new unconstrained optimization algorithms which use function and gradient values, Journal of Optimization Theory and Applications, 1979, 28(4):453–482.

[13] L. Zhang, Z. Q. Tang. The hybrid dogleg method to solve subproblems of trust region, Journal of Nanjing Normal University, 2001, 24(1):28–32.

[14] J. Z. Zhang, X. J. Xu, D. T. Zhu. A nonmonotonic dogleg method for unconstrained optimization, SIAM Journal on Scientific and Statistical Computing, 1987, 8(6):1052–1078.

[15] Y. L. Zhao, C. X. X, R. B. Schnabel. A new trust region dogleg method for unconstrained optimization, Appl. Math. J. Chinese Univ. Ser. B, 2000, 15(1):83–92.

[16] W. C. Davidon. Conic approximations and collinear scalings for optimizers, SIAM Journal on Numerical Analysis, 1980, 17(2):268–281.

[17] R. Schnabel. Conic methods for unconstrained minimization and tensor methods for nonlinear equations, Math Prog: The State of the Art (eds. A. Bachem, M. Grötschel and B. Korte), Heidelberg: Springer-Verlag, 1982, 417–438.

[18] D. C. Sorensen. Newton’s method with a model trust region modification, SIAM J Numer Analy, 1982, 19(2):409–426.

[19] C. X. Xu, X. Y. Yang. Convergence of conic quasi-Newton trust region methods for unconstrained minimization, Math Appl, 1998, 11(2):71–76.

[20] Y. X. Yuan. A review of trust region algorithms for optimization, ICIAM, 2000, 99(1):271–282.

[21] D. M. Gay. Computing optimal locally constrained steps, SIAM J Sci Stat Comput, 1981, 2(2):186–197.

[22] J. M. Peng, Y. X. Yuan. Optimality conditions for the minimization of a quadratic with two quadratic constraints, SIAM J Optim, 1997, 7(3):579–594.

[23] W. Y. Sun, Y. X. Yuan. A conic trust-region method for nonlinearly constrained optimization, Annals of Operations Research, 2001, 103(1):175–191.

[24] S. Di, W. Y. Sun. A trust region method for conic model to solve unconstrained optimizations, Optimization Methods and Software, 1996, 6(4):237–263.

[25] Q. Ni. Optimality conditions for trust-region subproblems involving a conic model, SIAM Journal on Optimization, 2005, 15(3):826–837.

[26] M. Zhu, Y. Xue, Z. F. Sheng. A quasi-Newton type trust region method based on the conic model, Numerical Mathematics: A Journal of Chinese Universities, 1995, 17(1):36–47.

[27] X. P. Lu, Q. Ni. A quasi-Newton trust region method with a new conic model for the unconstrained optimization, Appl Math Comput, 2008, 204(1):373–384.

[28] L. J. Zhao, W. Y. Sun. A conic affine scaling method for nonlinear optimization with bound constraints, 2013, 30(3):1–30.

[29] D. Gabay, B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Computers and Mathematics with Applications, 1976, 2(1):17–40.

[30] G. Chen, M. Teboulle. A proximal-based decomposition method for convex minimization problems, Mathematical Programming, 1994, 64(1-3):81–101.

[31] J. Eckstein, M. Fukushima. Some reformulation and applications of the alternating direction method of multipliers, Large Scale Optimization: State of the Art, W. W. Hager et al. eds., Kluwer Academic Publishers, 1994, 115–134.

[32] B. S. He, L. Z. Liao, D. Han, H. Yang. A new inexact alternating directions method for monotone variational inequalities, Mathematical Programming, 2002, 92(1):103–118.

[33] S. Kontogiorgis, R. R. Meyer. A variable-penalty alternating directions method for convex optimization, Mathematical Programming, 1998, 83(1):29–53.

[34] K. Zhang, J. S. Li, Y. C. Song, X. S. Wang. An alternating direction method of multipliers for elliptic equation constrained optimization problem, Science China Mathematics, 2017, 60(2):361–378.

[35] M. H. Xu. Proximal alternating directions method for structured variational inequalities, Journal of Optimization Theory and Applications, 2007, 134(1):107–117.

[36] J. Nocedal, S. J. Wright. Numerical Optimization, Science Press, Beijing, China, 2006.

[37] M. J. D. Powell. Algorithms for nonlinear constraints that use Lagrange functions, Math Prog, 1978, 14(1):224–248.

[38] M. Al-Baali. Damped techniques for enforcing convergence of quasi-Newton methods, Optim Meth Softw, 2014, 29(5):919–936.

[39] H. Zhu, Q. Ni, M. L. Zeng. A quasi-Newton trust region method based on a new fractional model, Numerical Algebra, Control and Optimization, 2015, 5(3):237–249.

[40] J. J. Moré, B. S. Garbow, K. E. Hillstrom. Testing unconstrained optimization software, ACM Trans. Math. Software, 1981, 7(1):17–41.
