Simplex QP-based methods for minimizing a conic quadratic objective over   polyhedra

Alper Atamturk; Andres Gomez

arXiv:1706.05795·math.OC·November 6, 2018·Math. Program. Comput.

Simplex QP-based methods for minimizing a conic quadratic objective over polyhedra

Alper Atamturk, Andres Gomez

PDF

TL;DR

This paper introduces simplex QP-based methods for efficiently solving conic quadratic optimization problems over polyhedra, outperforming interior point methods especially in branch-and-bound contexts for discrete problems.

Contribution

It proposes a reformulation using the perspective of the quadratic function, enabling simple coordinate descent and bisection algorithms that are suitable for warm starts and branch-and-bound algorithms.

Findings

01

Up to 22x faster for large convex instances

02

Higher precision solutions compared to interior point methods

03

Significant speed-ups over barrier-based and LP-based branch-and-bound algorithms

Abstract

We consider minimizing a conic quadratic objective over a polyhedron. Such problems arise in parametric value-at-risk minimization, portfolio optimization, and robust optimization with ellipsoidal objective uncertainty; and they can be solved by polynomial interior point algorithms for conic quadratic optimization. However, interior point algorithms are not well-suited for branch-and-bound algorithms for the discrete counterparts of these problems due to the lack of effective warm starts necessary for the efficient solution of convex relaxations repeatedly at the nodes of the search tree. In order to overcome this shortcoming, we reformulate the problem using the perspective of the quadratic function. The perspective reformulation lends itself to simple coordinate descent and bisection algorithms utilizing the simplex method for quadratic programming, which makes the solution methods…

Tables6

Table 1. Table 1 . The effect of optimality tolerance.

Tolerance	BAR			ALG1
Tolerance	optgap	time	#iter	optgap	time	#iter	#QP
$10^{- 1}$	$8.65 \times 10^{- 2}$	29.9	10	$5.48 \times 10^{- 5}$	3.2	835	4
$10^{- 2}$	$8.77 \times 10^{- 3}$	41.5	15	$3.24 \times 10^{- 7}$	4.2	844	6
$10^{- 3}$	$6.98 \times 10^{- 4}$	54.6	23	$2.60 \times 10^{- 9}$	4.3	844	8
$10^{- 4}$	$5.52 \times 10^{- 5}$	62.9	27	$2.12 \times 10^{- 11}$	4.7	844	10
$10^{- 5}$	$3.72 \times 10^{- 6}$	66.8	29	$6.80 \times {𝟏𝟎}^{- 𝟏𝟑}$	5.2	844	12
$10^{- 6}$	$7.12 \times 10^{- 7}$	69.6	30	$5.32 \times 10^{- 13}$	5.4	844	13
$10^{- 7}$	$2.04 \times {𝟏𝟎}^{- 𝟖}$	72.0	32	$5.15 \times 10^{- 13}$	6.0	844	15
$10^{- 8}$	$2.65 \times 10^{- 9}$	74.0	33	$5.15 \times 10^{- 13}$	6.2	844	17
$10^{- 9}$	$2.42 \times 10^{- 10}$	75.9	34	$5.15 \times 10^{- 13}$	6.6	844	19
$10^{- 10}$	$1.97 \times 10^{- 11}$	78.7	35	$5.15 \times 10^{- 13}$	7.0	844	21
$10^{- 11}$	$9.61 \times 10^{- 12}$	79.6	36	$5.15 \times 10^{- 13}$	7.4	844	23
$10^{- 12}$	$0$	89.6	39	$5.15 \times 10^{- 13}$	7.8	844	25

Table 2. Table 2 . The effect of nonlinearity (cardinality instances).

		Method	$Ω = 1$			$Ω = 2$			$Ω = 3$			$Ω = 4$
$r$	$α$	Method	time	#iter	#QP	time	#iter	#QP	time	#iter	#QP	time	#iter	#QP
100	0.1	ALG1	1.0	22	20	1.1	53	24	1.3	104	29	1.4	123	26
		ALG2	0.8	41	14	0.9	95	15	0.9	150	15	1.1	219	16
		BAR	4.6	16	-	4.9	24	-	5.2	26	-	5.1	25	-
		MOS	2.1	10	-	2.4	11	-	2.2	10	-	2.4	11	-
100	0.5	ALG1	1.1	33	23	1.1	69	24	1.5	144	37	1.6	192	30
		ALG2	0.8	60	14	0.9	125	15	0.9	200	15	1.1	251	16
		BAR	4.5	21	-	5.1	25	-	5.8	29	-	5.7	29	-
		MOS	2.2	10	-	2.2	10	-	2.4	10	-	2.4	11	-
200	0.1	ALG1	0.9	33	19	1.1	73	25	1.2	110	25	1.4	157	26
		ALG2	0.8	49	14	0.9	126	14	0.9	172	14	1.1	259	15
		BAR	4.7	22	-	4.5	22	-	5.1	25	-	5.3	27	-
		MOS	2.4	11	-	2.6	12	-	2.5	11	-	2.4	11	-
200	0.5	ALG1	1.0	48	22	1.1	99	22	1.2	151	25	1.6	218	24
		ALG2	0.9	94	14	0.9	179	14	1.0	233	15	1.3	326	15
		BAR	4.4	21	-	4.9	24	-	5.2	26	-	5.8	31	-
		MOS	2.3	10	-	2.4	10	-	2.4	11	-	2.6	11	-
avg		ALG1	1.0	34	21	1.1	73	24	1.3	127	29	1.5	173	27
		ALG2	0.8	61	14	0.9	131	15	0.9	189	15	1.2	264	15
		BAR	4.3	20	-	4.9	24	-	5.3	27	-	5.5	28	-
		MOS	2.3	10	-	2.4	11	-	2.4	11	-	2.4	11	-

Table 3. Table 3 . The effect of nonlinearity (path instances).

		Method	$Ω = 1$			$Ω = 2$			$Ω = 3$			$Ω = 4$
$r$	$α$	Method	time	#iter	#QP	time	#iter	#QP	time	#iter	#QP	time	#iter	#QP
100	0.1	ALG1	4.9	940	12	7.0	1,307	16	8.5	1,505	18	10.5	1,756	21
		ALG2	5.4	1,283	11	7.0	1,637	13	8.4	1,865	14	9.7	2,375	13
		BAR	81.1	26	-	64.3	21	-	55.3	16	-	56.8	16	-
		MOS	19.4	19	-	16.8	17	-	15.6	18	-	15.8	19	-
100	0.5	ALG1	5.2	902	14	8.2	1,191	21	9.3	1,391	21	11.0	1,641	21
		ALG2	5.5	1,148	12	7.2	1,474	13	8.6	1,772	14	9.6	2,020	14
		BAR	62.7	19	-	56.0	16	-	57.1	16	-	57.7	16	-
		MOS	17.4	17	-	17.8	18	-	15.7	19	-	14.9	17	-
200	0.1	ALG1	4.9	836	14	6.3	1,053	15	8.3	1,220	18	14.4	1,429	17
		ALG2	4.9	932	12	6.8	1,377	13	8.4	1,671	13	12.4	1,833	13
		BAR	76.8	25	-	60.1	18	-	66.0	20	-	128.3	21	-
		MOS	16.2	17	-	16.1	18	-	15.5	18	-	15.2	18	-
200	0.5	ALG1	4.5	858	12	6.2	1,048	15	7.6	1,237	16	12.7	1,387	18
		ALG2	4.9	978	12	6.8	1,363	13	8.5	1,626	13	15.9	1,794	14
		BAR	83.1	26	-	72.7	21	-	64.6	18	-	101.2	16	-
		MOS	18.1	19	-	16.8	20	-	16.9	20	-	16.2	19	-
avg		ALG1	4.9	884	13	6.9	1,150	17	8.4	1,338	18	12.1	1,553	20
		ALG2	5.2	1,086	12	6.9	1,463	13	8.5	1,734	13	11.9	2,005	14
		BAR	75.9	24	-	63.2	19	-	60.7	17	-	86.0	17	-
		MOS	17.8	18	-	16.8	18	-	15.9	19	-	15.5	18	-

Table 4. Table 4 . The effect of dimension (cardinality instances).

Method	$n = 400$			$n = 800$			$n = 1600$			$n = 3200$
Method	time	#iter	#QP	time	#iter	#QP	time	#iter	#QP	time	#iter	#QP
ALG1	0.2	43	20	0.6	65	19	2.8	75	25	11.7	104	25
ALG2	0.2	73	14	0.5	116	14	2.2	129	15	9.1	175	15
BAR	0.3	21	-	2.4	22	-	22.1	27	-	204.9	30	-
MOS	0.2	9	-	1.2	11	-	7.3	12	-	50.4	12	-

Table 5. Table 5 . Comparison for discrete cardinality instances.

		Method	$Ω = 1$				$Ω = 2$				$Ω = 3$				$Ω = 4$
$r$	$α$	Method	time	nodes	egap	#s	time	nodes	egap	#s	time	nodes	egap	#s	time	nodes	egap	#s
100	0.1	BBA1	1	156	0.0	5	29	3,271	0.0	5	685	68,318	0.0	5	3,644	272,527	0.1	4
		BBA2	3	156	0.0	5	83	3,271	0.0	5	1,823	68,317	0.0	5	6,218	172,489	0.3	2
		BBBR	16	156	0.0	5	349	3,270	0.0	5	4,664	43,695	0.1	3	7,200	23,324	1.2	0
		CXBR	35	276	0.0	5	513	3,497	0.0	5	5,260	32,782	0.2	2	7,200	17,169	1.3	0
		CXLP	34	9,562	0.0	5	7,200	209,576	0.7	0	7,200	244,911	2.2	0	7,200	265,485	3.8	0
		CXLPE	2	374	0.0	5	91	7,640	0.0	5	2,788	111,293	0.0	5	6,983	191,065	0.6	1
		CXD	4	368	0.0	5	42	5,152	0.0	5	640	58,076	0.0	5	3,778	183,816	0.1	4
100	0.5	BBA1	1	87	0.0	5	59	6,274	0.0	5	1,469	140,874	0.0	5	6,328	447,989	0.4	2
		BBA2	2	87	0.0	5	160	6,274	0.0	5	4,024	139,154	0.0	4	7,200	223,921	0.7	0
		BBBR	10	87	0.0	5	686	6,274	0.0	5	6,134	56,394	0.3	1	7,200	21,111	1.7	0
		CXBR	24	183	0.0	5	1,027	6,734	0.0	5	6,399	39,710	0.4	1	7,200	20,101	2.0	0
		CXLP	294	26,957	0.0	5	7,200	229,641	0.8	0	7,200	263,810	2.3	0	7,200	244,863	4.7	0
		CXLPE	2	349	0.0	5	218	14,737	0.0	5	5,116	215,292	0.1	2	7,200	170,710	1.1	0
		CXD	3	373	0.0	5	164	16,070	0.0	5	3,643	245,251	0.0	4	7,042	336,814	0.8	1
200	0.1	BBA1	1	247	0.0	5	23	3,259	0.0	5	637	55,248	0.0	5	3,761	344,662	0.2	4
		BBA2	5	247	0.0	5	79	3,259	0.0	5	2,083	55,248	0.0	5	6,975	242,548	0.4	2
		BBBR	24	247	0.0	5	321	3,259	0.0	5	4,573	39,647	0.1	3	7,200	17,017	1.4	0
		CXBR	52	460	0.0	5	540	3,711	0.0	5	5,295	34,090	0.2	2	7,200	13,490	1.5	0
		CXLP	221	17,205	0.0	5	7,200	208,874	0.6	0	7,200	230,304	2.0	0	7,200	186,490	4.2	0
		CXLPE	4	473	0.0	5	139	6,064	0.0	5	4,073	111,205	0.1	3	7,200	158,866	0.9	0
		CXD	5	360	0.0	5	60	6,413	0.0	5	1,410	67,577	0.0	5	7,044	349,653	0.5	1
200	0.5	BBA1	4	674	0.0	5	194	24,636	0.0	5	1,674	156,632	0.0	5	5,778	526,215	0.3	2
		BBA2	15	674	0.0	5	633	24,635	0.0	5	3,446	98,028	0.1	4	7,200	259,040	0.6	0
		BBBR	77	674	0.0	5	2,106	17,743	0.0	4	5,590	47,725	0.2	2	7,200	20,422	1.5	0
		CXBR	104	680	0.0	5	2,452	15,816	0.0	4	6,127	38,973	0.3	1	7,200	14,955	1.7	0
		CXLP	3,514	120,007	0.1	4	7,200	212,082	1.0	0	7,200	240,445	2.3	0	7,200	195,841	4.8	0
		CXLPE	21	1,461	0.0	5	1,739	61,593	0.0	4	5,435	163,105	0.2	2	7200	197,287	1.1	0
		CXD	18	1,612	0.0	5	1,211	75,098	0.0	5	5,017	245,412	0.2	2	7,200	319,645	1.0	0
avg		BBA1	2	291	0.0	20	76	9,360	0.0	20	1,116	105,268	0.0	20	4,878	397,848	0.3	12
		BBA2	6	291	0.0	20	239	9,360	0.0	20	2,844	90,187	0.0	18	6,898	224,500	0.5	4
		BBBR	32	291	0.0	20	865	7,637	0.0	19	5,240	46,865	0.2	9	7,200	20,469	1.4	0
		CXBR	54	400	0.0	20	1,133	7,440	0.0	19	5,770	36,389	0.3	6	7,200	16,429	1.6	0
		CXLP	1,016	43,433	0.0	19	7,200	215,043	0.8	0	7,200	244,867	2.2	0	7,200	223,170	4.4	0
		CXLPE	7	664	0.0	20	547	20,800	0.0	19	4,353	139,632	0.1	12	7,146	179,482	0.9	1
		CXD	7	678	0.0	20	369	25,683	0.0	20	2,677	151,588	0.1	16	6,267	297,482	0.6	6

Table 6. Table 6 . Comparison for discrete path instances.

		Method	$Ω = 1$				$Ω = 2$				$Ω = 3$				$Ω = 4$
$r$	$α$	Method	time	nodes	egap	#s	time	nodes	egap	#s	time	nodes	egap	#s	time	nodes	egap	#s
100	0.1	BBA1	287	145	0.0	5	4,511	1,774	0.0	5	7,200	2,720	5.9	0	7,200	2,360	13.9	0
		BBA2	619	142	0.0	5	6,871	1,105	1.1	1	7,200	1,278	7.9	0	7,200	974	17.2	0
		BBBR	3,577	91	0.2	4	7,200	184	4.9	0	7,200	236	11.7	0	7,200	61	37.2	0
		CXBR	7,200	67	20.8	0	7,200	79	$\infty$	0	7,200	129	$\infty$	0	7,200	13	$\infty$	0
		CXLP	533	2,428	0.0	5	7,200	24,776	4.5	0	7,200	20,099	15.0	0	7,200	8,971	30.2	0
		CXLPE	802	315	0.0	5	6,726	1,967	2.9	1	7,200	2,585	23.5	0	7,200	2,377	45.1	0
		CXD	1,466	164	0.0	5	4,655	1,176	0.0	5	7,200	2,428	7.4	0	7,200	2,047	16.6	0
100	0.5	BBA1	625	353	0.0	5	5,424	1,904	0.6	2	6,512	2,725	4.8	1	7,200	2,833	12.5	0
		BBA2	1,457	362	0.0	5	6,286	971	1.6	1	7,200	1,369	7.3	0	7,200	1,296	16.2	0
		BBBR	6,071	134	0.7	2	7,200	175	4.9	0	7,200	162	11.5	0	7,200	28	$\infty$	0
		CXBR	7,200	24	$\infty$	0	7,200	56	$\infty$	0	7,200	70	$\infty$	0	7,200	13	$\infty$	0
		CXLP	1,132	6,187	0.0	5	7,200	23,671	4.4	0	7,200	16,851	12.8	0	7,200	8,180	23.8	0
		CXLPE	967	607	0.0	5	6,420	2,077	3.4	1	7,200	3,070	16.2	0	7,200	3,036	42.3	0
		CXD	1,645	267	0.0	5	6,421	1,931	0.5	3	7,200	2,659	5.6	0	7,200	4,166	12.8	0
200	0.1	BBA1	155	77	0.0	5	2,392	1,075	0.0	5	6,434	3,380	2.6	1	7,200	3,245	10.0	0
		BBA2	306	79	0.0	5	4,324	823	0.3	3	7,152	1,469	4.6	1	7,200	1,195	12.7	0
		BBBR	3,245	76	0.0	5	7,200	171	2.4	0	7,200	180	10.5	0	7,200	33	$\infty$	0
		CXBR	7,200	34	$\infty$	0	7,200	45	$\infty$	0	7,200	68	$\infty$	0	7,200	11	$\infty$	0
		CXLP	436	1,548	0.0	5	7,200	30,265	2.9	0	7,200	20,579	12.4	0	7,200	7,274	26.7	0
		CXLPE	524	188	0.0	5	5,156	1,437	1.4	3	7,200	2,420	17.7	0	7,200	2,157	46.7	0
		CXD	2,059	106	0.0	5	5,715	1,251	0.5	4	7,200	2,568	4.1	0	7,200	1,996	12.9	0
200	0.5	BBA1	321	196	0.0	5	3,953	2,286	0.3	4	7,200	3,194	4.3	0	7,200	2,486	13.2	0
		BBA2	808	201	0.0	5	5,517	1,204	0.9	3	7,200	1,284	6.5	0	7,200	1,009	16.6	0
		BBBR	4,826	113	0.2	3	7,200	173	3.9	0	7,200	176	12.7	0	7,200	54	43.6	0
		CXBR	7,200	20	$\infty$	0	7,200	51	$\infty$	0	7,200	89	$\infty$	0	7,200	10	$\infty$	0
		CXLP	859	4,989	0.0	5	7,200	28,007	4.4	0	7,200	18,873	13.3	0	7,200	10,313	28.9	0
		CXLPE	1,046	399	0.0	5	5,948	2,212	1.8	2	7,200	2,305	21.2	0	7,200	2,048	51.2	0
		CXD	2,281	177	0.0	5	4,975	1,873	0.4	4	7,200	2,233	6.9	0	7,200	1,325	16.0	0
avg		BBA1	347	193	0.0	20	4,070	1,760	0.2	16	6,837	3,005	4.4	2	7,200	2,731	12.4	0
		BBA2	798	196	0.0	20	5,750	1,026	1.0	8	7,189	1,350	6.6	1	7,200	1,119	15.7	0
		BBBR	4,430	103	0.3	14	7,200	176	4.0	0	7,200	189	11.6	0	7,200	44	$\infty$	0
		CXBR	7,200	36	$\infty$	0	7,200	58	$\infty$	0	7,200	89	$\infty$	0	7,200	12	$\infty$	0
		CXLP	740	3,788	0.0	20	7,200	26,680	4.1	0	7,200	19,101	13.4	0	7,200	8,684	27.4	0
		CXLPE	835	377	0.0	20	6,063	1,923	2.4	7	7,200	2,595	19.7	0	7,200	2,404	46.3	0
		CXD	1,863	178	0.0	20	5,441	1,558	0.3	16	7,200	2,472	6.0	0	7,200	2,383	14.6	0

Equations42

(CO) x \in R^{n} min {c^{'} x + Ω x^{'} Q x : x \in X},

(CO) x \in R^{n} min {c^{'} x + Ω x^{'} Q x : x \in X},

h (x, t) = ⎩ ⎨ ⎧ \frac{x ^{'} Q x}{t} 0 + \infty if t > 0, if x^{'} Q x = 0, t = 0, otherwise.

h (x, t) = ⎩ ⎨ ⎧ \frac{x ^{'} Q x}{t} 0 + \infty if t > 0, if x^{'} Q x = 0, t = 0, otherwise.

min {c^{'} x + Ω x^{'} Q x : x \in X}

min {c^{'} x + Ω x^{'} Q x : x \in X}

=

\geq

(PO) ζ = min {c^{'} x + \frac{Ω}{2} h (x, t) + \frac{Ω}{2} t : x \in X, t \geq 0} .

(PO) ζ = min {c^{'} x + \frac{Ω}{2} h (x, t) + \frac{Ω}{2} t : x \in X, t \geq 0} .

A x

A x

- c^{'} - \frac{Ω}{t} x^{'} Q

\frac{Ω}{2 t ^{2}} x^{'} Q x - \frac{Ω}{2}

μ

μ^{'} x

A x

A x

- c^{'} - \frac{Ω}{x ^{'} Q x} x^{'} Q

t

μ

μ^{'} x

x \in R^{n} min {c^{'} x : A x = b, x \geq 0, x^{'} Q x = 0},

x \in R^{n} min {c^{'} x : A x = b, x \geq 0, x^{'} Q x = 0},

g (t) = x \in X min c^{'} x + \frac{Ω}{2} h (x, t) + \frac{Ω}{2} t \cdot

g (t) = x \in X min c^{'} x + \frac{Ω}{2} h (x, t) + \frac{Ω}{2} t \cdot

- c^{'} - \frac{Ω}{t _{i}} x_{i + 1}^{'} Q = λ^{'} A - μ .

- c^{'} - \frac{Ω}{t _{i}} x_{i + 1}^{'} Q = λ^{'} A - μ .

- c^{'} - \frac{Ω}{t _{i}} x_{i + 1}^{'} Q - λ^{'} A + μ \leq ϵ .

- c^{'} - \frac{Ω}{t _{i}} x_{i + 1}^{'} Q - λ^{'} A + μ \leq ϵ .

- c^{'} - Ω \frac{x _{i + 1}^{'} Q}{x _{i + 1} ^{'} Q x _{i + 1}} - λ^{'} A + μ \leq ϵ + \frac{∣ Δ _{i} ∣}{t _{i}} \cdot ∥ \nabla f (x_{i + 1}) ∥

- c^{'} - Ω \frac{x _{i + 1}^{'} Q}{x _{i + 1} ^{'} Q x _{i + 1}} - λ^{'} A + μ \leq ϵ + \frac{∣ Δ _{i} ∣}{t _{i}} \cdot ∥ \nabla f (x_{i + 1}) ∥

- c^{'} - Ω \frac{x _{i + 1} ^{'} Q}{x _{i + 1} ^{'} Q x _{i + 1}} - λ^{'} A + μ

- c^{'} - Ω \frac{x _{i + 1} ^{'} Q}{x _{i + 1} ^{'} Q x _{i + 1}} - λ^{'} A + μ

=

=

=

\leq

min {c^{'} x + \frac{Ω}{2 t} x^{'} Q x + \frac{Ω}{2} t : x \in \overset{ˉ}{X}, t \geq 0},

min {c^{'} x + \frac{Ω}{2 t} x^{'} Q x + \frac{Ω}{2} t : x \in \overset{ˉ}{X}, t \geq 0},

x \in R^{n} min (x - y)^{'} Q (x - y) + β ∥ x ∥_{1},

x \in R^{n} min (x - y)^{'} Q (x - y) + β ∥ x ∥_{1},

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Simplex QP-based methods for minimizing

a conic quadratic objective over polyhedra

Alper Atamtürk and Andrés Gómez

Abstract.

We consider minimizing a conic quadratic objective over a polyhedron. Such problems arise in parametric value-at-risk minimization, portfolio optimization, and robust optimization with ellipsoidal objective uncertainty; and they can be solved by polynomial interior point algorithms for conic quadratic optimization. However, interior point algorithms are not well-suited for branch-and-bound algorithms for the discrete counterparts of these problems due to the lack of effective warm starts necessary for the efficient solution of convex relaxations repeatedly at the nodes of the search tree.

In order to overcome this shortcoming, we reformulate the problem using the perspective of the quadratic function. The perspective reformulation lends itself to simple coordinate descent and bisection algorithms utilizing the simplex method for quadratic programming, which makes the solution methods amenable to warm starts and suitable for branch-and-bound algorithms. We test the simplex-based quadratic programming algorithms to solve convex as well as discrete instances and compare them with the state-of-the-art approaches. The computational experiments indicate that the proposed algorithms scale much better than interior point algorithms and return higher precision solutions. In our experiments, for large convex instances, they provide up to 22x speed-up. For smaller discrete instances, the speed-up is about 13x over a barrier-based branch-and-bound algorithm and 6x over the LP-based branch-and-bound algorithm with extended formulations.

Keywords: Simplex method, conic quadratic optimization, quadratic programming, warm starts, value-at-risk minimization, portfolio optimization, robust optimization.

A. Atamtürk: Industrial Engineering & Operations Research, University of California, Berkeley, CA 94720-1777. [email protected]

A. Gómez: Department of Industrial Engineering, Swanson School of Engineering, University of Pittsburgh, PA 15261-3077. [email protected]

May 2017; May 2018

1. Introduction

Consider the minimization of a conic quadratic function over a polyhedron, i.e.,

[TABLE]

where $c\in\mathbb{R}^{n},\ Q\in\mathbb{R}^{n\times n}$ is a symmetric positive semidefinite matrix, $\Omega>0$ , and $X\subseteq\mathbb{R}^{n}$ is a rational polyhedron. We denote by CDO the discrete counterpart of CO with integrality restrictions: $X\cap\mathbb{Z}^{n}$ . CO and CDO are frequently used to model utility with uncertain objectives as in parametric value-at-risk minimization [25], portfolio optimization [5], and robust counterparts of linear programs with an ellipsoidal objective uncertainty set [13, 14, 16].

Note that CO includes linear programming (LP) and convex quadratic programming (QP) as special cases. The simplex method [22, 41, 39] is still the most widely used algorithm for LP and QP, despite the fact that polynomial interior point algorithms [28, 34, 32] are competitive with the simplex method in many large-scale instances. Even though non-polynomial, the simplex method has some distinct advantages over interior point methods. Since the simplex method iterates over bases, it is possible to carry out the computations with high accuracy and little cost, while interior point methods come with a trade-off between precision and efficiency. Moreover, an optimal basis returned by the simplex method is useful for sensitivity analysis, while interior point methods do not produce such a basis unless an additional “crashing” procedure is performed [e.g. 31]. Finally, if the parameters of the problem change, re-optimization can often be done very fast with the simplex method starting from a primal or dual feasible basis, whereas warm starts with interior point methods have limitations [42, 21]. In particular, fast re-optimization with the dual simplex method is crucial when solving discrete optimization problems with a branch-and-bound algorithm.

CO is a special case of conic quadratic optimization [30, 3], which can be solved by polynomial-time interior points algorithms [2, 35, 15]. Although CO can be solved by a general conic quadratic solver, we show in this paper that iterative QP algorithms scale much better. In particular, simplex-based QP algorithms allowing warm starts perform much faster than interior point methods for CO.

For the discrete counterpart CDO, a number of different approaches are available for the special case with a diagonal $Q$ matrix: Ishii et al. [27] give a polynomial time for optimization over spanning trees; Bertsimas and Sim [17] propose an approximation algorithm that solves series of linear integer programs; Atamtürk and Narayanan [7] give a cutting plane algorithm utilizing the submodularity of the objective for the binary case; Atamtürk and Goméz [4] give nonlinear cuts for the mixed 0-1 case; Atamtürk and Narayanan [8] give a parametric $O(n^{3})$ algorithm for the binary case with a cardinality constraint. Maximization of the same objective over the binaries is $\mathcal{N}P$ -hard [1].

The aforementioned approaches do not extend to the non-diagonal case or to general feasible regions, which are obviously $\mathcal{N}P$ -hard as quadratic and linear integer optimization are special cases. The branch-and-bound algorithm is the method of choice for general CDO. However, branch-and-bound algorithms that repeatedly employ a nonlinear programming (NLP) solver at the nodes of the search tree are typically hampered by the lack of effective warm starts. Borchers and Mitchell [20] and Leyffer [29] describe NLP-based branch-and-bound algorithms, and they give methods that branch without solving the NLPs to optimality, reducing the computational burden for the node relaxations. On the other hand, LP-based branch-and-bound approaches employ linear outer approximations of the nonlinear terms. This generally results in weaker relaxations at the nodes, compared to the NLP approaches, but allows one to utilize warm starts with the simplex method. Therefore, one is faced with a trade-off between the strength of the node relaxations and the solve time per node. A key idea to strengthen the node relaxations, as noted by Tawarmalani and Sahinidis [37], is to use extended formulations. Atamtürk and Narayanan [6] describe mixed-integer rounding inequalities in an extended formulation for conic quadratic integer programming. Vielma et al. [40] use an extended formulation for conic quadratic optimization that can be refined during branch-and-bound, and show that an LP-based branch-and-bound using the extended formulations typically outperforms the NLP-based branch-and-bound algorithms. The reader is referred to Belotti et al. [12] for an excellent survey of the solution methods for mixed-integer nonlinear optimization.

In this paper, we reformulate CO through the perspective of the quadratic term and give algorithms that solve a sequence of closely related QPs. Utilizing the simplex method, the solution to each QP is used to warm start the next one in the sequence, resulting in a small number of simplex iterations and fast solution times. Moreover, we show how to incorporate the proposed approach in a branch-and-bound algorithm, efficiently solving the continuous relaxations to optimality at each node and employing warm starts with the dual simplex method. Our computational experiments indicate that the proposed approach outperforms the state-of-the-art algorithms for convex as well as discrete cases.

The rest of the paper is organized as follows. In Section 2 we give an alternative formulation for CO using the perspective function of the quadratic function. In Section 3 we present coordinate descent and accelerated bisection algorithms that solve a sequence of QPs. In Section 4 we provide computational experiments, comparing the proposed methods with state-of-the-art barrier and other algorithms.

2. Formulation

In this section we present a reformulation of CO using the perspective function of the quadratic term. Let $X=\left\{x\in\mathbb{R}^{n}:Ax=b,\ x\geq 0\right\}$ be the feasible region of problem CO. For convex quadratic $q(x)=x^{\prime}Qx$ , consider the function $h:\mathbb{R}^{n+1}\to\mathbb{R}_{+}\cup\{\infty\}$ defined as

[TABLE]

Observe that

[TABLE]

where

[TABLE]

We will show that problems CO and PO have, in fact, the same optimal objective value and that there is a one-to-one correspondence between the optimal primal-dual pairs of both problems.

Proposition 1.

Problem PO is a convex optimization problem.

Proof.

It suffices to observe that $h$ is the closure of the perspective function $tq(x/t)$ of the convex quadratic function $q(x)$ , and is therefore convex [e.g. 26, p. 160]. Since all other objective terms and constraints of PO are linear, PO is a convex optimization problem. ∎

Proposition 2.

Problems CO and PO are equivalent.

Proof.

If $t>0$ , the objective function of problem PO is continuous and differentiable, and since the feasible region is a polyhedron and the problem is convex, its KKT points are equivalent to its optimal solutions. The KKT conditions of PO are

[TABLE]

where $\lambda$ and $\mu$ are the dual variables associated with constraints $Ax=b$ and $x\geq 0$ , respectively. Note that $t>0$ and (2) imply that $t=\sqrt{x^{\prime}Qx}$ . Substituting $t=\sqrt{x^{\prime}Qx}$ in (1), one arrives at the equivalent conditions

[TABLE]

Ignoring the redundant variable $t$ and equation (4), we see that these are the KKT conditions of problem CO. Therefore, any optimal primal-dual pair for PO with $t>0$ is an optimal primal-dual pair for CO. Similarly, we see that any optimal primal-dual pair of problem CO with $x^{\prime}Qx>0$ gives an optimal primal-dual pair of problem PO by setting $t=\sqrt{x^{\prime}Qx}$ . In both cases, the objective values match.

On the other hand, if $t=0$ , then PO reduces to problem

[TABLE]

which corresponds to CO with $x^{\prime}Qx=0$ , and hence they are equivalent. ∎

Proposition 2 indeed holds for more general problems; it is not necessary to have a polyhedral feasible set [9]. Since they are equivalent optimization problems, we can use PO to solve CO. In particular, we exploit the fact that, for a fixed value of $t$ , PO reduces to a QP.

3. Algorithms

For simplicity, assume that PO has an optimal solution; hence, $X$ is nonempty and may be assumed to be bounded. Consider the one-dimensional optimal value function

[TABLE]

As $X$ is nonempty and bounded, $g$ is real-valued and, by Proposition 1, it is convex. Throughout, $x(t)$ denotes an optimal solution to (5).

In this section we describe two algorithms for PO that utilize a QP oracle. The first one is a coordinate descent approach, whereas, the second one is an accelerated bisection search algorithm on the function $g$ . Finally, we discuss how to exploit the warm starts with the simplex method to solve convex as well as discrete cases.

3.1. Coordinate descent algorithm

Algorithm 1 successively optimizes over $x$ for a fixed value of $t$ , and then optimizes over $t$ for a fixed value of $x$ . Observe that the optimization problem in line 6 over $x$ is a QP, and the optimization in line 7 over $t$ has a closed form solution: by simply setting the derivative to zero, we find that $t_{i+1}=\sqrt{{x_{i+1}}^{\prime}Qx_{i+1}}$ .

First observe that the sequence of objective values $\left\{c^{\prime}x_{i}+\frac{\Omega}{2t_{i}}x_{i}^{\prime}Qx_{i}+\frac{\Omega}{2}t_{i}\right\}_{i\in\mathbb{N}}$ is non-increasing. Moreover, the dual feasibility KKT conditions for the QPs in line 6 are of the form

[TABLE]

Let $\|\cdot\|$ be a norm and suppose that the QP oracle finds feasible primal-dual pairs with $\epsilon>0$ tolerance with respect to $\|\cdot\|$ . In particular $x_{i+1}$ in line 6 violates (6) by at most $\epsilon$ , i.e.,

[TABLE]

Proposition 3 below states that, at each iteration of Algorithm 1, we can bound the violation of the dual feasibility condition (3) corresponding to the original problem CO. The bound depends only on the precision of the QP oracle $\epsilon$ , the relative change of $t$ in the last iteration $\frac{\Delta_{i}}{t_{i}}$ , where $\Delta_{i}=t_{i+1}-t_{i}$ , and the gradient of the function $f(x)=\Omega\sqrt{x^{\prime}Qx}$ evaluated at the new point $x_{i+1}$ .

Proposition 3 (Dual feasibility bound).

A pair $(x_{i+1},t_{i+1})$ in Algorithm 1 satisfies

[TABLE]

Proof.

[TABLE]

∎

Let $t^{*}$ be a minimizer of $g$ on $\mathbb{R}_{+}$ . We now show that the sequence of values of $t$ produced by Algorithm 1, $\left\{t_{i}\right\}_{i\in\mathbb{N}}$ , is monotone and bounded by $t^{*}$ .

Proposition 4 (Monotonicity).

If $t_{i}\leq t^{*}$ , then $t_{i+1}=\sqrt{{x_{i+1}}^{\prime}Qx_{i+1}}$ satisfies $t_{i}\leq t_{i+1}\leq t^{*}$ . Similarly, if $t_{i}\geq t^{*}$ , then $t_{i}\geq t_{i+1}\geq t^{*}$ .

Proof.

If $t_{i}\leq t^{*}$ , then $\frac{\Omega}{2t_{i}}\geq\frac{\Omega}{2t^{*}}$ . It follows that $x(t_{i+1})$ is a minimizer of an optimization problem with a larger coefficient for the quadratic term than $x(t^{*})$ , and therefore ${{x_{i+1}}^{\prime}Qx_{i+1}}=t_{i+1}^{2}\leq{t^{*}}^{2}={x^{*}}^{\prime}Qx^{*}$ , and $t_{i+1}\leq t^{*}$ . Moreover, the inequality $t_{i}\leq t_{i+1}$ follows from the convexity of the one-dimensional function $g$ and the fact that function $g$ is minimized at $t^{*}$ , and that $g(t_{i+1})\leq g(t_{i})$ . The case $t_{i}\geq t^{*}$ is similar. ∎

Since the sequence $\left\{t_{i}\right\}_{i\in\mathbb{N}}$ is bounded and monotone, it converges to a supremum or infimum. Thus $\left\{t_{i}\right\}_{i\in\mathbb{N}}$ is a Cauchy sequence, and $\lim\limits_{i\to\infty}\Delta_{i}=0$ . Corollaries 1 and 2 below state that Algorithm 1 converges to an optimal solution. The cases where there exists a KKT point for PO (i.e., there exists an optimal solution with $t^{*}>0$ ) and where there are no KKT points are handled separately.

Corollary 1 (Convergence to a KKT point).

If PO has a KKT point, then Algorithm 1 converges to a KKT point.

Proof.

By convexity, the set of optimal solutions to (5) is an interval, $[t_{\ell},t_{u}]$ . Since by assumption there exists a KKT point, we have that $t_{u}>0$ . The proof is by cases, depending on the value of $t_{0}$ in line 3 of Algorithm 1.

**Case $t_{\ell}\leq t_{0}\leq t_{u}$ : **

Since $t_{0}$ is optimal, we have by Proposition 4 that $t_{1}=t_{0}$ . Since $\Delta_{0}=0$ and $t_{0}=\sqrt{x_{i+1}^{\prime}Qx_{i+1}}>0$ , we have that $\left\|\nabla f(x_{i+1})\right\|<\infty$ in Proposition 3, and $\frac{\left|\Delta_{i}\right|}{t_{i}}\cdot\left\|\nabla f(x_{i+1})\right\|=0$ .

**Case $t_{0}<t_{\ell}$ : **

We have by Proposition 4 than for all $i\in\mathbb{N}$ , $t_{i}=\sqrt{x_{i}^{\prime}Qx_{i}}\geq t_{0}>0$ . Therefore, there exists a number $M$ such that $\frac{1}{t_{i}}\left\|\nabla f(x_{i+1})\right\|<M$ for all $i\in\mathbb{N}$ , and we find that $\frac{\left|\Delta_{i}\right|}{t_{i}}\cdot\left\|\nabla f(x_{i+1})\right\|\xrightarrow{\Delta_{i}\to 0}0$ .

**Case $t_{0}>t_{u}$ : **

We have by Proposition 4 than for all $i\in\mathbb{N}$ , $t_{i}=\sqrt{x_{i}^{\prime}Qx_{i}}\geq t_{u}>0$ . Therefore, there exists a number $M$ such that $\frac{1}{t_{i}}\left\|\nabla f(x_{i+1})\right\|<M$ for all $i\in\mathbb{N}$ , and we find that $\frac{\left|\Delta_{i}\right|}{t_{i}}\cdot\left\|\nabla f(x_{i+1})\right\|\xrightarrow{\Delta_{i}\to 0}0$ .

Therefore, in all cases, Algorithm 1 convergences to a KKT point by Proposition 3. ∎

Corollary 2 (Convergence to [math]).

If $t^{*}=0$ is the unique optimal solution to $\min\{g(t):t\in\mathbb{R}_{+}\}$ , then for any $\xi>0$ Algorithm 1 finds a solution $(\bar{x},\bar{t})$ , where $\bar{t}<\xi$ and $\bar{x}\in\operatorname*{arg\,min}\left\{c^{\prime}x:\sqrt{x^{\prime}Qx}=\bar{t},x\in X\right\}$ .

Proof.

The sequence $\left\{t_{i}\right\}_{i\in\mathbb{N}}$ converges to [math] (otherwise, by Corollary 1, it would converge to a KKT point). Thus, $\lim_{i\to\infty}\sqrt{x_{i}^{\prime}Qx_{i}}=0$ and all points obtained in line 6 of Algorithm 1 satisfy $x_{i+1}\in\operatorname*{arg\,min}\left\{c^{\prime}x:\sqrt{x^{\prime}Qx}=t_{i+1},x\in X\right\}$ . ∎

We now discuss how to initialize and terminate Algorithm 1, corresponding to lines 3 and 9, respectively.

Initialization.

The algorithm may be initialized by an arbitrary $t_{0}>0$ . Nevertheless, when a good initial guess on the value of $t^{*}$ is available, $t_{0}$ should be set to that value. Moreover, observe that setting $t_{0}=\infty$ results in a fast computation of $x_{1}$ by solving an LP.

Stopping condition.

Proposition 3 suggests a good stopping condition for Algorithm 1. Given a desired dual feasibility tolerance of $\delta>\epsilon$ , we can stop when $\epsilon+\frac{\left|\Delta_{i}\right|}{t_{i}}\cdot\left\|\nabla f(x_{i+1})\right\|<\delta$ . Alternatively, if $\exists k\text{ s.t. }\max_{x\in X}\left\|\nabla f(x)\right\|\leq k<\infty$ , then the simpler $\left|\frac{\Delta_{i}}{t_{i}}\right|\leq\frac{\delta-\epsilon}{k}$ is another stopping condition. For instance, a crude upper bound on ${\|}\nabla f(x){\|}=\Omega\left\|\frac{{x}^{\prime}Q}{\sqrt{{x}^{\prime}Qx}}\right\|$ can be found by maximizing/minimizing the numerator $x^{\prime}Q$ over $X$ and minimizing $x^{\prime}Qx$ over $X$ . The latter minimization is guaranteed to have a nonzero optimal value if $0\not\in X$ and $Q$ is positive definite.

*Remark 1**.*

Observe that the stopping condition above may fail if $t^{*}=0$ is the unique optimal solution of $\min_{t\geq 0}g(t)$ (Corollary 2). This case happens only if $Q$ is positive semi-definite (but not positive definite), or if $x^{*}=0$ is the unique optimal solution of CO. Such situations rarely arise in practice and can often be ruled out a priori. Nonetheless, stopping Algorithm 1 also when $t_{i}\leq\xi$ for some small $\xi>0$ ensures that the algorithm terminates (as specified in Corollary 2) even when $t^{*}=0$ .

3.2. Bisection algorithm

Algorithm 2 is an accelerated bisection approach to solve PO. The algorithm maintains lower and upper bounds, $t_{\min}$ and $t_{\max}$ , on $t^{*}$ and, at each iteration, reduces the interval $[t_{\min},t_{\max}]$ by at least half. The algorithm differs from the traditional bisection search algorithm in lines 9–13, where it uses an acceleration step to reduce the interval by a larger amount: by Proposition 4, if $t_{0}\leq t_{1}$ (line 9), then $t_{0}\leq t_{1}\leq t^{*}$ , and therefore $t_{1}$ is a higher lower bound on $t^{*}$ (line 10); similarly, if $t_{0}\geq t_{1}$ , then $t_{1}$ is an lower upper bound on $t^{*}$ (lines 11 and 12). Intuitively, the algorithm takes a “coordinate descent” step as in Algorithm 1 after each bisection step. Preliminary computations show that the acceleration step reduces the number of steps as well as the overall solution time for the bisection algorithm by about 50%.

Initialization.

In line 3, $t_{\min}$ can be initialized to zero and $t_{\max}$ to ${x_{LP}}^{\prime}Qx_{LP}$ , where $x_{LP}$ is an optimal solution to the LP relaxation $\min_{x\in X}c^{\prime}x$ .

Stopping condition.

There are different possibilities for the stopping criterion in line 18. Note that if we have numbers $t_{m}$ and $t_{M}$ such that $t_{m}\leq t^{*}\leq t_{M}$ , then $c^{\prime}x(t_{M})+\Omega\sqrt{{x(t_{m})}^{\prime}Qx(t_{m})}$ is a lower bound on the optimal objective value $c^{\prime}x^{*}+\Omega\sqrt{{x^{*}}^{\prime}Qx^{*}}$ . Therefore, in line 7, a lower bound $z_{l}$ on the objective function can be computed, and the algorithm can be stopped when the gap between $\hat{z}$ and $z_{l}$ is smaller than a given threshold. Alternatively, stopping when $\frac{\left|t_{1}-t_{0}\right|}{t_{0}}\cdot\Omega\left\|\frac{{x_{0}}^{\prime}Q}{\sqrt{{x_{0}}^{\prime}Qx_{0}}}\right\|<\delta-\epsilon$ provides a guarantee on the dual infeasibility as in Proposition 3.

3.3. Warm starts

Although any QP solver can be used to run the coordinate descent and bisection algorithms described in Sections 3.1 and 3.2, simplex methods for QP are particularly effective as they allow warm starts for small changes in the model parameters in iterative applications. This is the main motivation for the QP based algorithms presented above.

3.3.1. Warm starts with primal simplex for convex optimization

All QPs solved in Algorithms 1–2 have the same feasible region and only the objective function changes in each iteration. Therefore, an optimal basis for a QP is primal feasible for the next QP solved in the sequence, and can be used to warm start a primal simplex QP solver.

3.3.2. Warm starts with dual simplex for discrete optimization

When solving discrete counterparts of CO with a branch-and-bound algorithm one is particularly interested in utilizing warm starts in solving convex relaxations at the nodes of the search tree. In a branch-and-bound algorithm, children nodes typically have a single additional bound constraint compared to the parent node.

For this purpose, it is also possible to warm start Algorithm 1 from a dual feasible basis. Let $(x^{*},t^{*})$ be an optimal solution to PO and $B^{*}$ be an optimal basis. Consider a new problem

[TABLE]

where the feasible set $\bar{X}$ is obtained from $X$ by adding new constraints. Note that $B^{*}$ is a dual feasible basis for (7) when $t=t^{*}$ . Therefore, Algorithm 1 to solve problem (7) can be warm started by initializing $t_{0}=t^{*}$ and using $B^{*}$ as the initial basis to compute $x_{1}$ with a dual simplex algorithm. The subsequent QPs can be solved using the primal simplex algorithm as noted in Section 3.3.1.

3.4. Special cases

The simplex method is a general algorithm that can be used with any polyhedron $X$ and, as mentioned in Section 3.3, is well suited for solving the sequence of QPs. Nevertheless, for particular feasible regions, other specialized algorithms may be preferable. For instance, in the trivial unbounded case ( $X=\mathbb{R}^{n}$ ), the QPs can be solved in closed form. Another, more interesting, case is the problem

[TABLE]

where $y\in\mathbb{R}^{n}$ is fixed, $\|\cdot\|_{1}$ denotes the $\ell_{1}$ -norm and $\beta>0$ is a regularization parameter. Note that (8) is a special case of CO with the usual linearization of the $\ell_{1}$ -norm. Problem (8) arises in compressed sensing [10] and sparse linear regression [11], and is solved fast using first-order methods [33]. Note if Algorithm 1 or 2 is used instead, then every QP subproblem $\min_{x\in\mathbb{R}^{n}}\frac{(x-y)^{\prime}Q(x-y)}{2t}+\beta\|x\|_{1}$ corresponds to the well-studied Lasso problem [38], for which efficient special algorithms exist. In particular, problem (8) can be solved with a single call to an algorithm that computes the regularization path (i.e., solves the problem for all $\beta$ ), such as Least Angle Regression [24].

4. Computational experiments

In this section we report on computational experiments with solving convex CO and its discrete counterpart CDO with the algorithms described in Section 3. The algorithms are implemented with CPLEX Java API. We use the simplex and barrier solvers of CPLEX version 12.6.2, as well as the barrier solver of MOSEK version 8.1.0 for the computational experiments. All experiments are conducted on a workstation with a 2.93GHz Intel®CoreTM i7 CPU and 8 GB main memory using a single thread.

4.1. Test problems

We test the algorithms on two types of data sets. For the first set the feasible region is described by a cardinality constraint and bounds, i.e., $X=\left\{x\in\mathbb{R}^{n}:\sum_{i=1}^{n}x_{i}=b,\;\textbf{0}\leq x\leq\textbf{1}\right\}$ with $b=n/5$ ; problems with a cardinality constraint are common in finance [19] and statistics [18]. For the second data set the feasible region consists of the path polytope of an acyclic grid network; conic quadratic optimization over paths has been studied in [17, 36], and similar substructures arise in more complex problems such as vehicle routing [23]. For discrete optimization problems we additionally enforce the binary restrictions $x\in\mathbb{B}^{n}$ .

For both data sets the objective function $q(x)=c^{\prime}x+\Omega\sqrt{x^{\prime}Qx}$ is generated as follows: Given a rank parameter $r$ and density parameter $\alpha$ , $Q$ is the sum of a low rank factor matrix and a full rank diagonal matrix; that is, $Q=F\Sigma F^{\prime}+D$ , where

•

$D$ is an $n\times n$ diagonal matrix with entries drawn from Uniform $(0,1)$ .

•

$\Sigma=HH^{\prime}$ where $H$ is an $r\times r$ matrix with entries drawn from Uniform $(-1,1)$ .

•

$F$ is an $n\times r$ matrix in which each entry is [math] with probability $1-\alpha$ and drawn from Uniform $(-1,1)$ with probability $\alpha$ .

Note that the construction of matrix $Q$ is consistent with factor models often used in finance. In particular, $F$ is the factor exposure matrix, $\Sigma$ is the factor covariance matrix and $D$ is the matrix of the residual variances. Each linear coefficient $c_{i}$ is drawn from Uniform $(-2\sqrt{Q_{ii}},0)$ .

4.2. Experiments with convex problems

In this section we present the computational results for convex instances. We compare the following algorithms:

**ALG1: **

Algorithm 1.

**ALG2: **

Algorithm 2.

**BAR: **

CPLEX barrier algorithm (the default solver in CPLEX for convex conic quadratic problems).

**MOS: **

MOSEK barrier algorithm.

For algorithms ALG1 and ALG2 we use CPLEX primal simplex algorithm as the QP solver.

Optimality tolerance

As the speed of the interior point methods crucially depends on the chosen optimality tolerance, it is prudent to first compare the speed vs the quality of the solutions for the algorithms tested. Here we study the impact of the optimality tolerance in the solution time and the quality of the solutions for CPLEX barrier algorithm BAR and simplex QP-based algorithm ALG1. The optimality tolerance of the barrier algorithm is controlled by the QCP convergence tolerance parameter (“BarQCPEpComp”), and in Algorithm 1, by the stopping condition $\frac{\left|\Delta_{i}\right|}{t}\leq\delta$ .

In both cases, a smaller optimality tolerance corresponds to a higher quality solution. We evaluate the quality of a solution as $\texttt{optgap}=\left|(z_{\min}-z)/z_{\min}\right|,$ where $z$ is the objective value of the solution found by an algorithm with a given tolerance parameter and $z_{\min}$ is the objective value of the solution found by the barrier algorithm with tolerance $10^{-12}$ (minimum tolerance value allowed by CPLEX). Table 1 presents the results for different tolerance values for a $30\times 30$ convex grid instance with $r=200$ , $\alpha=0.1$ , and $\Omega=1$ . The table shows, for varying tolerance values and for each algorithm, the quality of the solution, the solution time in seconds, the number of iterations, and QPs solved (for ALG1). We highlight in bold the default tolerance used for the rest of the experiments presented in the paper. The tolerance value $10^{-7}$ for the barrier algorithm corresponds to the default parameter in CPLEX.

First observe that the solution time increases with reduced optimality tolerance for both algorithms. With lower tolerance, while the barrier algorithm performs more iterations, ALG1 solves more QPs; however, the total number of simplex iterations barely increases. For ALG1 the changes in the value of $t$ are very small between QPs, and the optimal bases of the QPs are thus the same. Therefore, using warm starts, the simplex method is able to find high precision solutions inexpensively. ALG1 achieves much higher precision an order of magnitude faster than CPLEX barrier algorithm. For the default tolerance parameters used in our computational experiments, Algorithm 1 is several orders of magnitude more precise than the barrier algorithm.

Effect of the nonlinearity parameter $\Omega$ .

We now study the effect of changing the nonlinearity parameter $\Omega$ . Tables 2 and 3 show the total solution time in seconds, the total number of simplex or barrier iterations, and the number of QPs solved in cardinality (1000 variables) and path instances (1760 variables), respectively. Each row represents the average over five instances for a rank ( $r$ ) and density( $\alpha$ ) configuration and algorithm used. For each parameter choice the fastest algorithm is highlighted in bold. Figure 1 also shows the total number of instances solved within the given time limit for each instance class.

Observe that, compared to CPLEX barrier algorithm, the simplex QP-based methods are 3.5 and 6 times faster for the cardinality instances and up to 15 times faster for the path instances. Additionally, the simplex QP-based methods are two to three times faster than MOSEK barrier algorithm for the cardinality instances, and up to four times faster for the path instances. Figure 1 shows that the simplex QP-based methods solve most of the instances well within the time required for MOSEK barrier algorithm to solve the easiest instance.

The barrier algorithms do not appear to be too sensitive to the nonlinearity parameter $\Omega$ , whereas the simplex QP-based methods are faster for smaller $\Omega$ . The number of simplex iterations in ALG1 increases with the nonlinearity parameter $\Omega$ . Indeed, the initial problem solved by ALG1 is an LP (corresponding to $\Omega=0$ ), so as $\Omega$ increases the initial problem becomes a worse approximation, and more work is needed to converge to an optimal solution. Also note that Algorithm 2 requires fewer QPs to be solved, but as a result it benefits less from warm starts (it requires more simplex iterations per QP than ALG1). Indeed, in ALG2 the value of $t$ changes by a larger amount at each iteration (with respect to ALG1), so the objective function of two consecutive QPs changes by a larger amount. Finally, note that although their runtime is very close, the performance of ALG2 is slightly better than ALG1 overall.

Effect of the dimension

Table 4 presents a comparison of the algorithms for the convex cardinality instances with sizes 400, 800, 1600, and 3200. Each row represents the average over five instances, as before, generated with parameters $r=200$ , $\alpha=0.1$ , and $\Omega=2$ . Additionally, Figure 2 shows the solution time for ALG1, ALG2 and MOS as a function of the dimension ( $n$ ).

Observe in Table 4 that the number of QPs solved with the simplex-based algorithms does not depend on the dimension. The number of simplex iterations, however, increases with the dimension. For $n=400$ all algorithms perform similarly and the problems are solved very fast. However, as the dimension increases, the simplex-based algorithms outperform the barrier algorithms, often by many factors. For $n=3200$ , the fastest simplex-based algorithm ALG2 is more than 20 times faster than CPLEX barrier algorithm, and more than five times faster than MOSEK barrier algorithm. Similar results are obtained for other parameter choices and for the path instances as well. In summary, the simplex-based algorithms scale better with the dimension, and are faster by orders of magnitude for large instances. As in Section 4.2, ALG2 slightly outperforms ALG1 for the instances considered.

4.3. Discrete instances

In this section we describe our experiments with the discrete counterpart CDO. To the best of our knowledge, as of version 12.6.2 of CPLEX, there is no documented way to embed a user-defined convex solver such as Algorithm 1 or 2 at the nodes of the CPLEX branch-and-bound algorithm. Therefore, in order to test the proposed approach for CDO, we implement a rudimentary branch-and-bound algorithm described in Appendix A. The algorithm uses a maximum infeasibility rule for branching, and does not employ presolve, cutting planes, or heuristics. We test the following configurations:

**BBA1: **

Branch-and-bound algorithm in Appendix A using Algorithm 1 as the convex solver. The first QP at each node (except the root node) is solved with CPLEX dual simplex method using the parent dual feasible basis as a warm start (as mentioned in Section 3.3) and all other QPs are solved with CPLEX primal simplex method using the basis from the parent node QP as a warm start.

**BBA2: **

Branch-and-bound algorithm in Appendix A using Algorithm 2 as the convex solver. Algorithm 2 resulted in the best performance in the continuous instances; however, unlike Algorithm 1, it cannot be naturally warm-started. Thus, in this configuration, each convex subproblem is solved without exploiting the solution from the parent node.

**BBBR: **

Branch-and-bound algorithm in Appendix A, using CPLEX barrier algorithm as the convex solver. This configuration does not use warm starts.

**CXBR: **

CPLEX branch-and-bound algorithm with barrier solver, setting the branching rule to maximum infeasibility, the node selection rule to best bound, and disabling presolve, cuts and heuristics. In this setting CPLEX branch-and-bound algorithm is as close as possible to our branch-and-bound algorithm.

**CXLP: **

CPLEX branch-and-bound algorithm with LP outer approximations, setting the branching rule to maximum infeasibility, the node selection rule to best bound, and disabling presolve, cuts and heuristics. In this setting CPLEX branch-and-bound algorithm is as close as possible to our branch-and-bound algorithm.

**CXLPE: **

CPLEX branch-and-bound algorithm with LP outer approximations, setting the branching rule to maximum infeasibility, the node selection rule to best bound, and disabling cuts and heuristic. Since presolve is activated, CPLEX uses extended formulations described in [40]. Besides presolve, all other parameters are set as in CXLP.

**CXD: **

CPLEX default branch-and-bound algorithm with LP outer approximations. This algorithm utilizes all sophisticated features of CPLEX, such as presolver, cutting planes, heuristics, advanced branching and node selection rules.

The time limit is set to two hours for each algorithm.

Table 5 presents the results for discrete cardinality instances with 200 variables and Table 6 for the discrete path instances with 1,740 variables ( $30\times 30$ grid). Each row represents the average over five instances with varying rank and density parameters, and algorithm. The tables show the solution time in seconds, the number of nodes explored in the branch-and-bound tree, the end gap after two hours as percentage, and the number of instances that are solved to optimality for varying values of $\Omega$ . For each instance class we highlight in bold the algorithm with the best performance. Figure 3 shows, for each instance class, the total number of instances solved within given time limits for BBA1, CXLPE and CXD.

First of all, observe that the difficulty of the instances increases considerably for higher values of $\Omega$ due to higher integrality gap. The problems corresponding to high values of the density parameter $\alpha$ are also more challenging.

Performance of CPLEX branch-and-bound

Among CPLEX branch-and-bound algorithms, CXD is the best choice when $\Omega\geq 2$ . Configuration CXD is much more sophisticated than the other configurations, so a better performance is expected. However, note that for $\Omega=1$ configuration CXD is not necessarily the best. In particular in the path instances (Table 6) CXLP and CXLPE are 2.3 times faster than CXD. This result suggests that in simple instances the additional features used by CXD (e.g. cutting planes and heuristics) may be hurting the performance.

The extended formulations result in much stronger relaxations in LP based branch-and-bound and, consequently, the number of branch-and-bound nodes required with CXLPE is only a small fraction of the number of nodes required with CXLP. However, CXLPE requires more time to solve each branch-and-bound node, due to the higher number of variables and the additional effort needed to refine the LP outer approximations. For the cardinality instances, CXLPE is definitely the better choice and is faster by orders of magnitude. For the path instances, however, CXLP is not necessarily inferior: when $\Omega=1$ CXLP is competitive with CXLPE, and when $\Omega=3$ CXLP performs better.

The barrier-based branch-and-bound CXBR, in general, performs poorly. For the cardinality instances, it outperforms CXLP but is slower than the other algorithms. For the path instances it has the worst performance, often struggling to find even a single feasible solution (resulting in infinite end gaps).

Performance of BBA1

Note that BBA1, BBA2 and BBBR are very simple and differ only by the convex node solver. BBA1 is faster than BBBR by an order of magnitude. BBA1 is also two to three times faster than BBA2, despite the fact that Algorithm 2 is faster for convex problems. This improvement is due to the warm start capabilities of Algorithm 1. BBA1 is considerably faster than the simplest CPLEX branch-and-bound algorithms CXBR and CXLP.

We see that BBA1 consistently outperforms CXLPE (which uses presolve and extended formulations), often by many factors. In fact, BBA1 resulted in better performance (faster solution times or lower end gaps) than CXLPE in every instance. Observe that in the cardinality instances with $\Omega=1,2$ and path instances with $\Omega=1$ , BBA1 requires half the number of nodes (or less) compared to CXLPE to solve the instances to optimality (since the relaxations solved at each node are stronger), which translates into faster overall solution times. In the more difficult instances BBA1 is able to solve more instances to optimality, and the end gaps are smaller.

Despite the fact that BBA1 is a rudimentary branch-and-bound implementation, it is faster than default CPLEX in most of the cases. Indeed, BBA1 outperforms CXD in 143 out of 160 instances tested. Figure 3 clearly shows that BBA1 solves more instances faster compared to CXLPE and CXD.

Warm starts

To quantify the impact of warm starts, we plot in Figure 4 the time per node (computed as solution time divided by the number of branch-and-bound nodes) for BBA1, BBA2, BBBR and CXLPE, and also plot the solution time for the corresponding convex instances with solvers ALG1, ALG2, and BAR111The time per node is similar for all combinations of parameters $\Omega$ , $r$ and $\alpha$ . We plot the average for all instances with $\Omega=2$ ..

For the small cardinality instances with 200 variables, all three algorithms perform similarly for the convex instances; however, Algorithm 1 is 15 times faster than barrier and more than three times faster than Algorithm 2 when used in branch-and-bound due to the node warm starts from dual feasible solutions. For the larger path instances with 1,740 variables, Algorithm 1 is again close to three times faster than Algorithm 2 and almost 20 times faster than barrier in discrete instances. Finally, observe that the solve time per node for BBA1 is smaller compared to CXLPE: the proposed simplex-based algorithm is thus as effective as the simplex method for extended formulations in exploiting warm starts. Moreover, it solves the nonlinear convex relaxations at each node to optimality, whereas CXLPE solves its LP relaxation. The improved lower bounds lead to significantly small search trees.

We conclude that Algorithm 1 is indeed suitable for branch-and-bound algorithms since it benefits from node warms starts from the parent nodes, resulting in a significant improvement in solution times.

5. Conclusions

We consider minimization problems with a conic quadratic objective and linear constraints, which are natural generalizations of linear programming and quadratic programming. Using the perspective function we reformulate the objective and propose simplex QP-based algorithms that solve a quadratic program at each iteration. Computational experiments indicate that the proposed algorithms are faster than interior point methods by orders of magnitude, scale better with the dimension of the problem, return higher precision solutions, and, most importantly, are amenable to warm starts. Therefore, they can be embedded in branch-and-bound algorithms quite effectively.

Acknowledgement

This research is supported, in part, by grant FA9550-10-1-0168 from the Office of the Assistant Secretary of Defense for Research and Engineering.

Appendix A Branch-and-bound algorithm

Algorithm 3 describes the branch-and-bound algorithm used in computations. Throughout the algorithm, we maintain a list $L$ of the nodes to be processed. Each node is a tuple $(S,B,lb)$ , where $S$ is the subproblem, $B$ is a basis for warm starting the continuous solver and $lb$ is a lower bound on the objective value of $S$ . In line 5 list $L$ is initialized with the root node. For each node, the algorithm calls a continuous solver (line 11) which returns a tuple $(x,\bar{B},z)$ , where $x$ is an optimal solution of $S$ , $\bar{B}$ is the corresponding optimal basis and $z$ is the optimal objective value (or $\infty$ if $S$ is infeasible). The algorithm then checks whether the node can be pruned (lines 12-13), $x$ is integer (lines 14-17), or it further branching is needed (lines 18-20).

We now describe the specific implementations of the different subroutines. For branching (line 19) we use the maximum infeasibility rule, which chooses the variable $x_{i}$ with value $v_{i}$ furtherest from an integer (ties broken arbitrarily). The subproblems $S_{\leq}$ and $S_{\geq}$ in line 20 are created by imposing the constraints $x_{i}\leq\lfloor v_{i}\rfloor$ and $x_{i}\geq\lceil v_{i}\rceil$ , respectively. The PULL routine in line 7 chooses, when possible, the child of the previous node which violates the bound constraint by the least amount, and chooses the node with the smallest lower bound when the previous node has no child nodes. The list $L$ is thus implemented as a sorted list ordered by the bounds, so that the PULL operation is done in $O(1)$ and the insertion is done in $O(\log|L|)$ (note that in line 20 we only add to the list the node that is not to be processed immediately). A solution $x$ is assumed to be integer (line 14) when the values of all variables are within $10^{-5}$ of an integer. Finally, the algorithm is terminated when $\frac{ub-lb_{best}}{\left|lb_{best}+10^{-10}\right|}\leq 10^{-4}$ , where $lb_{best}$ is the minimum lower bound among all the nodes in the tree.

The maximum infeasibility rule is chosen due to its simplicity. The other rules and parameters correspond to the ones used in CPLEX branch-and-bound algorithm in default configuration.

Bibliography42

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Ahmed and Atamtürk [2011] S. Ahmed and A. Atamtürk. Maximizing a class of submodular utility functions. Mathematical Programming , 128:149–169, 2011.
2Alizadeh [1995] F. Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization , 5:13–51, 1995.
3Alizadeh and Goldfarb [2003] F. Alizadeh and D. Goldfarb. Second-order cone programming. Mathematical Programming , 95:3–51, 2003.
4Atamtürk and Goméz [2016] A. Atamtürk and A. Goméz. Submodularity in conic quadratic mixed 0-1 optimization. ar Xiv preprint ar Xiv:1705.05918 , 2016. BCOL Research Report 16.02, UC Berkeley.
5Atamtürk and Jeon [2017] A. Atamtürk and H. Jeon. Lifted polymatroid inequalities for mean-risk optimization with indicator variables. ar Xiv preprint ar Xiv:1705.05915 , 2017. BCOL Research Report 17.01, UC Berkeley.
6Atamtürk and Narayanan [2007] A. Atamtürk and V. Narayanan. Cuts for conic mixed-integer programming. In Matteo Fischetti and David P. Williamson, editors, Integer Programming and Combinatorial Optimization , pages 16–29, Berlin, Heidelberg, 2007. Springer. ISBN 978-3-540-72792-7.
7Atamtürk and Narayanan [2008] A. Atamtürk and V. Narayanan. Polymatroids and risk minimization in discrete optimization. Operations Research Letters , 36:618–622, 2008.
8Atamtürk and Narayanan [2009] A. Atamtürk and V. Narayanan. The submodular 0-1 knapsack polytope. Discrete Optimization , 6:333–344, 2009.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Simplex QP-based methods for minimizing

Abstract.

1. Introduction

2. Formulation

Proposition 1**.**

Proof.

Proposition 2**.**

Proof.

3. Algorithms

3.1. Coordinate descent algorithm

Proposition 3** (Dual feasibility bound).**

Proof.

Proposition 4** (Monotonicity).**

Proof.

Corollary 1** (Convergence to a KKT point).**

Proof.

Corollary 2** (Convergence to [math]).**

Proof.

Initialization.

Stopping condition.

Remark 1*.*

3.2. Bisection algorithm

Initialization.

Stopping condition.

3.3. Warm starts

3.3.1. Warm starts with primal simplex for convex optimization

3.3.2. Warm starts with dual simplex for discrete optimization

3.4. Special cases

4. Computational experiments

4.1. Test problems

4.2. Experiments with convex problems

Optimality tolerance

Effect of the nonlinearity parameter Ω\OmegaΩ.

Effect of the dimension

4.3. Discrete instances

Performance of CPLEX branch-and-bound

Performance of BBA1

Warm starts

5. Conclusions

Acknowledgement

Appendix A Branch-and-bound algorithm

Proposition 1.

Proposition 2.

Proposition 3 (Dual feasibility bound).

Proposition 4 (Monotonicity).

Corollary 1 (Convergence to a KKT point).

Corollary 2 (Convergence to [math]).

*Remark 1**.*

Effect of the nonlinearity parameter $\Omega$ .