This paper introduces a new affine scaling interior point algorithm for linear programming that leverages a broad class of differential barrier functions, demonstrating robustness and efficiency over classical methods.
Contribution
The paper presents a novel affine scaling algorithm based on differential barrier functions, expanding the toolkit for linear programming optimization.
Findings
01
The proposed algorithm is robust and efficient.
02
Differential barrier functions offer new perspectives in linear optimization.
03
Comparison shows advantages over classical affine scaling methods.
Abstract
In this paper we address a practical aspect of differential barrier penalty functions in linear programming. In this respect we propose an affine scaling interior point algorithm based on a large classe of differential barrier functions. The comparison of the algorithm with a vesion of the classical affine scaling algorithm shows that the algorithm is robust and efficient. We thus show that differential barrier functions open up new perspectives in linear optimization.
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Optimization Algorithms Research · Optimization and Search Problems · Metaheuristic Optimization Algorithms Research
Full text
∎
11institutetext: Abdessamad BARBARA22institutetext: Institut de Mathématiques de Bourgogne(IMB)- UMR 5584 CNRS
An affine scaling method using a class of differential barrier functions
Abdessamad BARBARA
Abstract
In this paper we address a practical aspect of differential barrier penalty functions in linear programming. In this respect we propose an affine scaling interior point algorithm based on a large classe of differential barrier functions. The comparison of the algorithm with a vesion of the classical affine scaling algorithm shows that the algorithm is robust and efficient. We thus show that differential barrier functions open up new perspectives in linear optimization.
Key words: Barrier, concave gauge, differential barrier, interior point methods, linear programs, primal algorithm.
In this paper we present an algorithm based on a family of penalty functions introduced in [1]. Contrary to the classical logarithmic barrier function, these functions are not necessarily barriers, since they can be well defined on the positive orthant including its boundary. But they are differentially barriers. In fact, these functions generalize the notion of barrier functions since (Proposition 17 of [1]) a barrier function is in particular a differential barrier one.
We recall that (Definition 1 of [1]) a function F is said to be a differential barrier on the positive orthant P=[0,+∞)n if F is differentiable on (0,+∞)n and x>0x→x′limsup∣∣∇F(x)∣∣=+∞, for every x′ being on the boundary of P. So ∇F plays the role of a barrier.
Also, the fact that a method based on the minimization of a penalty function is of interior points type is closely related to the following
property.
Let F be a convex, lower semi-continuous and differential barrier function on
P. Then every optimal solution x to the problem
[TABLE]
is an interior point of the
positive orthant.
Through the example of the concave gauge functions111A background on concave gauge functions is given in [1] and a complete description is done in [2] we will consider, we will show the important role that penalty functions of the differential barrier type can play as an alternative to the classical logarithmic barrier function.
In this respect we consider the familiy of differential barrier functions builded from the following concave gauge functions:
[TABLE]
where r is taken arbitrary in (0,1). To be more precise,
let us consider the linear program given by
\hfillmin{⟨c,x⟩:Ax=b,x∈[0,+∞)n}\hfill(LP)
where A is an m×n matrix of rank m, c, x∈Rn and b∈Rm. By definition of a concave gauge function the positive cone can be expressed as
[TABLE]
Hence the original linear program can be equivalently rewritten as
[TABLE]
Applying the approach developed in [1], we propose to penalize the constraint ξr(x)≥0 by the functions
[TABLE]
So the nonlinear optimization problem approximating the linear program222We recall that the idea to approximate a linear program by a nonlinear optimization problem is du originally to Courant [3] in 1941 with a penalty function of exterior type and later to Frisch [4], in 1955, when he introduced the logarithmic barrier function which is an interior penalty one. We recall also that the notion of interior penalty operators were introduced by Auslender [5] in 1976 to generalize the concept of barrier functions. is as follow
\hfillmin{Fr,μ(x):Ax=b}\hfill(Pμ,r)μ>0
where
[TABLE]
It is easy to see that Fr,μ is a differential barrier function and then \Big{(}Proposition 1\Big{)} the optimal solution of (Pμ,r) belongs to the interior of the positive orthant.
The algorithm we build, called galpv4, is of primal type and uses an affine scaling
approach333An affine scaling algorithm was originally proposed by Dikin in 1967 [6]. It was rediscovered by several researchers such as barnes [7] and Vanderbei et al [8] after Karmarkar [9] proposed his famous projective scaling algorithm.. It consists of two combined phases. The first one improves the feasibility of the current point and the second brings the point closer to an optimal solution. At each iteration, this requires the computation of two directions. The direction dk, bringing a current point xk closer to the optimal solutions’ set is obtained as follows. We compute at xk the Newton direction dk(μ) for the problem (Pμ,r). Vector dk is then the part of the expression of dk(μ), independent of parameter μ and satisfies δμdk(μ)=dk+O(μ), where δμ is a positive real function of μ. That is dk=μ↓0limδμdk(μ). The direction d′k that improves the feasibility of the current point is obtained by using the same process with the linear program
[TABLE]
We show that the sequence (xk) converges. Its limit is an interior point of the optimal solutions’ face of the linear program when β∈(0,32), where β is the factor of the maximal step size with respect to xk and dk. Moreover, by calculating dk, the algorithm generates a sequence (yk,sk) that converges to the ξr⊕-analytic center of the dual optimal solutions’ face, where ξr⊕ is the polar concave gauge function (see [2]) of ξr.
In Proposition 2 of Section 2, we show that galpv4 includs the classical affine scaling approach by setting r=0. In this respect, we compare the algorithm’s performances between different values of r∈[0,1) through numerical experiments using the familiar netlib test set [10].
The paper is organized as follows. In section 2 we present the algorithm, the computation of an affine scaling direction and how to find approximately a relative interior feasible solution. Section 3 deals with the convergence results and a stopping criteria, followed by numerical results and comments in Section 4. Finally, we close the paper by some concluding remarks in Section 5.
2 Presentation of a primal affine scaling method
In order to take account of possible bound constraints, we consider in all the following, the linear program
\hfillmin{⟨c,x⟩:Ax=b,x≥0,xi≤uii∈I}\hfill(LPB)
where I is a subset to {1,2,⋯,n} and u∈Rn is given such that ui>0 if i∈I and ui=+∞ if not. It is easy to see that the dual problem of (LPB) is
where I={1,2,⋯,n}−I. Moreover if (x,y,s,w) is a primal-dual optimal solution then it is easy to see from the KKT optimality conditions that
[TABLE]
where UI=diag(uI) and XI=diag(xI). We assume that there exists a relative interior feasible solution to (LPB) and that the minimum is finite. Hence the optimal solutions’ set of (LPB) and of (LDB) are non empty, and there is no duality gap.
Moreover the set of optimal solutions to (LDB) is compact.
Now taking account of the slack variables ui−xi, we adapt the definition of ξr as
[TABLE]
The following algorithm, called galpv4, uses an approach based on a version of the classical affine scaling algorithm presented in [7, 8, 11].
Algorithm galpv4
Initialization
Construct a starting point x as described just bellow and choose r∈[0,1).
Compute y, w, s according to (8), (9) and (10) respectively.
Compute the expected relative duality gap Rgap according to (11)
Set the feasibility measure Rf←∥b∥∞+1∥Ax−b∥∞
Choose ϵ>0 a stopping rule parameter.
Whilemin(Rf,Rgap)>ϵdo
Compute dx the feasible direction according to (7)
Set tmax←min(dxi<0min−dxixi,dxi>0,i∈Imin−dxiui−xi,1)
If Rf>ϵ then t←0.95tmax else set t←0.65tmax
Set x←x+tdx
Set tmax←min(di<0min−dixi,di>0,i∈Imin−diui−xi)
If Rf>ϵ then t←0.65tmax else set t←0.95tmax
Update x←x+td
Update y, w, s according to (8), (9) and (10) respectively.
Update the expected relative duality gap Rgap according to (11)
Update Rf←∥b∥∞+1∥Ax−b∥∞
End while
Let us describe now how to construct, empirically, a starting point. In fact we construct two starting points x1 and x2.
The first one is defined as follow. \mboxForj=1,2,..,n, xj1=min(∥Aj.∥n,0.9uj) if cj<0 and xj1=min(∥Aj.∥n,0.1uj) otherwise, where Aj. is the jth column of matrix A. The second one is defined as in the routine pcinit.f of the software HOPDM of Gondzio [12]. Our starting point x is chosing as follow. If minxi2>jminxj1 or jminxj1<1 then we set x=x2 else we set x=x1.
Note that the algorithm can be extended to the case r=0. It is justified by the following proposition.
Proposition 2
Set nI=card(I) and define ξ0 as
[TABLE]
For any r∈(−∞,0)∪(0,1) we set ξ~r=(n+nI)r11ξr. Then for every x∈(0,+∞)n,
Proof.(i). Without loss of generality we can assume that I=∅. Let x∈(0,+∞)n and set
ψ1(r)=ln(n1Σxir) and ψ2(r)=r.
We have
r→0limψ1(r)=r→0limψ2(r)=0 and ψ2′(r)=1=0.
Then by the classical Hôpital theorem
r→0limlnξ~r(x)=r→0limψ2(r)ψ1(r)=r→0limψ2′(r)ψ1′(r),
but
ψ1′(r)=i=1∑nxiri=1∑nxirlnxi.
The result follows.
(ii) and (iii). Using (i) and the expressions of ∇ξ~r and ∇2ξ~r, it is easy to see that
r→0lim∇ξ~r(x)=∇n1ξ0(x) and r→0lim∇2ξ~r(x)=∇2n1ξ0(x).
∎
2.1 Finding a descent direction
Let x be a relative interior feasible point to (LPB), μ>0 and r∈(0,1). The Newton direction at x to the penalized problem (Pμ,r) is obtained by solving the minimization problem
[TABLE]
Using
the KKT optimality conditions, the problem amounts to finding d(μ)∈Rn and y∈Rm solutions to the system of linear
equations
[TABLE]
We have
∇Fr,μ(x)=c−μGe\mboxand∇2Fr,μ(x)=μ(1−r)H
where et=(1,1,..,1)∈Rn,
G and H are diagonal matrices defined respectively by
[TABLE]
and
[TABLE]
Then setting
[TABLE]
the projection matrix on the kernel of AH−21, system
(2) reduces to
PH−21∇Fr,μ(x)+μ(1−r)H21d(μ)=0
and then
μ(1−r)d(μ)=−H−21PH−21∇Fr,μ(x)=−H−21PH−21c+μH−21PH−21Ge.
Since μ(1−r)>0 we can so take as an affine scaling direction at x to the linear program vector d given by
[TABLE]
Observe that since ⟨c,d⟩=−∥PH−21c∥22<0, d is a descent direction to the linear program at every point to Rn.
Remark: To improve the quality of the direction d, in order to maintain a good feasibility to the current point, we can compute in addition the direction H−21PH21d which can be used instead of d. Which in fact amounts to projecting a second time the direction H21d onto the vector subspace kerAH−21. Of course, theoretically the two directions are identical, but numerically there is a significant difference. However the computation of H−21PH21d has some extra cost in number of operations. Therefore we use the technic only when the relative duality gap is less than 0.001 or the current number of iterations exceeds 20.
2.2 Finding a feasible solution
It is well known that an approximate relative interior feasible solution to (LPB) can be obtained by solving a linear problem
of the form
where x0 is a point arbitrarily chosen in (0,+∞)n.
Write (FLP) as
[TABLE]
where {\tilde{x}}=\left(\begin{array}[]{l}x\cr\lambda\end{array}\right),\ {\tilde{c}}=\left(\begin{array}[]{l}0_{\mathbb{R}^{n}}\cr 1\end{array}\right)\mbox{ and
}{\tilde{A}}=\left(\matrix{A&b-Ax^{0}\cr}\right).
Then using (6), the affine scaling direction d~ with respect to x~ is given by
d~=−H~−21P~H~−21c~
where {\tilde{H}}=\left(\matrix{H&0\cr 0&\lambda^{r-2}\cr}\right)\mbox{ and }{\tilde{P}}=I-{\tilde{H}}^{-{1\over 2}}{\tilde{A}}^{t}\left({\tilde{A}}{\tilde{H}}^{-1}{\tilde{A}}^{t}\right)^{-1}{\tilde{A}}{\tilde{H}}^{-{1\over 2}}.
But the matrix A~H~−1A~t will be
generally dense when there is one dense column in A~. Column
b−Ax0, in most cases, is dense. So for large-scale applications,
we split such column from the others. We proceed as follow. Set
v=b−Ax0. Then A~H~−1A~t=AH−1At+λ2−rvvt. Using the Sherman-Morrison
formula we have (A~H~−1A~t)−1=(AH−1At)−1+δλ2−rwwt, where
w=(AH−1At)−1v
and δ=1+λ2−r⟨w,v⟩−1. So
[TABLE]
It follows that
{\tilde{d}}=-\delta\lambda^{2-r}\left(\matrix{H^{-1}A^{t}w\cr\cr&\cr-1\cr}\right).
Since −δλ2−r>0 the search directions with respect to x and λ can be expressed respectively as
dx=−H−1At(AH−1At)−1(b−Ax0)\mboxanddλ=−1.
But numerical experiments show that as iterations go, the
constraint Ax+λ(b−Ax0)=b is less and less satisfied. This is due to the rounding off errors generated by the projection onto kerA~ at each iteration and thus creating a snowball effect. To work around this problem, we
proceed as follows: Let xk be a current point in the
feasibility searching phase. Then \left(\begin{array}[]{l}x^{k}\cr 1\end{array}\right) is a feasible point of
problem
[TABLE]
In this case the search direction with respect to xk is
[TABLE]
It follows that the point
\left(\begin{array}[]{l}x^{k+1}\cr\lambda^{k+1}\end{array}\right)=\left(\begin{array}[]{l}x^{k}\cr 1\end{array}\right)+t^{k}\left(\begin{array}[]{l}{d_{x}^{k}}\cr-1\end{array}\right) for a step size
tk suitably chosen, does not suffer from the snowball effect mentioned above.
Remark: To compute (AH−1At)−1(b−Ax) and PH−21c we solve for w and Δ by Cholesky factorization the linear systems
AH−1Atw=b−Ax and (AH−1At)Δ=H−21c and then we compute
PH−21c=H−21c−H−21AtΔ.
3 Convergence, dual solution and stopping criteria
Without loss of generality we can assume in this section that I=∅. In this case
H=Xr−2\mbox,P=I−X1−2rAt(AX2−rAt)−1AX1−2r,
where X=diag(x) and x∈(0,+∞)n. To simplify we assume that the starting point x0 is a relative interior feasible solution to the linear program. So we consider (xk)k∈N the sequence defined by
xk+1=xk+βtmaxkdk, where β∈(0,1) and tmaxk is the maximum step length with
respect to xk and dk=−Xk1−2rPXk1−2rc=−Xk2−rc+Xk2−rAt(AXk2−rAt)−1AXk2−rc. Set yk=(AXk2−rAt)−1AXk2−rc\mboxandsk=c−Atyk. Here is our main result.
Theorem 3.1
Assume that 0<β<32. Then (xk,yk,sk)k∈N converges to (x,y,s), where (y,s) is the ξt-analytic center to the dual optimal face of the linear program, t is such that t1+r1=1 and x belongs to the relative interior of the primal optimal face of the linear program.
Before giving the proof of the theorem, we first establish some preliminary results.
Proposition 3
k=0∑∞βtmaxkPXk1−2rc22* is a converging series.*
Proof. We have ⟨c,xk+1⟩−⟨c,xk⟩=βtmaxk⟨c,dk⟩=−βtmaxkPXk1−2rc22.
The sequence (⟨c,xk⟩)k∈N
is then decreasing. Since we assumed that the optimal value of the linear program is finite, the sequence is bounded and then converges. Set c its limit.
Then we have k=0∑∞βtmaxkPXk1−2rc22=⟨c,x0⟩−c<+∞. The result then follows.
∎
Now let us recall an important result. It was proved by Monteiro et al. [13], Saigal [11], Tseng and Luo [14] and Tsuchiya [15].
Theorem 3.2
There exists a constant L(A,c)>0 such that every optimal solution w to the following ellipsoidal problem
\hfillmax{⟨c,w⟩:Aw=0,X−1w22≤1}\hfill(EP)**
satisfies ∥w∥2≤L(A,c)⟨c,w⟩.
Corollary 1
Let x∈(0,+∞)n. Then X1−2rPX1−2rc satisfies
[TABLE]
Proof. First observe that PX1−2rc2X1−2rPX1−2rc can be viewed as the optimal solution to the following ellipsoidal problem
\hfillmax{⟨c,w⟩:Aw=0,X2r−1w22≤1}\hfill(EPr)
Hence using Theorem 3.2 by considering X~=X1−2r instead of X, the result follows.
∎
Proposition 4
(xk)k∈N* is a converging sequence, say to x. Furthermore, for each k∈N, xk−x2≤h(⟨c,xk⟩−c), where h=L(A,c)1.*
Proof. By Corollary 1 we have
⟨c,xk⟩−⟨c,xk+1⟩=−βtmaxk⟨c,dk⟩≥L(A,c)βtmaxkdk2=L(A,c)xk+1−xk2.
It follows that
+∞>⟨c,x0⟩−c=0≤k<+∞∑⟨c,xk−xk+1⟩≥L(A,c)0≤k<+∞∑xk+1−xk2. Thus (xk)k∈N converges.
Now using again Corollary 1 we have
⟨c,xk⟩−c=j=0∑∞⟨c,xk+j−xk+j+1⟩≥h1j=0∑∞xk+j−xk+j+12≥h1j=0∑∞xk+j−xk+j+12=h1xk−x2.
The result follows.
∎
Now we recall the next theorem proved by Dikin [16]. A proof can also be found in [11, 17, 18, 19].
Theorem 3.3
For every x>0 and for every p∈Rn, we have
[TABLE]
where q(A) is a constant only function of A.
Proposition 5
The sequences (yk) and (sk) are bounded.
Proof
According to Theorem 3.3, for every x>0 and for every p∈Rn, we have ∥yk∥2=(AXk2−rAt)−1AXk2−rc2=(A(Xk1−2r)2At)−1A(Xk1−2r)2c2≤q(A)∥c∥2 and then ∥sk∥2=∥c−Atyk∥2≤(1+q(A)∥A∥2)∥c∥2. The result then follows. ∎
Let us now consider the following notation. Given x∈(0,+∞)n and s∈Rn we set Ir(x,s)={i:xi1−r∣si∣=∥X1−rs∥∞}.
Lemma 1
Let (x,s)∈(0,+∞)n×Rn be such that Xs=0. One has for every (r,r′)∈[0,1]2, if r<r′ then xir≥xir′ and ∣sir∣≤∣sir′∣, ∀(ir,ir′)∈Ir(x,s)×Ir′(x,s).
Proof. We have
[TABLE]
and
[TABLE]
Multiplying side by side (1) and (2) one has
0<xir′1−rxir1−r′∣sir∣∣sir′∣≤xir1−rxir′1−r′∣sir′∣∣sir∣.
That is xir′r′−r≤xirr′−r and then xir′≤xir. Now using (2) one has
0<xir′1−r′∣sir∣≤xir1−r′∣sir∣≤∥X1−r′s∥∞=xir′1−r′∣sir′∣. The result then follows. ∎
There is h~>0 such that ∥xJk−xJ∥2≤h~∥xIk∥2,∀k∈N.
Proof.
Let (y,s) be an accumulation point of (yk,sk). The existence of (y,s) is ensured by Proposition 5. Using Proposition 4 we have
∥xk−x∥22=∥xIk∥22+∥xJk−xJ∥22≤h2⟨c,xk−x⟩2=h2⟨s,xk−x⟩2=h2⟨sI,xIk⟩2≤h2∥sI∥22∥xIk∥22
and then ∥xJk−xJ∥22≤(h2∥sI∥22−1)∥xIk∥22. Thus h2∥sI∥22−1 is necessarily nonnegative. The result then follows by setting h~=h2∥sI∥22−1. ∎
N.B: The fact that h=L(A,c)1, h2∥sI∥22−1≥0 also means that ∥s∥2≥L(A,c), for every s be an accumulation point of (sk).
In all the following we set
g=k∈Nsup∥sk∥∞, M=ksup∥xk∥∞<+∞ and M=kinfj∈Jminxjk. Note that since k↑+∞limxJk=xJ>0, M>0 and that according to Proposition 5g<+∞.
Proposition 6
Let β∈(0,1). Then there exists K∈N such that
i)* ∀k≥K, ∀r∈(0,1), ∀ir∈Ir(xk,sk), xirk=O(∥xIk∥∞) and sirk=O(1). Furthermore ∥Xk1−rsk∥∞=∥Xk,I1−rsIk∥∞=O(∥xI∥∞1−r) and there exists a constant C^ such that sJk2≤C^∥xI∥∞2−r.*
ii)* ∀k≥K, ⟨c,xk+1⟩−c≤L(⟨c,xk⟩−c),
where L=1−β22−r3−rg2−r6−2rnI27−2(2−r)r(1−r)L(A,c)2−r6−r.*
iii)* k=0∑∞∥xIk∥∞a<+∞,∀a>0.*
iv)* ⟨xIk,sIk⟩=O(∥xIk∥∞).*
v)* ⟨c,xk⟩−c=O(∥xIk∥∞) and ∥xk−x∥2=O(∥xIk∥∞).*
Proof.i) and ii) We have
⟨c,xk+1⟩−c=⟨c,xk⟩−c−βtmaxPXk1−2rc22=⟨c,xk⟩−c−βtmaxXk1−2rsk22≤⟨c,xk⟩−c−βtmaxXk1−2rsk∞2.
Now tmaxk=min{−dikxik:dik<0}≥∥Xk−1dk∥∞1=∥Xk1−rsk∥∞1. Then ⟨c,xk+1⟩−c≤⟨c,xk⟩−c−βXk1−rsk∞Xk1−2rsk∞2. Let (i2r,ir)∈I2r(xk,sk)×Ir(xk,sk). From Lemma 1 one has xi2rk≥xirk and then
Using Proposition 4 , the fact that PXk1−2rc2Xk1−2rPXk1−2rc is the optimal solution to (EPr) and Xk2r−1(xk−x)2xk−x is a feasible solution of (EPr),
L(A,c)Xk2r−1(xk−x)2∥xk−x∥2≤Xk2r−1(xk−x)2⟨c,xk−x⟩≤PXk1−2rc2⟨c,Xk1−2rPXk1−2rc⟩=PXk1−2rc2=∥Xk1−2rsk∥2.
It follows that
L(A,c)2XkI2reI22+XkJ2r−1(xJk−xJ)22∥xIk∥22+∥xJk−xJ∥22≤∥Xk1−2rsk∥22. Now using Lemma 2, the fact that r∈(0,1) and the fact that k→∞limxIk=0, for k being large enough one has
XkJ2r−1(xJk−xJ)22≤Mr−2h~2∥xIk∥22=Mr−2h~2∥XkI1−2rXkI2reI∥22≤Mr−2h~2∥xIk∥∞2−rXkI2reI22≤XkI2reI22, where eI is the vector of RnI whose components are equal to 1.
According to iii) of Proposition 4.3 in [2] one has
(i∈I∑xikr)r2nI1−r2=(i∈I∑(xik2)2r)r2nI1−r2=ξ2r,I(xIk)ξr−2r,I(eI)≤⟨xIk,eI⟩=i∈I∑xik2=∥xIk∥22,
where ξ2r,I and ξr−2r,I are the concave gauge functions respectively defined by
\xi_{{r\over 2},I}(z)=\left\{\begin{array}[]{ll}\left(\sum\limits_{i\in I}z_{i}^{r\over 2}\right)^{2\over r}&\mbox{if }z\in[0,+\infty)^{n_{I}},\cr-\infty&\mbox{elsewhere}\end{array}\right. and \xi_{{r\over r-2},I}(z)=\left\{\begin{array}[]{ll}\left(\sum\limits_{i\in I}z_{i}^{r\over r-2}\right)^{r-2\over r}&\mbox{if }z\in(0,+\infty)^{n_{I}},\cr 0&\mbox{if }z\in\partial[0,+\infty)^{n_{I}},\cr-\infty&\mbox{elsewhere.}\end{array}\right.
Here ∂[0,+∞)nI denotes the boundary of [0,+∞)nI.
That is
i∈I∑xikr=∥XkI2reI∥22=i∈I∑xikr≤nI1−2r∥xIk∥2r.
Hence
2nI1−2rL(A,c)2∥xI∥22−r≤L(A,c)2XkI2reI22+XkJ2r−1(xJk−xJ)22∥xIk∥22+∥xJk−xJ∥22≤∥Xk1−2rsk∥22
and then
on the other hand. Since k↑∞limxIk=0, by (2bis) we have necessarily Xk1−2rsk∞=Xk,I1−2rsIk∞=xi2rk1−2rsi2rk, for k large enough. Now Xk,J1−rsJk2≤M−2rXk,J1−2rsJk2
and Xk,I1−2rsIk2=Xk,I2rXk,I1−rsIk2≤∥xIk∥∞2rXk,I1−rsIk2.
Then using (29bis) we get Xk,J1−rsJk2≤ML(A,c)nI∥sI∥∞∥xIk∥∞Xk,I1−rsIk2.
Hence using again the fact that k↑∞limxIk=0 we get Xk1−rsk∞=Xk,I1−rsIk∞ for k large enough.
Turn back now to (2). Then when k is large enough we have
and then xi2rk=O(xIk2) and then i2r∈I. Now xirk1−rO(1)=xirk1−rsirk=imax(xik1−rsik)≥xi2rk1−rsi2rk=O(xIk2)1−r. It follows that ∥Xk,I1−rsIk∥∞=O(∥xIk∥∞1−r) and xirk=O(xIk2), witch implies that ir∈I.
Now using (5), (6) and the fact that ∣sirk∣≤∥sk∥∞≤g we get
xi2rksirksi2rk2≥22−r3−rg2−r4−rnI3−2(2−r)r(1−r)L(A,c)2−r6−rxIk2.
But
⟨c,xk⟩−x=⟨s,xk−x⟩=⟨sI,xIk⟩≤∥sI∥2∥xIk∥2≤nIg∥xIk∥2.
It follows that
xi2rksirksi2rk2≥22−r3−rg2−r6−2rnI27−2(2−r)r(1−r)L(A,c)2−r6−r(⟨c,xk⟩−c)
and then by (1),
⟨c,xk+1⟩−c≤L(⟨c,xk⟩−c).
iii) By ii) and Proposition 4,
∥xIk∥∞≤xk−x2≤H(⟨c,xk⟩−c)≤HLk−K(⟨c,xK⟩−c),∀k≥K.
The result follows since
[TABLE]
iv) We have L(A,c)∥xIk∥2≤L(A,c)∥xk−x∥2≤⟨c,xk⟩−c=⟨sk,xk−x⟩=⟨sIk,xIk⟩+⟨sJk,xJk−xJ⟩=⟨sI,xIk⟩≤∥sI∥2∥xIk∥2. The result then follows from i).
v) Let (y,s) be an accumulation point to (yk,sk). We have ⟨c,xk⟩−c=⟨Aty+s,xk−x⟩=⟨sI,xIk⟩. Using Proposition 4 and the Cauchy-Schwartz inequality we get
[TABLE]
The result then follows.
∎
Now we establish some technical results given by Saigal [11] in the classical case.
Proposition 7
Let (uk) the sequence defined by
uk=⟨c,xk⟩−cXk2rPXk1−2rc=⟨c,xk⟩−cXksk.
Then we have:
i)* The sequence (uk) is bounded.*
ii)* k=0∑∞uJk2<+∞.*
iii)* k=0∑∞∣δk∣<+∞, where δk=⟨eI,uIk⟩−1.*
Proof.
i) and ii) By Proposition 4, there is h>0 such that
xk−x≤h(⟨c,xk⟩−c). Then ∥uI∥2=⟨c,xk⟩−c∥Xk,IsIk∥2≤h∥xk−x∥2∥Xk,IsIk∥2≤h∥xIk∥∞∥xIk∥∞∥sIk∥2=h∥sIk∥2
Hence uIk is bounded according to Proposition 5. According to Proposition 6∥uJk∥2=⟨c,xk⟩−c∥Xk,JsJk∥2≤h∥xJk∥∞∥xk−x∥2∥sJk∥2≤hM∥xIk∥2∥sJk∥2≤hC^M∥xIk∥1−r.
Then i) and ii) follows by using Proposition 6.
iii) Set SD={(y,s):Aty+s=c,sJ=0} the expected dual optimal solutions’ set. Let (y^k,s^k)
a solution to the problem min{sk−s2:(y,s)∈SD}. We have
⟨c,xk⟩−c=⟨c,xk−x⟩=⟨s^k+Aty^k,xk−x⟩=⟨s^Ik,xIk⟩.
Then ⟨eI,uIk⟩−1=⟨c,xk⟩−c⟨xIk,sIk⟩−⟨s^Ik,xIk⟩≤⟨c,xk⟩−csIk−s^Ik2xIk2. By Proposition 4 we have
xIk2≤xk−x2≤h(⟨c,xk⟩−c). Hence
⟨eI,uIk⟩−12≤hsIk−s^Ik2.
From Theorem 7 of [11], there is M^ such that s^k−sk2≤M^sJk2. Using Proposition 6 we get sIk−s^Ik2≤M^sJk2≤C^M^∥xI∥∞2−r.
The result then follows by using iii) of Proposition 6. ∎
Now let us introduce the potential function F defined as follow:
[TABLE]
The following Proposition holds.
Proposition 8
There is Δ∈R such that for every k≥0, F(xk)≥Δ>−∞.
Proof.
By Theorem 3.1 and Proposition 4.3 of [2], ξ~r(xk)ξ~t(e)≤⟨xIk,eI⟩, where t is such that t1+r1=1. Hence ξ~r(xk)≤nIt11i∈I∑xik≤nIt1n21xIk2=nr1−21xIk2=nr1−21xk−x2≤nr1−21h(⟨c,xk⟩−c) and then −∞<ln(nr1−21h1)≤F(xk). The result then follows. ∎
Proposition 9
Let β∈(0,32). There exists K∈N such that ∀k≥K
[TABLE]
where \Upsilon^{k}=\left\{\begin{array}[]{ll}1&\mbox{if }\max\limits_{i\in I}w_{i}^{k}\leq 0,\cr 0&\mbox{if }\max\limits_{i\in I}w_{i}^{k}>0,\end{array}\right.θk=1−i∈I∑xikr1t~kt~k,
t~k=tk(⟨c,xk⟩−c), tk=βtmaxk,
γk=Xk,I−2ruJk2 and
wIk=Xk,I−ruIk−i∈I∑xikr1e.
Proof.
Let us proof at first that given β∈(0,1), there is K∈N such that ∀k≥K,
F(xk+1)−F(xk)≤ln1−θk∥Xk,I2rwIk∥−2i∈I∑xikrθkδk−θkγk−i∈I∑i∈I∑xikrxikrln(1−θkwik).
We have F(xk+1)−F(xk)=ln(⟨c,xk⟩−c⟨c,xk+1⟩−c)−r1ln(∑xikr∑xik+1r)
,
⟨c,xk+1⟩−c=⟨c,xk⟩−c−tk⟨Xk−rXkSk,Xksk⟩,
uk=⟨c,xk⟩−cXksk and t~k=(⟨c,xk⟩−c)tk. Then
⟨c,xk⟩−c⟨c,xk+1⟩−c=1−t~k⟨Xk,I−ruk,uk⟩=1−t~k⟨Xk,I−ruIk,uIk⟩−tk~γk.
Now uIk=Xk,IrwIk+i∈I∑xikr1Xk,IreI, Xk,I−ruIk=wIk+i∈I∑xikr1eI and δk=⟨uIk,eI⟩−1=⟨Xk,IrwIk,eI⟩. Then ⟨Xk,I−ruIk,uIk⟩=⟨Xk,IrwIk,wIk⟩+i∈I∑xikr2δk+i∈I∑xikr1. It follows that
Let us show now, for β∈(0,1),
θk=1−i∈I∑xikrt~kt~k=i∈I∑xikr−t~ki∈I∑xikrt~k>0, for k large enough.
We have ⟨c,xk⟩−c=⟨sk,xk−x⟩=⟨sIk,xIk⟩+⟨sJk,xJk−xJ⟩.
Using the fact that k↑∞limxk−x=0, from Proposition 6 we get
⟨xIk,sIk⟩∥sJk∥2∥xJk−xJ∥2=o(∥xIk∥1−r). Therefore θk>0.
Now r1lni∈I∑xikri∈I∑xikr+1=r1lni∈I∑j∈I∑xjkrxikr(1−t~kxik−ruik)r. Since the function t↦lnt, t>0, is concave, one has
Hence
F(xk+1)−F(xk)≤ln[1−θk⟨Xk,IrwIk,wIk⟩−i∈I∑xikr2θkδk−θkγk]−i∈I∑j∈I∑xjkrxikrln(1−θkwik).
Assume now that i∈Imaxwik>0. Then using Lemma 8 of [11] or its proof we can easily see
i∈I∑j∈I∑xjkrxikrln(1−θkwik)≥−j∈I∑xjkrθkδk−2j∈I∑xjkrθk21−θki∈Imaxwik∥Xk,I2rwIk∥2.
Using in addition the fact that
ln(1−a)≤−a,∀a<1
we get
F(xk+1)−F(xk)≤−θk1−2j∈I∑xjkrθk1−θki∈Imaxwik1∥Xk,I2rwIk∥22−j∈I∑xjkrθkδk−θkγk.
Now
We have on the one hand θk=1−j∈I∑xjkrt~kt~k and then t~k=1+j∈I∑xjkrθkθk. On the other hand, t~k=βi∈Imax(Xk1−rsk)i⟨c,xk⟩−c=i∈Imax(Xk−ruk)iβ.
It follows that
Using the fact that i∈I∑[xikrj∈Imax(Xk1−rsk)i]≥i∈I∑xikrxik1−rsik=i∈I∑xiksik and the fact that ⟨c,xk⟩−c=⟨sk,xk−x⟩=⟨sIk,xIk⟩+⟨sJk,xJk−xJ⟩ we get
1−2(1−β)βi∈I∑[xikrj∈Imax(Xk1−rsk)i]⟨c,xk⟩−c≥1−2(1−β)β(1+⟨sIk,xIk⟩⟨xJk,sJk⟩)=2(1−β)2−3β−2(1−β)β⟨xIk,sIk⟩⟨sJk,xJk−xJ⟩≥2(1−β)2−3β−2(1−β)β⟨xIk,sIk⟩∥sJk∥2∥xJk−xJ∥2=2(1−β)2−3β+o(∥xIk∥21−r)≥3(1−β)2−3β, for k being large enough.
Hence
F(xk+1)−F(xk)≤−θk3(1−β)2−3β∥Xk,I2rwIk∥22−j∈I∑xkrθkδk−θkγk.
Consider now the case i∈Imaxwik≤0. Then i∈I∑i∈I∑xikrxikrln(1−θkwik)≥0 and the result follows by using the fact that ln(1−a)≤−a,∀a<1. ∎
Lemma 3
We have
j∈I∑xikrθk=1−j∈I∑xikrt~kj∈I∑xikrt~k=O(1),
For k large enough.
Proof. We have nIg∥xIk∥∞≥i∈I∑xikri∈Imax(Xk1−rsk)i≥i∈I∑xikr(Xk1−rsk)i=⟨xIk,sIk⟩. By Proposition 6⟨xIk,sIk⟩=O(∥xIk∥∞) and ⟨c,xk⟩−c=O(∥xIk∥∞). It follows that i∈I∑xikri∈Imax(Xk1−rsk)i=O(∥xIk∥∞) and then i∈I∑xikrt~k=βi∈I∑xikri∈Imax(Xk1−rsk)i⟨c,xk⟩−c=O(1). Now ⟨c,xk⟩−c=⟨sk,xk−x⟩=⟨sIk,xIk⟩+⟨sJk,xJk−xJ⟩≤⟨sIk,xIk⟩+∥sJk∥2∥xJk−xJ∥2. Then using again Proposition 6 we get i∈I∑xikrt~k≤β+β⟨xIk,sIk⟩∥sJk∥2∥xJk−xJ∥2=β+o(∥xIk∥1−r). But β∈(0,1). The result then follows. ∎
Now as is mentioned in Theorem 1 of [1], given t in (−∞,0)∪(0,1), we say the ξt-dual-analytic center the unique optimal solution to the problem
[TABLE]
where
[TABLE]
if t∈(0,1) and
[TABLE]
if t∈(−∞,0).
The unicity of the solution is ensured by the strict quasi-concavity of ξt and Lemma 1 of [1]. The KKT optimality conditions are then expressed as follow. There exist (y,s)∈Rm×Rn and v∈Rn satisfying the following conditions, ∇ξt,I(sI)=vI, Av=0, Aty+s=c, sJ=0, s≥0.
According to Proposition 8, k≥0∑(F(xk+1)−F(xk)) converges and according to Proposition 7, k≥0∑δk and k≥0∑γk converge too. Then using Proposition 9, k≥0∑∥Xk,I2rwIk∥22 converges. Hence
For all k∈N we set I(k)={i:xik≥∥xIk∥∞2}. We shall prove that for some K chosen large enough, I(k)⊂I(k+1), ∀k≥K. For k∈N and i∈{1,⋯,n} we set ϵik=⟨c,xk⟩−cj∈I∑xjkrxik1−rsik−1. Let ϵ>0 be small enough. Then by (1) there exists K∈N large enough such that ∀k≥K, ∣ϵik∣≤ϵ, ∀i∈I(k). Let k≥K and i∈I(k). Since K is assumed to be large we have necessarily from (1) sik>0. Using in addition the fact that ∥sk∥∞≤g and Proposition 4 we get
gnI∥xIk∥∞rxik1−rsik≥j∈I∑xjkrxik1−rsik=(⟨c,xk⟩−c)(1+ϵik)≥L(A,c)∥xk−x∥2(1+ϵik)≥L(A,c)∥xIk∥∞(1−ϵ).
Hence ∥xIk∥∞≥xik≥(nIgL(A,c)(1−ϵ))1−r1∥xIk∥∞
and then xik=O(∥xIk∥∞). Since in addition xik+1=1−βj∈Imax(Xk1−rsk)jxik1−rsikxik and 0<j∈Imax(Xk1−rsk)jxik1−rsik≤1 we have (1−β)xik≤xik+1≤(1+β)xik and then xik+1=O(xik). Now ∥xIk+1∥2=xIk−βj∈Imax(Xk1−rsk)jXk,I2−rsIk2≤∥xIk∥2+∥xIk∥2j∈Imax(Xk1−rsk)j∥Xk,I1−rsIk∥2 and j∈I∑xjkr∥Xk1−rsk∥∞≥j∈I∑xjkrj∈Imaxxjk1−rsjk≥j∈I∑xjkrxjk1−rsjk=⟨sI,xI⟩. It follows that ∥Xk1−rsk∥∞≥j∈Imaxxjk1−rsjk≥j∈I∑xjkr⟨sI,xI⟩.
But (Proposition 6) ∥Xk1−rsk∥∞=O(∥xIk∥∞1−r), ⟨sIk,xIk⟩=O(∥xIk∥∞) and j∈I∑xjkr=O(∥xIk∥∞r). Then j∈Imaxxjk1−rsjk=O(∥xIk∥∞1−r) and then j∈Imax(Xk1−rsk)j∥Xk,I1−rsIk∥2=O(1). Hence there is ϱ>0 such that ∥xIk+1∥2≤ϱ∥xIk∥2. So xik+1=O(xik)=O(∥xIk∥2)≥O(∥xIk+1∥2)≥∥xIk+1∥22 and then i∈I(k+1).
Set now I^=∪k∈NI(k) and let us prove that in fact I^=I. Assume for contradiction that there is i∈I−I^. Then ∀k≥K, xikxik+1=1−β∥XI,k1−rsIk∥∞xik1−rsik=1−β∥xIk∥∞2(1−r)xik1−r∥XI,k1−rsIk∥∞∥xIk∥∞1−rsik∥xIk∥∞1−r≥1−β∥XI,k1−rsIk∥∞∥xIk∥∞1−rsik∥xIk∥∞1−r. We know by Proposition 6 that ∥XI,k1−rsIk∥∞=O(∥xIk∥∞1−r). Using in addition the fact that sk is bounded it follows that xikxik+1≥1+O(∥xIk∥∞1−r). Hence for all K′≥K, xiKxiK′≥K≤k≤K′∏(1+O(∥xIk∥∞1−r))=1+Ok=K∑K′∥xIk∥∞1−r. We know by Proposition 6 that k∈N∑∥xIk∥∞1−r is a converging serie. It follows that chosing K large enough, O(k=K∑+∞∥xIk∥∞1−r)≥−21. Then
0=K′→+∞limxiKxiK′≥21, which is absurde. Hence
and then there is necessarily τ>0 such that τeI<sIk for k being large enough. Let now (y~,s~) be an accumulation point to (yk,sk). Then we have Xs~=0,Ax=b,Ay~+s~=c,x≥0 and s~≥0. The KKT optimality conditions of (LP) are then satisfied and then x is an (LP) optimal solution and (y~,s~) is a dual optimal solution. Moreover since xI=0,s~I>0,xJ>0 and s~J=0 the strict complementary slackness condition holds.
Let now k be large enough. Then it is easy to see with the help of Proposition 6 that (⟨c,xk⟩−c)i∈I∑xikrXk,Ir−1=O(1). It follows that
sIk=(⟨c,xk⟩−c)i∈I∑xikrXk,Ir−1(eI+ϵk)=(⟨c,xk⟩−c)i∈I∑xikrXk,Ir−1eI+O^(ϵ)=ξr,I(xIk)(⟨c,xk⟩−c)∇ξr,I(xIk)+O^(ϵ),
where O^(ϵ) represents every function of ϵ satisfying ϵ↓0limO^(ϵ)=0. According to Theorem 3.1 of [2], we have ξt,I(∇ξr,I(xIk))=1 and then, by continuity of ξt,I on (0,+∞)nI, we get ξt,I(sIk)=ξr,I(xIk)(⟨c,xk⟩−c)+O^(ϵ)=O(1). Hence ξt,I(sIk)sIk=∇ξr,I(xIk)+O^(ϵ). Now it is easy to see from Theorem 3.1 of [2] that ∇ξr,I(⋅) is positively homogeneous of degree 0. It follows that ξt,I(sIk)sIk=∇ξr,I(∥xIk∥∞xIk)+O^(ϵ). By (3) there is τ′>0 such that τ′eI≤∥xIk∥∞xIk≤eI for k being large enough. Then using in addition Lemma 2, zk=∥xIk∥∞xk−x is bounded. Let then z be a limit of a convergent subsequence of (zk). Then
ξt,I(sI)sI=∇ξr,I(zI). Using again Theorem 3.1 of [2] one has
ξr,I(zI)zI=∇ξt,I(sI). But Azk=0. It follows that A(ξr,I(zI)z)=0. Hence (y,s,ξr,I(zI)z) satisfies the KKT optimality conditions for the problem
[TABLE]
The result then follows ∎
Turn back now to the case where I=∅. Then using (1) of Section 2 and adapting results of this section, the expected dual approximate optimal solution vectors y,s and w, associated to a current point x are
[TABLE]
[TABLE]
[TABLE]
where U=diag(u). Hence the expected relative duality gap is
[TABLE]
4 Numerical results
The porpose of the following tests is to compare the algorithm performance according to different values of r between 0 and 1. We have opted to consider the following values r=0 (the classical case), r=0.1, r=0.2, r=0.3, r=0.4, r=0.5, r=0.6 and r=0.7. We solved a large set of testing problems, taken from the familiar Netlib test set (GAY [10]). For values of r exceeding 0.8 the algorithm showed lesser efficiency on most problems. Results obtained are shown in Table 1. Each row in the table contains the name of the problem and the number of iterations for the different values of r. The parameter of the stopping rule is ϵ=10−10. To read the mps-files, the specs-files and perform the symbolic Cholesky factorization we use rdmps1, rdmps2, rdspec, prepro and all dependencies written by Gondzio[12]. We use no presolving. Our numerical experiments were performed on a laptop hp ZBook (Processor: Intel core i7-4810 MQ, CPU 2.804×8HZ - Operating system: Ubuntu linux). The code is written in GNU Fortran 95.
Table 1
⋆⋆ : Number of iterations exceeds 300.
∗: Best optimal value obtained with Rgap∈(10−8,10−10).
At first we can observe that for every problem there is at least one r value for which the problem is solved. Also, we can see that most problems are solved for r between 0 and 0.5. The maximum number of problems solved is reached for r=0.2, as shown in the graph below.
\psaxes[Ox=0,Oy=87,Dx=0.1,dx=1,Dy=1]-¿(0,87)(6,96)
Percentage of solved problemsValues of r
5 Concluding remarks
The results are conclusive and show that differentially barriers penalty functions offer effective alternative to conventional logarithmic barrier function in linear programming.
As we point out in the introduction, we chose an algorithm of affine scaling type for the simplicity of its implementation. But the ”Predictor-corrector” method of Mehrotra [20] has proved highly efficient in the classical case (r=0). Our immediate goal is to adapt this method to these new penalty functions.
Bibliography21
The reference list from the paper itself. Each links out to its DOI / PubMed record.
1[1] A. Barbara, Differential barrier property and strict quasi concavity in linear programming via concave gauges , Optimization, Taylor & \& Francis, Vol. 64, Issue 12, 2015.
2[2] A. Barbara and J.P. Crouzeix, Concave gauge functions and applications . Zeitschrift für Operation Research in Vol. 40, Issue 1, 1994.
3[3] R. Courant, Varialtional methods for the solution of problems of equilibrium and vibrations , Bull. Amer. Math. Soc., 49, 1943, p. 1-23.
4[4] K. R. Frisch, The logarithmic potential method of convex programming. Technical report, University Institute of Economics, Oslo, Norway, 1955.
5[5] A. Auslender, Optimisation, Méthodes Numériques , MASSON, 1976.
6[6] I. I. Dikin, Iterative solution of problems of linear and quadratic programming, Sov. Math. Doklady 8 674-675, 1967.
7[7] E.R. Barnes, A variation on Karmarkar’s algorithm for solving linear programming problems, Mathematical Programming 36, 174-182, 1986.
8[8] R.J. Vanderbei, M. S. Meketon and B. A. Freedman, A modification of Karmarkar’s linear programming algorithm, Algorithmica 1, 395-407, 1986.