This paper introduces a self-concordant perceptron algorithm that efficiently solves a specific sub-family of strict linear feasibility problems using an interior point approach, offering precise complexity analysis.
Contribution
It presents a novel perceptron-based method leveraging interior point techniques for strict linear feasibility, with detailed complexity characterization on certain problem sub-families.
Findings
01 Algorithm matches state-of-the-art linear programming complexity
02 Binary complexity is low on a specific sub-family of instances
03 Provides a more precise complexity analysis for the problem
Abstract
Strict linear feasibility, or linear separation, is usually tackled with efficient approximate or stochastic algorithms (which may even run in sub-linear time in expectation). However, the current state of the art for solving such instances exactly and deterministically is to cast them as linear programming instances. In contrast, this paper introduces a self-concordant perceptron algorithm that tackles strict linear feasibility directly within the interior point paradigm. This algorithm has the same worst-case time complexity as state-of-the-art linear programming algorithms, but its complexity can be characterized more precisely, eventually proving that its binary complexity is low on a sub-family of linear feasibility problems.
A simple polynomial-time algorithm for linear feasibility.
Adrien CHAN-HON-TONG
Abstract
This technical report offers a simplified version of a Newton-based interior-point algorithm for linear feasibility. Although its complexity is slightly higher than the state of the art, the proof is significantly shorter, making this polynomial-time algorithm relevant for educational purposes.
1 Introduction and motivation
The central-path log-barrier method [5] has been the state of the art in linear programming since 1988. It has only recently been improved with an efficient data structure [6] (a deterministic version of [3]).
Using [6], a linear program related to a matrix $A\in\mathbb{R}^{M\times N}$ with total binary size $L$ can be solved in fewer than $O(M^{\gamma}L)$ operations (where $\gamma$ is the exponent of matrix multiplication/inversion). This is the best known complexity assuming that the matrix is not too flat, i.e. $N=O(M)$ and $M=O(N)$.
This technical report describes a variant of [5] with complexity $O(M^{\gamma+1}L)$, i.e. $M$ times slower than the current state of the art.
This algorithm may have other interests, but this technical report stresses that its polynomial-time complexity can be proven relatively easily, making it relevant for educational purposes.
2 Theorem
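Let us first introduce some definitions:
• $\forall A\in\mathbb{R}^{M\times N}$, let $\Gamma(A)=\max_{m\in\{1,...,M\}}A_mA_m^T$
• Let $\Omega=\{A\in\mathbb{R}^{M\times N},\ \exists\theta\in\mathbb{R}^N,\ A\theta>\mathbf{0}\}$
• $\forall A\in\Omega$, let $\chi(A)=\arg\min_{x\in\mathbb{R}^N,\ Ax\geq\mathbf{1}}x^Tx$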
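This technical report will prove the following results:
1. $\forall A\in\Omega,\forall\omega\in]0,\infty[^M$, $\omega^TAA^T\omega\geq\frac{\omega^T\omega}{\chi(A)^T\chi(A)}$, in particular $\mathbf{1}^TAA^T\mathbf{1}>0$.
2. Assuming $A\in\Omega$, the function $F_A(v)$ from $]0,\infty[^M$ to $\mathbb{R}$ defined by
$$F_A(v)=\frac{v^TAA^Tv}{2}-\sum_{m\in\{1,...,M\}}\log(v_m)$$
verifies $F_A\left(\frac{1}{\sqrt{\mathbf{1}^TAA^T\mathbf{1}}}\mathbf{1}\right)\leq 1+\frac{M}{2}\log(\mathbf{1}^TAA^T\mathbf{1})\leq 1+\frac{M}{2}\log(\Gamma(A))+M\log(M)$, and, independently, has a minimum $F_A^*$ and
$$F_A^*\geq-\frac{M}{2}\log(\chi(A)^T\chi(A))$$
3. Minimizing $F_A$ (when $A\in\Omega$) solves linear feasibility, i.e. finds $\theta$ such that $A\theta>\mathbf{0}$, because $\forall v\in]0,\infty[^M$
$$F_A(v)-F_A^*\leq\frac{1}{8M\Gamma(A)\times\chi(A)^T\chi(A)+1}\Rightarrow AA^Tv>\mathbf{0}$$
4. Damped Newton descent starting from $\frac{1}{\sqrt{\mathbf{1}^TAA^T\mathbf{1}}}\mathbf{1}$ will eventually find $v$ such that $F_A(v)-F_A^*\leq\frac{1}{8M\Gamma(A)\times\chi(A)^T\chi(A)+1}$ after at most $O(M\log(\Gamma(A))+M\log(\chi(A)^T\chi(A)))=O(ML)$ steps.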
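The quantities used in the theorem can be evaluated numerically. The following is a minimal sketch, assuming NumPy and a small hand-made matrix `A` (a hypothetical instance, not taken from the report); the helper names `gamma`, `F` and `start_point` are illustrative only. It evaluates $F_A$ at the starting point of lemma 4 and compares it with the closed form and the upper bound of lemma 2.

```python
import numpy as np

def gamma(A):
    # Gamma(A): largest squared row norm, max_m A_m A_m^T
    return float(np.max(np.sum(A * A, axis=1)))

def F(A, v):
    # F_A(v) = 1/2 v^T A A^T v - sum_m log(v_m), defined for v > 0
    u = A.T @ v
    return 0.5 * float(u @ u) - float(np.sum(np.log(v)))

def start_point(A):
    # v0 = 1 / sqrt(1^T A A^T 1), i.e. a constant positive vector
    M = A.shape[0]
    s = A.T @ np.ones(M)
    return np.ones(M) / np.sqrt(float(s @ s))

# Hypothetical feasible instance: theta = (1, 0) gives A @ theta > 0.
A = np.array([[1.0, 2.0], [1.0, -1.0], [2.0, 0.5]])
M = A.shape[0]

col_sums = A.T @ np.ones(M)
s = float(col_sums @ col_sums)          # 1^T A A^T 1
value = F(A, start_point(A))
closed_form = 0.5 + 0.5 * M * np.log(s)
upper = 0.5 + 0.5 * M * np.log(gamma(A)) + M * np.log(M)
print(value, closed_form, upper)
```

On any instance in $\Omega$, `value` should agree with `closed_form` up to rounding and stay below `upper`.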
3 Proof
3.1 Lower bounding the quadratic part
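By definition, $\forall A\in\Omega$, $\chi(A)=\arg\min_{x\in\mathbb{R}^N,\ Ax\geq\mathbf{1}}x^Tx$, so one can consider, $\forall\omega\in]0,\infty[^M$, the product $\chi(A)^T(A^T\omega)$.
On one hand, from the Cauchy inequality, $\chi(A)^T(A^T\omega)\leq\sqrt{\chi(A)^T\chi(A)\times\omega^TAA^T\omega}$.
On the other hand, $\chi(A)^T(A^T\omega)=(A\chi(A))^T\omega\geq\mathbf{1}^T\omega$.
So $\omega^T\omega\leq(\mathbf{1}^T\omega)^2\leq((A\chi(A))^T\omega)^2=(\chi(A)^T(A^T\omega))^2\leq\chi(A)^T\chi(A)\times\omega^TAA^T\omega$ (proof of first part of lemma 1).
As a corollary, $\frac{1}{\sqrt{\mathbf{1}^TAA^T\mathbf{1}}}$ exists (as $\mathbf{1}^TAA^T\mathbf{1}\geq\frac{M^2}{\chi(A)^T\chi(A)}>0$). Thus, a simple calculation from the definition of $F_A$ gives $F_A\left(\frac{1}{\sqrt{\mathbf{1}^TAA^T\mathbf{1}}}\mathbf{1}\right)=\frac{1}{2}+\frac{M}{2}\log(\mathbf{1}^TAA^T\mathbf{1})$.
Now, $\mathbf{1}^TAA^T\mathbf{1}=\sum_{i,j}A_iA_j^T\leq M^2\Gamma(A)$ from the Cauchy inequality (proof of first part of lemma 2).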
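The chain of inequalities of this subsection can be checked numerically. Computing $\chi(A)$ itself requires a quadratic program, so the sketch below makes a simplifying assumption: it uses an arbitrary feasible point `x` with $Ax\geq\mathbf{1}$ in place of $\chi(A)$. The same proof applies to any such point, only with a weaker constant. The matrix `A` and the vector `theta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feasible instance and a known strictly feasible direction.
A = np.array([[1.0, 2.0], [1.0, -1.0], [2.0, 0.5]])
theta = np.array([1.0, 0.0])            # A @ theta = (1, 1, 2) > 0
x = theta / np.min(A @ theta)           # rescaled so that A @ x >= 1

# Check omega^T omega <= (x^T x) * (omega^T A A^T omega) on random positive omega.
worst_margin = np.inf
for _ in range(1000):
    omega = rng.uniform(0.01, 10.0, size=A.shape[0])
    u = A.T @ omega
    margin = float(x @ x) * float(u @ u) - float(omega @ omega)
    worst_margin = min(worst_margin, margin)

print(worst_margin)  # expected to be non-negative
```

Replacing `x` by the true minimizer $\chi(A)$ would give the tightest version of the bound, which is the one stated in lemma 1.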
3.2 Lower bounding the function
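Using lemma 1, one can write
$F_A(v)\geq\frac{v^Tv}{2\chi(A)^T\chi(A)}-\sum_{m\in\{1,...,M\}}\log(v_m)$ or equivalently,
$$F_A(v)\geq\sum_{m\in\{1,...,M\}}\left(\frac{v_m^2}{2\chi(A)^T\chi(A)}-\log(v_m)\right)$$
Let $\mu$ be the function such that $\forall t>0,\ \mu(t)=\frac{t^2}{2\chi(A)^T\chi(A)}-\log(t)$. Then $\mu$ is lower bounded, as $\mu(t)\to\infty$ when $t\to 0$ or $t\to\infty$, so $\mu$ has a minimum, which is reached when $\frac{t}{\chi(A)^T\chi(A)}-\frac{1}{t}=\mu'(t)=0$.
So, $\mu^*=\mu\left(\sqrt{\chi(A)^T\chi(A)}\right)=\frac{1}{2}-\frac{1}{2}\log(\chi(A)^T\chi(A))$.
As $F_A(v)\geq\sum_{m\in\{1,...,M\}}\mu(v_m)$, $F_A$ is also lower bounded, and also goes to $\infty$ if a single $v_m$ goes to $0$ or $\infty$. Thus, $F_A$ has a minimum.
And, $F_A^*\geq M\mu^*\geq-\frac{M}{2}\log(\chi(A)^T\chi(A))$ (proof of second part of lemma 2).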
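The closed form of the minimum of $\mu$ can be sanity-checked on a grid. This is a small sketch only; the constant `c` is an arbitrary stand-in for $\chi(A)^T\chi(A)$ and is not derived from any matrix.

```python
import numpy as np

c = 3.7  # hypothetical stand-in for chi(A)^T chi(A)

def mu(t):
    # mu(t) = t^2 / (2 c) - log(t), defined for t > 0
    return t * t / (2.0 * c) - np.log(t)

t_star = np.sqrt(c)                     # stationary point: t / c - 1 / t = 0
mu_star = 0.5 - 0.5 * np.log(c)         # claimed minimum value

grid = np.linspace(0.01, 20.0, 20001)
print(mu(t_star), mu_star, float(np.min(mu(grid))))
```

The grid minimum should never fall below `mu_star`, and `mu(t_star)` should match it up to rounding.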
3.3 Link with linear feasibility
3.3.1 sub-lemma
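Let us first prove a sub-lemma: if $v^TAA^Tv\geq 4M$, then $F_A\left(\frac{v}{2}\right)\leq F_A(v)-\frac{M}{2}$.
Indeed, $F_A\left(\frac{v}{2}\right)=\frac{1}{4}\frac{v^TAA^Tv}{2}-\sum_{m\in\{1,...,M\}}\log(v_m)+M\log(2)$, i.e. $F_A\left(\frac{v}{2}\right)=F_A(v)-\frac{3}{8}v^TAA^Tv+M\log(2)$, but $\log(2)\leq 1$ and $v^TAA^Tv\geq 4M$. So, $F_A\left(\frac{v}{2}\right)\leq F_A(v)-\frac{3M}{2}+M$.
Thus, if $F_A(v)-F_A^*\leq\frac{M}{2}$, then $v^TAA^Tv\leq 4M$, and using again the inequality from lemma 1, $v^Tv\leq 8M\chi(A)^T\chi(A)$. In particular, $F_A(v)-F_A^*\leq\frac{M}{2}\Rightarrow\forall m\in\{1,...,M\},\ v_m^2\leq 8M\chi(A)^T\chi(A)$.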
3.3.2 The cost of negativity
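Let us now prove lemma 3 by showing that one can get a $\frac{1}{8M\Gamma(A)\times\chi(A)^T\chi(A)+1}$ improvement as soon as $\exists k\in\{1,...,M\},\ A_kA^Tv\leq 0$.
Thus, the second assertion cannot be true if such a decrease is impossible because $F_A(v)$ is too close to the optimum (ensuring a fortiori $F_A(v)-F_A^*\leq\frac{M}{2}$).
So, let us assume $\exists k\in\{1,...,M\},\ A_kA^Tv\leq 0$ and introduce $w=v+t\mathbf{1}_k$, i.e. $w_m=v_m$ if $m\neq k$ and $w_k=v_k+t$.
$F_A(w)=\frac{1}{2}(v+t\mathbf{1}_k)^TAA^T(v+t\mathbf{1}_k)-\sum_{m}\log(v_m)+\log(v_k)-\log(v_k+t)=F_A(v)+tA_kA^Tv+\frac{1}{2}t^2A_kA_k^T-\log(v_k+t)+\log(v_k)$.
But $A_kA^Tv\leq 0$, so $F_A(w)\leq F_A(v)+\frac{1}{2}t^2A_kA_k^T-\log(v_k+t)+\log(v_k)$, and it is clear that for $0<t\ll 1$, $F_A(w)<F_A(v)$ (because the change is that of $-\log(v_k+t)$ at first order).
Precisely, one can define $\Phi(t)=F_A(v)+\frac{1}{2}t^2A_kA_k^T-\log(v_k+t)+\log(v_k)$. Then, $\Phi'(t)=A_kA_k^Tt-\frac{1}{t+v_k}$, $\Phi''(t)=A_kA_k^T+\frac{1}{(t+v_k)^2}$ and $\Phi'''(t)=-\frac{2}{(t+v_k)^3}$.
As $\Phi'''(t)\leq 0$ and $t\geq 0$, $\Phi(t)\leq\Phi(0)+t\Phi'(0)+\frac{t^2}{2}\Phi''(0)$, i.e.
$$F_A(w)\leq F_A(v)-\frac{t}{v_k}+\frac{t^2}{2}\left(A_kA_k^T+\frac{1}{v_k^2}\right)$$
In particular, for $t=\frac{v_k}{v_k^2\times A_kA_k^T+1}$, $F_A(w)\leq F_A(v)-\frac{1}{2}\frac{1}{v_k^2\times A_kA_k^T+1}$.
But this is not possible if $F_A(v)$ is closer to $F_A^*$ than this value (using the definition of $\Gamma(A)$ and the sub-lemma to upper bound $A_kA_k^T$ and $v_k^2$).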
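The single-coordinate step of this subsection can be illustrated as follows. This is a sketch under stated assumptions: the matrix `A` and the point `v` are hypothetical and were chosen so that one coordinate of $AA^Tv$ is non-positive; the helper `F` is an illustrative name.

```python
import numpy as np

def F(A, v):
    # F_A(v) = 1/2 v^T A A^T v - sum_m log(v_m)
    u = A.T @ v
    return 0.5 * float(u @ u) - float(np.sum(np.log(v)))

# Hypothetical instance and a point where some A_k A^T v <= 0.
A = np.array([[1.0, 2.0], [1.0, -1.0], [2.0, 0.5]])
v = np.array([0.1, 5.0, 0.1])

g = A @ (A.T @ v)                       # vector of A_k A^T v
k = int(np.argmin(g))                   # a violated coordinate
a = float(A[k] @ A[k])                  # A_k A_k^T

t = v[k] / (v[k] ** 2 * a + 1.0)        # step size used in the proof
w = v.copy()
w[k] += t

guaranteed = 0.5 / (v[k] ** 2 * a + 1.0)
observed = F(A, v) - F(A, w)
print(g[k], observed, guaranteed)
```

Whenever `g[k]` is non-positive, `observed` should be at least `guaranteed`, which is the decrease used in the argument above.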
3.4 Effect of Newton Descent
The underlying theory of lemma 4 is that $F_A$ is self-concordant, a property that allows one to prove that Newton descent starting from $v_0$ eventually approximates $F_A^*$ with precision $\varepsilon$ after at most $O(F_A(v_0)-F_A^*+\log\log(\frac{1}{\varepsilon}))$ steps.
However, this proof, which can be found in [4, 2], is relatively long. It can be shortened here, as one only needs to prove that reaching precision $\varepsilon$ takes at most $O(F_A(v_0)-F_A^*+\log(\frac{1}{\varepsilon}))$ steps.
Indeed, the required $\varepsilon$ is only $\frac{1}{8M\Gamma(A)\times\chi(A)^T\chi(A)+1}$, whose log is basically $O(\log(\Gamma(A))+\log(\chi(A)^T\chi(A)))$, which is negligible compared to $F_A(v_0)-F_A^*$, which is at most $1+\frac{M}{2}\log(\Gamma(A))+M\log(M)+\frac{M}{2}\log(\chi(A)^T\chi(A))$.
3.4.1 self concordance
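$\forall v\in]0,\infty[^M$ and $w\in\mathbb{R}^M$, there exist $a_{v,w}<0<b_{v,w}$ such that $f_{v,w}(t)=F_A(v+tw)=\frac{(v+tw)^TAA^T(v+tw)}{2}-\sum_{m\in\{1,...,M\}}\log(v_m+tw_m)$ is well defined on $]a_{v,w},b_{v,w}[$.
Now $f_{v,w}(t)=\frac{v^TAA^Tv}{2}+t\times(v^TAA^Tw)+\frac{t^2}{2}w^TAA^Tw-\sum_{m\in\{1,...,M\}}\left[\log\left(1+t\frac{w_m}{v_m}\right)+\log(v_m)\right]=F_A(v)+t\times(v^TAA^Tw)+\frac{t^2}{2}w^TAA^Tw-\sum_{m\in\{1,...,M\}}\log\left(1+t\frac{w_m}{v_m}\right)$.
Then, $f_{v,w}'(t)=v^TAA^Tw+t\times(w^TAA^Tw)-\sum_{m\in\{1,...,M\}}\frac{c_m}{1+c_mt}$ by writing $c_m=\frac{w_m}{v_m}$.
And, $f_{v,w}''(t)=w^TAA^Tw+\sum_{m\in\{1,...,M\}}\frac{c_m^2}{(1+c_mt)^2}$.
In particular, $f_{v,w}''(0)=w^TAA^Tw+\sum_{m\in\{1,...,M\}}c_m^2$ and thus, $\forall m\in\{1,...,M\}$, $|c_m|\leq\sqrt{f_{v,w}''(0)}$.
Thus, for $t\geq 0$, one can observe that $f_{v,w}''(t)=w^TAA^Tw+\sum_{m}\frac{c_m^2}{(1+c_mt)^2}\geq w^TAA^Tw+\sum_{m}\frac{c_m^2}{(1+\sqrt{f_{v,w}''(0)}t)^2}\geq\frac{w^TAA^Tw+\sum_{m}c_m^2}{(1+\sqrt{f_{v,w}''(0)}t)^2}=\frac{f_{v,w}''(0)}{(1+\sqrt{f_{v,w}''(0)}t)^2}$.
And on the other hand, for $0\leq t<\frac{1}{\sqrt{f_{v,w}''(0)}}$, $f_{v,w}''(t)=w^TAA^Tw+\sum_{m}\frac{c_m^2}{(1+c_mt)^2}\leq w^TAA^Tw+\sum_{m}\frac{c_m^2}{(1-\sqrt{f_{v,w}''(0)}t)^2}\leq\frac{w^TAA^Tw+\sum_{m}c_m^2}{(1-\sqrt{f_{v,w}''(0)}t)^2}=\frac{f_{v,w}''(0)}{(1-\sqrt{f_{v,w}''(0)}t)^2}$.
So,
$$\frac{f_{v,w}''(0)}{(1+\sqrt{f_{v,w}''(0)}t)^2}\leq f_{v,w}''(t)\leq\frac{f_{v,w}''(0)}{(1-\sqrt{f_{v,w}''(0)}t)^2}$$
(This is common to all self-concordant functions but is proven directly here.)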
3.4.2 Newton decrement
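Independently from 3.4.1, a required lemma is that for any function $G$ from a subset of $\mathbb{R}^M$ to $\mathbb{R}$, twice differentiable with a positive definite Hessian at a point $\zeta$, and for any non-null vector $\omega\in\mathbb{R}^M$, the following inequality holds: $\frac{\omega^T(\nabla_\zeta G)}{\sqrt{\omega^T(\nabla^2_\zeta G)\omega}}\leq\sqrt{(\nabla_\zeta G)^T(\nabla^2_\zeta G)^{-1}(\nabla_\zeta G)}$.
Indeed, let $\Psi(\omega,t)=-t\times\omega^T(\nabla_\zeta G)+\frac{t^2}{2}\times\omega^T(\nabla^2_\zeta G)\omega$.
The minimum of this function with respect to $t$ is reached for $t=\frac{\omega^T(\nabla_\zeta G)}{\omega^T(\nabla^2_\zeta G)\omega}$, resulting in $-\frac{(\omega^T(\nabla_\zeta G))^2}{2\omega^T(\nabla^2_\zeta G)\omega}$.
But the global minimum with respect to both $t$ and $\omega$ (which is thus even lower) is reached for $t=1$ and $\omega=(\nabla^2_\zeta G)^{-1}(\nabla_\zeta G)$, with resulting value $-\frac{1}{2}(\nabla_\zeta G)^T(\nabla^2_\zeta G)^{-1}(\nabla_\zeta G)$.
Notation: From now on, $\sqrt{(\nabla_vF_A)^T(\nabla^2_vF_A)^{-1}(\nabla_vF_A)}$ will be written $\lambda(v)$ (standard notation for the Newton decrement).
Injecting the bound on $\lambda$ in the case of $f_{v,w}(t)$ says that $\forall v\in]0,\infty[^M$ and $w\in\mathbb{R}^M$, $-\frac{f_{v,w}'(0)}{\sqrt{f_{v,w}''(0)}}\leq\lambda(v)=\sqrt{(\nabla_vF_A)^T(\nabla^2_vF_A)^{-1}(\nabla_vF_A)}$ because $f_{v,w}'(0)=w^T(\nabla_vF_A)$ and $f_{v,w}''(0)=w^T(\nabla^2_vF_A)w$, considering the Taylor expansion of $f_{v,w}(t)=F_A(v+tw)$.
Let us point out that $\nabla^2_vF_A=AA^T+\mathrm{Diag}(v_1,...,v_M)^{-2}$, i.e. a positive semi-definite matrix plus a positive definite one; thus, $\nabla^2_vF_A$ is never singular, with smallest eigenvalue never lower than $\frac{1}{(8M\chi(A)^T\chi(A))^2}$.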
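The gradient, Hessian and Newton decrement of $F_A$ can be computed directly from these formulas. The sketch below assumes NumPy; the matrix `A`, the point `v` and the helper names are hypothetical. It also checks the directional inequality of this subsection on random directions.

```python
import numpy as np

def grad(A, v):
    # Gradient of F_A: A A^T v - 1 / v (component-wise inverse)
    return A @ (A.T @ v) - 1.0 / v

def hess(A, v):
    # Hessian of F_A: A A^T + Diag(v)^{-2}
    return A @ A.T + np.diag(1.0 / v ** 2)

def newton_decrement(A, v):
    # lambda(v) = sqrt(g^T H^{-1} g)
    g = grad(A, v)
    return float(np.sqrt(g @ np.linalg.solve(hess(A, v), g)))

# Hypothetical instance and point.
A = np.array([[1.0, 2.0], [1.0, -1.0], [2.0, 0.5]])
v = np.array([0.1, 5.0, 0.1])

lam = newton_decrement(A, v)
g, H = grad(A, v), hess(A, v)

rng = np.random.default_rng(1)
largest_ratio = 0.0
for _ in range(1000):
    omega = rng.normal(size=v.shape[0])
    ratio = float(omega @ g) / float(np.sqrt(omega @ H @ omega))
    largest_ratio = max(largest_ratio, ratio)

print(largest_ratio, lam)
```

No sampled direction should give a ratio above `lam`; equality is approached when `omega` is proportional to the Newton direction.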
3.4.3 Effect of a Newton step
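By integrating the upper bound of 3.4.1, i.e. $f_{v,w}''(t)\leq\frac{f_{v,w}''(0)}{(1-\sqrt{f_{v,w}''(0)}t)^2}$, one finds that
$$f_{v,w}(t)\leq f_{v,w}(0)+tf_{v,w}'(0)-t\sqrt{f_{v,w}''(0)}-\log\left(1-t\sqrt{f_{v,w}''(0)}\right)$$
As one is interested in minimizing the right term, one considers $w=-(\nabla^2_vF_A)^{-1}(\nabla_vF_A)$, seeing 3.4.2, for which $f_{v,w}'(0)=-\lambda(v)^2$ and $f_{v,w}''(0)=\lambda(v)^2$, leading to
$$F_A(v+tw)\leq F_A(v)-t\lambda(v)^2-t\lambda(v)-\log(1-t\lambda(v))$$
In particular, for $t=\frac{1}{1+\lambda(v)}$,
$$F_A\left(v-\frac{1}{1+\lambda(v)}(\nabla^2_vF_A)^{-1}(\nabla_vF_A)\right)\leq F_A(v)-\lambda(v)+\log(1+\lambda(v))$$
Notation: $\frac{1}{1+\lambda(v)}\times(\nabla^2_vF_A)^{-1}(\nabla_vF_A)$ will now be written $N(v)$.
So, $\forall v\in]0,\infty[^M$, it is possible with one Newton step to decrease $F_A(v)$ by at least $\lambda(v)-\log(1+\lambda(v))$.
In particular, $\forall v\in]0,\infty[^M$, if $\lambda(v)$ is bounded below by a positive constant, then it is possible to get a decrease of at least a positive constant.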
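A single damped Newton step $v\leftarrow v-N(v)$ can be sketched as below. This is an illustration under assumptions, not the report's implementation: the matrix `A`, the point `v` and the function names are hypothetical, and the linear system is solved with a dense solver for simplicity. The observed decrease is compared with the guaranteed decrease $\lambda(v)-\log(1+\lambda(v))$.

```python
import numpy as np

def F(A, v):
    # F_A(v) = 1/2 v^T A A^T v - sum_m log(v_m)
    u = A.T @ v
    return 0.5 * float(u @ u) - float(np.sum(np.log(v)))

def damped_newton_step(A, v):
    # Returns v - N(v) together with the Newton decrement lambda(v).
    g = A @ (A.T @ v) - 1.0 / v
    H = A @ A.T + np.diag(1.0 / v ** 2)
    d = np.linalg.solve(H, g)           # H^{-1} g
    lam = float(np.sqrt(g @ d))
    return v - d / (1.0 + lam), lam

# Hypothetical feasible instance (theta = (1, 0) works) and starting point.
A = np.array([[1.0, 10.0], [1.0, -10.0], [1.0, 9.0]])
v = np.array([0.5, 2.0, 0.1])

v_next, lam = damped_newton_step(A, v)
observed = F(A, v) - F(A, v_next)
guaranteed = lam - np.log1p(lam)
print(lam, observed, guaranteed)
```

The damping factor $\frac{1}{1+\lambda(v)}$ is what keeps `v_next` inside the positive orthant, so no line search or projection is used in this sketch.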
3.4.4 Optimality gap
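As seen in 3.4.3, if $\lambda(v)$ is bounded below by a positive constant, then it is possible to get a decrease of at least a positive constant.
Thus, the Newton method starting from $v_0=\frac{1}{\sqrt{\mathbf{1}^TAA^T\mathbf{1}}}\mathbf{1}$ will eventually find $v$ such that $\lambda(v)\leq O(1)<1$ after at most $O(F_A(v_0)-F_A^*)=O(M\log(\Gamma(A))+M\log(\chi(A)^T\chi(A)))$ steps.
(Otherwise, one would construct a point $\rho$ such that $F_A(\rho)<F_A^*$, which is a contradiction.)
Now, by integrating the lower bound of 3.4.1, i.e. $f_{v,w}''(t)\geq\frac{f_{v,w}''(0)}{(1+\sqrt{f_{v,w}''(0)}t)^2}$, one finds that
$$f_{v,w}(t)\geq f_{v,w}(0)+tf_{v,w}'(0)+t\sqrt{f_{v,w}''(0)}-\log\left(1+t\sqrt{f_{v,w}''(0)}\right)$$
This bound is not useful if $f_{v,w}'(0)+\sqrt{f_{v,w}''(0)}<0$ (because it just says that $f_{v,w}(t)$ is higher than something which goes to $-\infty$).
But if $f_{v,w}'(0)+\sqrt{f_{v,w}''(0)}>0$ with $f_{v,w}'(0)<0$, then the right term has a non-trivial minimum at $t^*=\frac{-f_{v,w}'(0)}{f_{v,w}''(0)+\sqrt{f_{v,w}''(0)}f_{v,w}'(0)}$, leading to
$$f_{v,w}(t)\geq f_{v,w}(0)-\frac{f_{v,w}'(0)}{\sqrt{f_{v,w}''(0)}}+\log\left(1+\frac{f_{v,w}'(0)}{\sqrt{f_{v,w}''(0)}}\right)$$
This condition $f_{v,w}'(0)+\sqrt{f_{v,w}''(0)}>0$ corresponds to $\lambda(v)<1$.
Now, the function $\phi(u)=u-\log(1+u)$ verifies $\phi'(u)=1-\frac{1}{1+u}\geq 0$ for $u\geq 0$. So, the right term is minimized when $\frac{f_{v,w}'(0)}{\sqrt{f_{v,w}''(0)}}$ is increased in absolute value.
In particular, for $\lambda(v)$, seeing 3.4.2,
$$F_A(v+tw)\geq F_A(v)+\lambda(v)+\log(1-\lambda(v))$$
As this is true for all $w$ and $t$, it is in particular true for the $w,t$ leading to the optimum of $F_A$:
$$F_A^*\geq F_A(v)+\lambda(v)+\log(1-\lambda(v))$$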
3.4.5 Convergence
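Let us consider $\alpha(u)=-u+\log(1+u)+\frac{u^2}{8}$.
$\alpha'(u)=-1+\frac{1}{u+1}+\frac{u}{4}$ and $\alpha''(u)=-\frac{1}{(u+1)^2}+\frac{1}{4}$.
$\alpha''(u)\leq 0$ for $u\in[0,1]$, so $\alpha'(u)$ is decreasing for $u\in[0,1]$. Yet, $\alpha'(0)=0$. So $\alpha'(u)\leq 0$ for $u\in[0,1]$.
So $\alpha(u)$ is decreasing, yet $\alpha(0)=0$. So $\alpha(u)\leq 0$ for $u\in[0,1]$.
So $\forall u\in[0,1],\ -u+\log(1+u)+\frac{u^2}{8}\leq 0$, i.e. $-u+\log(1+u)\leq-\frac{u^2}{8}\leq 0$.
So, seeing 3.4.3, $\forall v\in]0,\infty[^M$,
$$\lambda(v)\leq 1\Rightarrow F_A(v-N(v))\leq F_A(v)-\frac{\lambda(v)^2}{8}$$
On the other hand, let us consider $\beta(u)=u+\log(1-u)+2u^2$.
$\beta'(u)=1-\frac{1}{1-u}+4u$, $\beta''(u)=4-\frac{1}{(1-u)^2}$.
So, $\forall u\in[0,\frac{1}{2}],\ \beta''(u)\geq 0$, so $\beta'$ is increasing for $u\in[0,\frac{1}{2}]$.
But $\beta'(0)=0$, so $\beta'(u)\geq 0$ for $u\in[0,\frac{1}{2}]$, so $\beta$ is increasing for $u\in[0,\frac{1}{2}]$.
But $\beta(0)=0$. So, $\forall u\in[0,\frac{1}{2}],\ \beta(u)=u+\log(1-u)+2u^2\geq 0$.
So, $\forall u\in[0,\frac{1}{2}],\ u+\log(1-u)\geq-2u^2$.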
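The two scalar inequalities established so far in this subsection are easy to check numerically. The following is a simple grid check, not a proof; the grid sizes and the tolerance are arbitrary choices.

```python
import numpy as np

# alpha-type bound: -u + log(1 + u) <= -u^2 / 8 on [0, 1]
u1 = np.linspace(0.0, 1.0, 100001)
lhs1 = -u1 + np.log1p(u1)
alpha_ok = bool(np.all(lhs1 <= -u1 ** 2 / 8.0 + 1e-12))

# beta-type bound: u + log(1 - u) >= -2 u^2 on [0, 1/2]
u2 = np.linspace(0.0, 0.5, 100001)
lhs2 = u2 + np.log1p(-u2)
beta_ok = bool(np.all(lhs2 >= -2.0 * u2 ** 2 - 1e-12))

print(alpha_ok, beta_ok)
```

`np.log1p` is used instead of `np.log(1 + u)` to limit rounding error near $u=0$, where both sides of each inequality vanish.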
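So the inequality of 3.4.4 becomes
$$\lambda(v)\leq\frac{1}{2}\Rightarrow F_A^*\geq F_A(v)-2\lambda(v)^2$$
In this case, it means
$$F_A(v)-F_A^*\leq 2\lambda(v)^2$$
So, for $\lambda(v)\leq\frac{1}{2}$, on one hand, a Newton step decreases $F_A(v)-F_A^*$ by at least $\frac{\lambda(v)^2}{8}$, and, on the other hand, $\frac{\lambda(v)^2}{8}\geq\frac{F_A(v)-F_A^*}{16}$.
It means that, for $\lambda(v)\leq\frac{1}{2}$, $F_A(v)-F_A^*$ is decreased by at least $\frac{F_A(v)-F_A^*}{16}$.
So, when performing one Newton step $v\leftarrow v-N(v)$:
• either $\lambda(v)>\frac{1}{2}$, and $F_A(v-N(v))-F_A^*\leq F_A(v)-F_A^*-\frac{1}{2}+\log(\frac{3}{2})$
• or $\lambda(v)\leq\frac{1}{2}$, and $F_A(v-N(v))-F_A^*\leq\frac{15}{16}(F_A(v)-F_A^*)$
Thus, the maximal number of steps required to reach $F_A(v)-F_A^*\leq\varepsilon$ from $F_A(v_0)-F_A^*$ is
$$\frac{F_A(v_0)-F_A^*}{\frac{1}{2}-\log(\frac{3}{2})}+\frac{\log(\frac{1}{\varepsilon})}{\log(\frac{16}{15})}$$
By combining this result with the other lemmas, this almost proves lemma 4.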
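Putting the pieces together, the damped Newton descent can be sketched as a short loop. This is a minimal illustration under assumptions, not the report's reference implementation: the function name `solve_feasibility`, the stopping rule on $\lambda(v)$ (instead of the gap threshold of lemma 3, which involves $\chi(A)$), the tolerance and the example matrix are all hypothetical choices. At an exact minimizer the gradient vanishes, so $AA^Tv=\frac{1}{v}>\mathbf{0}$ component-wise and $\theta=A^Tv$ is strictly feasible.

```python
import numpy as np

def solve_feasibility(A, tol=1e-6, max_steps=1000):
    # Damped Newton descent on F_A, started from v0 = 1 / sqrt(1^T A A^T 1).
    M = A.shape[0]
    s = A.T @ np.ones(M)
    v = np.ones(M) / np.sqrt(float(s @ s))
    lam = np.inf
    steps = 0
    for steps in range(max_steps):
        g = A @ (A.T @ v) - 1.0 / v                 # gradient of F_A
        H = A @ A.T + np.diag(1.0 / v ** 2)         # Hessian of F_A
        d = np.linalg.solve(H, g)
        lam = float(np.sqrt(max(float(g @ d), 0.0)))  # Newton decrement
        if lam <= tol:
            break
        v = v - d / (1.0 + lam)                     # v <- v - N(v)
    return A.T @ v, steps, lam

# Hypothetical feasible instance whose starting point is NOT yet feasible:
# A @ A.T @ 1 has a negative second coordinate.
A = np.array([[1.0, 10.0], [1.0, -10.0], [1.0, 9.0]])
theta, steps, lam = solve_feasibility(A)
print(steps, lam, A @ theta)
```

Each iteration solves one dense $M\times M$ linear system, which is the per-step cost counted in the complexity discussion; a practical implementation would likely stop as soon as $A\theta>\mathbf{0}$ holds rather than waiting for a small $\lambda(v)$.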
3.4.6 Corollary
As $\chi(A)$ can be linked to a linear system involving only $A$ and coefficients $1$ and $0$, Cramer's rule allows one to write $\chi(A)^T\chi(A)$ with sub-determinants of $A$.
So, if $A$ requires $L$ bits to be written in binary, then $\log(\chi(A)^T\chi(A))=O(L)$.
This is also trivially the case for $\Gamma(A)$.
Thus, the complexity of the Newton descent to find a solution to the linear feasibility problem with $A\in\Omega$ of binary size $L$ is $O(ML)$ Newton steps, each of cost $O(M^{\gamma})$, resulting in an $O(M^{\gamma+1}L)$ complexity, $M$ times higher than the current state of the art but still polynomial (and still better than the ellipsoid or Karmarkar methods).
Other potential interests
Currently, there are multiple variations of [5] (with complexity $O(M^{\gamma+1}L)$) to solve a linear feasibility query:
• Minimizing $F_A(v)=\frac{v^TAA^Tv}{2}-\sum_{m\in\{1,...,M\}}\log(v_m)$ leads to $A(A^Tv)>\mathbf{0}$, as proven in this technical report.
• But minimizing $G_A(x)=\sum_{m\in\{1,...,M\}}\delta A_mx-\log(A_mx+1)$ also solves this problem. The proof is currently even shorter, but with the drawback of requiring the value of $\delta$. The key idea of the proof is that one can consider $\kappa(A)=\arg\min_{x,\ Ax\geq\mathbf{1}}\mathbf{1}^TAx$. When $\delta\leq\frac{\log(2)}{2\times\mathbf{1}^TA\kappa(A)}$ (i.e. $\delta=O(2^{-L})$), adding $\kappa(A)$ to the optimal solution of $G_A$ increases the linear part by less than $\frac{\log(2)}{2}$. But it increases all $A_mx$ by at least 1. In particular, if some $A_kx<0$, it means that $A_k(x+\kappa(A))+1>2\times(A_kx+1)$, decreasing the overall function value by at least $\log(2)$. So while $\neg(Ax>\mathbf{0})$, $G_A(x+\kappa(A))\leq G_A(x)-\frac{\log(2)}{2}$. Thus, $\forall x,\ G_A(x)-G_A^*\leq\frac{\log(2)}{2}\Rightarrow Ax>\mathbf{0}$. (Interestingly, minimization of $G_A$ never encountered $\lambda_G$ smaller than $\frac{\log(2)}{2}$.)
• And finally, it is also possible to consider the minimization of $J_A(x,t)=\Xi\times\chi(A)^T\chi(A)\times t+t^2+x^Tx-\sum_{m\in\{1,...,M\}}\log(A_mx+t)$. Both the algorithm and the proof are less straightforward. Coarsely, if $t\geq 0$, $J_A$ cannot be lower than $-M\log(\sigma(A))$, where $\sigma(A)$ is related to the highest eigenvalue of $A$. But $J_A\left(\frac{\chi(A)}{\chi(A)^T\chi(A)},-\frac{1}{2\chi(A)^T\chi(A)}\right)\leq-\Xi+M\log(\chi(A)^T\chi(A))$.
Thus, for $\Xi\geq M\log(\chi(A)^T\chi(A))+M\log(\sigma)$, the optimal solution of $J_A$ has $t\leq 0$, i.e. the $x$ part is a solution to $Ax>\mathbf{0}$.
Importantly, minimizing $F_A(v)$ also has the advantage that the effect of a ceiling on $v$ is easily computed, allowing a simple implementation of the minimization of $F_A$ with a frozen denominator (which can be estimated using $\Gamma(A)$ only, at least as long as $\lambda(v)\geq\frac{1}{2}$).
Finally, another interesting point is that, although minimizing $F_A$, $G_A$ or $J_A$ solves the same problem with the same complexity, the three dynamics during the minimization processes may not behave the same. In particular, $F_A$ does not seem to be just the dual of $G_A$, or the same function with another regularization. Preliminary numerical experiments seem to indicate that $F_A$ and $G_A$ have critically different dynamics (see https://hal.science/hal-02399129v18).
This may be a potential way to bypass the recent negative result on linear programming solvers [1].
Bibliography
[1] Xavier Allamigeon, Stéphane Gaubert, and Nicolas Vandame. No self-concordant barrier interior point method is strongly polynomial. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 515–528, 2022.
[2] Stephen P Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
[3] Michael B Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. Journal of the ACM (JACM), 68(1):1–39, 2021.
[4] Arkadi Nemirovski. Interior point polynomial time methods in convex programming. Lecture notes, 42(16):3215–3224, 2004.
[5] James Renegar. A polynomial-time algorithm, based on Newton's method, for linear programming. Mathematical Programming, 40(1):59–93, 1988.
[6] Jan van den Brand. A deterministic linear program solver in current matrix multiplication time. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 259–278. SIAM, 2020.