The self-concordant perceptron is efficient on a sub-family of feasibility instances

Adrien Chan-Hon-Tong

arXiv:1901.08525·cs.DS·February 17, 2026

TL;DR

This paper introduces a self-concordant perceptron algorithm that efficiently solves a specific sub-family of strict linear feasibility problems using an interior point approach, offering precise complexity analysis.

Contribution

It presents a novel perceptron-based method leveraging interior point techniques for strict linear feasibility, with detailed complexity characterization on certain problem sub-families.

Findings

01. Algorithm matches state-of-the-art linear programming complexity

02. Binary complexity is low on a specific sub-family of instances

03. Provides a more precise complexity analysis for the problem

Abstract

Strict linear feasibility, or linear separation, is usually tackled using efficient approximation/stochastic algorithms (which may even run in sub-linear time in expectation). However, today's state of the art for solving such instances exactly/deterministically is to cast them as linear programming instances. Conversely, this paper introduces a self-concordant perceptron algorithm which tackles strict linear feasibility directly with the interior point paradigm. This algorithm has the same worst-case time complexity as state-of-the-art linear programming algorithms, but its complexity can be characterized more precisely, eventually proving that its binary complexity is low on a sub-family of linear feasibility instances.

Equations

F_{A}(v)=\frac{v^{T}AA^{T}v}{2}-\sum_{m\in\{1,\dots,M\}}\log(v_{m})

-F_{A}^{*}\leq\frac{M}{2}\log(\chi(A)^{T}\chi(A))

F_{A}(v)-F_{A}^{*}\leq\frac{1}{8M\Gamma(A)\times\chi(A)^{T}\chi(A)+1}\Rightarrow A(A^{T}v)>\mathbf{0}

F_{A}(v)\geq\sum_{m\in\{1,\dots,M\}}\left(\frac{v_{m}^{2}}{2\chi(A)^{T}\chi(A)}-\log(v_{m})\right)

\Phi(t)\leq F_{A}(v)-\frac{t}{v_{k}}+\frac{t^{2}}{2}\left(A_{k}A_{k}^{T}+\frac{1}{v_{k}^{2}}\right)

\frac{f''_{v,w}(0)}{(1+\sqrt{f''_{v,w}(0)}\,t)^{2}}\leq f''_{v,w}(t)\leq\frac{f''_{v,w}(0)}{(1-\sqrt{f''_{v,w}(0)}\,t)^{2}}

f_{v,w}(t)\leq f_{v,w}(0)+tf'_{v,w}(0)-t\sqrt{f''_{v,w}(0)}-\log\left(1-t\sqrt{f''_{v,w}(0)}\right)

F_{A}\left(v-t(\nabla_{v}^{2}F_{A})^{-1}(\nabla_{v}F_{A})\right)\leq F_{A}(v)-t\lambda(v)^{2}-\lambda(v)t-\log(1-\lambda(v)t)

F_{A}\left(v-\frac{1}{1+\lambda(v)}(\nabla_{v}^{2}F_{A})^{-1}(\nabla_{v}F_{A})\right)\leq F_{A}(v)-\lambda(v)+\log(1+\lambda(v))

f_{v,w}(t)\geq f_{v,w}(0)+tf'_{v,w}(0)+t\sqrt{f''_{v,w}(0)}-\log\left(1+t\sqrt{f''_{v,w}(0)}\right)

f_{v,w}(t)\geq f_{v,w}(0)-\frac{f'_{v,w}(0)}{\sqrt{f''_{v,w}(0)}}+\log\left(1+\frac{f'_{v,w}(0)}{\sqrt{f''_{v,w}(0)}}\right)

f_{v,w}(t)\geq f_{v,w}(0)+\lambda(v)+\log(1-\lambda(v))

\forall v\in]0,\infty[^{M},\ \lambda(v)<1\Rightarrow F_{A}^{*}\geq F_{A}(v)+\lambda(v)+\log(1-\lambda(v))

\lambda(v)\leq 1\Rightarrow F_{A}(v-\mathcal{N}(v))\leq F_{A}(v)-\lambda(v)+\log(1+\lambda(v))\leq F_{A}(v)-\frac{\lambda(v)^{2}}{8}

\forall v\in]0,\infty[^{M},\ \lambda(v)\leq\frac{1}{2}\Rightarrow F_{A}^{*}\geq F_{A}(v)-2\lambda(v)^{2}

\forall v\in]0,\infty[^{M},\ \lambda(v)\leq\frac{1}{2}\Rightarrow\lambda(v)^{2}\geq\frac{F_{A}(v)-F_{A}^{*}}{2}

Taxonomy

Topics: Advanced Optimization Algorithms Research · Optimization and Variational Analysis · Multi-Criteria Decision Making

Full text

A simple polynomial-time algorithm for linear feasibility.

Adrien CHAN-HON-TONG

Abstract

This technical report offers a simplified version of a Newton-based interior-point algorithm for linear feasibility. Although its complexity is slightly higher than the state of the art, the proof is significantly shorter, making this polynomial-time algorithm relevant for educational purposes.

1 Introduction and motivation

The central-path log-barrier method [5] has been the state of the art of linear programming since 1988. It has only recently been improved with an efficient data structure [6] (which is a deterministic version of [3]). Using [6], a linear program related to a matrix $A\in\mathbb{R}^{M\times N}$ with total binary size $L$ can be solved in $O(M^{\gamma}L)$ operations (where $\gamma$ is the exponent of matrix multiplication/inversion). This is the best known complexity assuming that the matrix is not too flat, i.e. $N=O(M)$ and $M=O(N)$.

This technical report describes a variant of [5] with complexity $O(M^{\gamma+1}L)$, i.e. $M$ times slower than the current state of the art. This algorithm may have other interests, but this technical report stresses that the polynomial time complexity of this algorithm can be proven relatively easily, making it relevant for educational purposes.

2 Theorem

Let us first introduce some definitions:

•

$\forall A\in\mathbb{R}^{M\times N}$, let us write $\Gamma(A)=\underset{m\in\{1,...,M\}}{\max}A_{m}A_{m}^{T}$, where $A_{m}$ denotes the $m$-th row of $A$

•

Let us define $\Omega=\{A\in\mathbb{R}^{M\times N},\ \exists\theta\in\mathbb{R}^{N},\ A\theta>\mathbf{0}\}$

•

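$\forall A\in\Omega$, let us write $\chi(A)=\underset{x\in\mathbb{R}^{N},Ax\geq\mathbf{1}}{\operatorname{arg\,min}}\ x^{T}x$ (the minimizer exists and is unique, as $x^{T}x$ is strictly convex and the feasible set is a non-empty closed convex set)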
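The two quantities can be illustrated on a small instance. This is a minimal sketch assuming NumPy, and the helper names `gamma` and `chi_sq_upper_bound` and the 3×2 example matrix are hypothetical, not taken from the report. $\Gamma(A)$ is a maximum of squared row norms. Since $\chi(A)$ minimizes $x^{T}x$ over $\{Ax\geq\mathbf{1}\}$, any feasible $x$ only yields an upper bound on $\chi(A)^{T}\chi(A)$.

```python
import numpy as np


def gamma(A: np.ndarray) -> float:
    """Gamma(A) = max_m A_m A_m^T, the largest squared row norm of A."""
    return float(np.max(np.sum(A * A, axis=1)))


def chi_sq_upper_bound(A: np.ndarray, x: np.ndarray) -> float:
    """Return x^T x for a point x with A x >= 1.

    Since chi(A) minimizes x^T x over {A x >= 1}, this is an upper bound on
    chi(A)^T chi(A); it is exact only when x is the minimizer.
    """
    if not np.all(A @ x >= 1.0):
        raise ValueError("x does not satisfy A x >= 1")
    return float(x @ x)


# Hypothetical example: rows (1, 0), (0, 2), (1, 1).
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
print(gamma(A))                                     # 4.0
print(chi_sq_upper_bound(A, np.array([1.0, 0.5])))  # 1.25
```

For this particular $A$, $x=(1,0.5)$ is in fact the minimizer. The constraints $x_{1}\geq 1$ and $2x_{2}\geq 1$ are tight and $x_{1}+x_{2}=1.5\geq 1$, so $\chi(A)^{T}\chi(A)=1.25$ here.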

This technical report will prove the following results:

1. $\forall A\in\Omega,\forall\omega\in]0,\infty[^{M}$, $\omega^{T}AA^{T}\omega\geq\frac{\omega^{T}\omega}{\chi(A)^{T}\chi(A)}$; in particular, $\mathbf{1}^{T}AA^{T}\mathbf{1}>0$.

2. Assuming $A\in\Omega$, the function $F_{A}$ from $]0,\infty[^{M}$ to $\mathbb{R}$ defined by

$$F_{A}(v)=\frac{v^{T}AA^{T}v}{2}-\underset{m\in\{1,...,M\}}{\sum}\log(v_{m})$$

verifies $F_{A}\left(\frac{1}{\sqrt{\mathbf{1}^{T}AA^{T}\mathbf{1}}}\mathbf{1}\right)\leq 1+\frac{M}{2}\log(\mathbf{1}^{T}AA^{T}\mathbf{1})\leq 1+\frac{M}{2}\log(\Gamma(A))+M\log(M)$ and, independently, has a minimum $F_{A}^{*}$ such that

$$-F_{A}^{*}\leq\frac{M}{2}\log(\chi(A)^{T}\chi(A))$$

3. Minimizing $F_{A}$ (when $A\in\Omega$) allows one to solve linear feasibility, i.e. to find $\theta$ such that $A\theta>\mathbf{0}$, because $\forall v\in]0,\infty[^{M}$

$$F_{A}(v)-F_{A}^{*}\leq\frac{1}{8M\Gamma(A)\times\chi(A)^{T}\chi(A)+1}\Rightarrow A(A^{T}v)>\mathbf{0}$$

so that $\theta=A^{T}v$ is then a solution.

4. Damped Newton descent starting from $\frac{1}{\sqrt{\mathbf{1}^{T}AA^{T}\mathbf{1}}}\mathbf{1}$ will eventually find $v$ such that $F_{A}(v)-F^{*}_{A}\leq\frac{1}{8M\Gamma(A)\times\chi(A)^{T}\chi(A)+1}$ after at most $O(M\log(\Gamma(A))+M\log(\chi(A)^{T}\chi(A)))=O(ML)$ steps.
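As a minimal sketch of the objects appearing in the theorem, the barrier function $F_{A}$, its gradient $AA^{T}v-1/v$, its Hessian $AA^{T}+Diag(v)^{-2}$ and the starting point $v_{0}$ can be written as follows. It assumes NumPy, and the function names `F`, `grad`, `hess`, `start_point` and the example matrix are hypothetical. The last lines evaluate both sides of the identity $F_{A}(v_{0})=\frac{1}{2}+\frac{M}{2}\log(\mathbf{1}^{T}AA^{T}\mathbf{1})$ obtained in 3.1.

```python
import numpy as np


def F(A: np.ndarray, v: np.ndarray) -> float:
    """F_A(v) = v^T A A^T v / 2 - sum_m log(v_m), defined for v > 0."""
    u = A.T @ v
    return 0.5 * float(u @ u) - float(np.sum(np.log(v)))


def grad(A: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Gradient of F_A: A A^T v - 1/v (componentwise reciprocal)."""
    return A @ (A.T @ v) - 1.0 / v


def hess(A: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Hessian of F_A: A A^T + Diag(v)^(-2)."""
    return A @ A.T + np.diag(1.0 / v**2)


def start_point(A: np.ndarray) -> np.ndarray:
    """v_0 = 1 / sqrt(1^T A A^T 1) times the all-ones vector."""
    s = A.T @ np.ones(A.shape[0])
    return np.ones(A.shape[0]) / np.sqrt(s @ s)


A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
M = A.shape[0]
v0 = start_point(A)
S = float(np.ones(M) @ A @ A.T @ np.ones(M))   # 1^T A A^T 1 (= 13 for this A)
print(F(A, v0), 0.5 + 0.5 * M * np.log(S))     # the two values agree
```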

3 Proof

3.1 Lower bounding the quadratic part

By definition, $\forall A\in\Omega$, $\chi(A)=\underset{x\in\mathbb{R}^{N},Ax\geq\mathbf{1}}{\operatorname{arg\,min}}\ x^{T}x$, so, $\forall\omega\in]0,\infty[^{M}$, one can consider the product $\chi(A)^{T}(A^{T}\omega)$. On the one hand, by the Cauchy-Schwarz inequality, $\chi(A)^{T}(A^{T}\omega)\leq\sqrt{\chi(A)^{T}\chi(A)\times\omega^{T}AA^{T}\omega}$. On the other hand, $\chi(A)^{T}(A^{T}\omega)=(A\chi(A))^{T}\omega\geq\mathbf{1}^{T}\omega$, because $A\chi(A)\geq\mathbf{1}$ and $\omega>\mathbf{0}$.

So $\omega^{T}\omega\leq(\mathbf{1}^{T}\omega)^{2}\leq((A\chi(A))^{T}\omega)^{2}=(\chi(A)^{T}(A^{T}\omega))^{2}\leq\chi(A)^{T}\chi(A)\times\omega^{T}AA^{T}\omega$, where the first inequality holds because $\omega>\mathbf{0}$ (proof of the first part of lemma 1).

As a corollary, $\frac{1}{\mathbf{1}^{T}AA^{T}\mathbf{1}}$ exists (since $\mathbf{1}^{T}AA^{T}\mathbf{1}\geq\frac{M^{2}}{\chi(A)^{T}\chi(A)}>0$). Thus, a simple calculation from the definition of $F_{A}$ gives $F_{A}\left(\frac{1}{\sqrt{\mathbf{1}^{T}AA^{T}\mathbf{1}}}\mathbf{1}\right)=\frac{1}{2}+\frac{M}{2}\log(\mathbf{1}^{T}AA^{T}\mathbf{1})$. Now, $\mathbf{1}^{T}AA^{T}\mathbf{1}=\underset{i,j}{\sum}A_{i}A_{j}^{T}\leq M^{2}\Gamma(A)$ by the Cauchy-Schwarz inequality (proof of the first part of lemma 2).
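Lemma 1 and the bound $\mathbf{1}^{T}AA^{T}\mathbf{1}\leq M^{2}\Gamma(A)$ can be sanity-checked numerically. This is a hedged NumPy sketch, not a proof. The helper `lemma1_min_ratio` and the example matrix are hypothetical; for this matrix, $\chi(A)=(1,0.5)$ and $\Gamma(A)=4$. The sketch samples strictly positive $\omega$ and checks the stronger chain $(\mathbf{1}^{T}\omega)^{2}\leq\chi(A)^{T}\chi(A)\times\omega^{T}AA^{T}\omega$ from the proof.

```python
import numpy as np


def lemma1_min_ratio(A, chi_sq, trials=1000, seed=0):
    """Smallest sampled value of chi_sq * w^T A A^T w / (1^T w)^2 over random w > 0.

    The proof of lemma 1 shows this ratio is always >= 1; since (1^T w)^2 >= w^T w
    for w > 0, lemma 1 itself follows.
    """
    rng = np.random.default_rng(seed)
    worst = np.inf
    for _ in range(trials):
        w = rng.uniform(0.01, 10.0, size=A.shape[0])  # strictly positive omega
        u = A.T @ w                                    # so w^T A A^T w = u^T u
        worst = min(worst, chi_sq * float(u @ u) / float(w.sum()) ** 2)
    return worst


# Hypothetical example with chi(A) = (1, 0.5), so chi^T chi = 1.25, and Gamma(A) = 4.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
chi_sq, gamma_A, M = 1.25, 4.0, A.shape[0]
print(lemma1_min_ratio(A, chi_sq) >= 1.0)   # True
s = A.T @ np.ones(M)                        # A^T 1, so 1^T A A^T 1 = s^T s
print(float(s @ s), M**2 * gamma_A)         # 13.0 36.0
```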

3.2 Lower bounding the function

Using lemma 1, one can write $F_{A}(v)\geq\frac{v^{T}v}{2\chi(A)^{T}\chi(A)}-\underset{m\in\{1,...,M\}}{\sum}\log(v_{m})$ or equivalently,

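$$F_{A}(v)\geq\underset{m\in\{1,...,M\}}{\sum}\left(\frac{v_{m}^{2}}{2\chi(A)^{T}\chi(A)}-\log(v_{m})\right)$$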

Let $\mu$ be the function such that $\forall t>0,\ \mu(t)=\frac{t^{2}}{2\chi(A)^{T}\chi(A)}-\log(t)$. Then $\mu$ is continuous and $\mu(t)\underset{t\rightarrow 0\ \mathrm{or}\ \infty}{\rightarrow}\infty$, so $\mu$ is lower bounded and has a minimum, which is reached when $\frac{t}{\chi(A)^{T}\chi(A)}-\frac{1}{t}=\mu^{\prime}(t)=0$, i.e. $t=\sqrt{\chi(A)^{T}\chi(A)}$. So, $\mu^{*}=\mu(\sqrt{\chi(A)^{T}\chi(A)})=\frac{1}{2}-\frac{1}{2}\log(\chi(A)^{T}\chi(A))$.

As $F_{A}(v)\geq\underset{m\in\{1,...,M\}}{\sum}\mu(v_{m})$, it follows that $F_{A}$ is also lower bounded and goes to $\infty$ as soon as a single $v_{m}$ goes to 0 or $\infty$. Thus, $F_{A}$ has a minimum, and $F_{A}^{*}\geq M\mu^{*}\geq-\frac{M}{2}\log(\chi(A)^{T}\chi(A))$ (proof of the second part of lemma 2).
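The one-dimensional minimization of $\mu$ can be checked numerically. This is a small NumPy sketch, and the value $c=\chi(A)^{T}\chi(A)=1.25$ (taken from a hypothetical 3×2 example where $\chi(A)=(1,0.5)$) and the grid are illustrative choices. It compares a grid minimizer with the closed form $t=\sqrt{c}$, $\mu^{*}=\frac{1}{2}-\frac{1}{2}\log(c)$. It also checks the bound $F_{A}(v)\geq\sum_{m}\mu(v_{m})$ at random positive points.

```python
import numpy as np


def mu(t, c):
    """mu(t) = t^2 / (2c) - log(t), with c = chi(A)^T chi(A)."""
    return t**2 / (2.0 * c) - np.log(t)


c = 1.25
grid = np.linspace(1e-3, 10.0, 200_001)
t_grid = grid[np.argmin(mu(grid, c))]
mu_star = 0.5 - 0.5 * np.log(c)                  # closed-form minimum value
print(abs(t_grid - np.sqrt(c)) < 1e-3)           # True
print(abs(mu(np.sqrt(c), c) - mu_star) < 1e-12)  # True

# Lower bound F_A(v) >= sum_m mu(v_m) on the example whose chi(A) = (1, 0.5).
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.uniform(0.01, 10.0, size=3)
    u = A.T @ v
    F_v = 0.5 * float(u @ u) - float(np.sum(np.log(v)))
    assert F_v >= float(np.sum(mu(v, c))) - 1e-9
```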

3.3 Link with linear feasibility

3.3.1 sub-lemma

Let us first prove a sub-lemma: if $v^{T}AA^{T}v\geq 8M$, then $F_{A}\left(\frac{v}{2}\right)\leq F_{A}(v)-2M$. Indeed, $F_{A}\left(\frac{v}{2}\right)=\frac{1}{4}\frac{v^{T}AA^{T}v}{2}-\underset{m\in\{1,...,M\}}{\sum}\log(v_{m})+M\log(2)=F_{A}(v)-\frac{3}{8}v^{T}AA^{T}v+M\log(2)$, but $\log(2)\leq 1$ and $\frac{3}{8}v^{T}AA^{T}v\geq 3M$. So, $F_{A}\left(\frac{v}{2}\right)\leq F_{A}(v)-3M+M$.

Thus, if $F_{A}(v)-F_{A}^{*}\leq 2M$, then $v^{T}AA^{T}v<8M$ (otherwise $F_{A}\left(\frac{v}{2}\right)<F_{A}^{*}$, as $\log(2)<1$), and, using again the inequality from lemma 1, $v^{T}v\leq 8M\chi(A)^{T}\chi(A)$. In particular, $F_{A}(v)-F_{A}^{*}\leq 2M\Rightarrow\forall m\in\{1,...,M\},\ v_{m}^{2}\leq 8M\chi(A)^{T}\chi(A)$.
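The halving argument rests on an exact identity. Since $A^{T}(v/2)=(A^{T}v)/2$, one has $F_{A}(v)-F_{A}(v/2)=\frac{3}{8}v^{T}AA^{T}v-M\log(2)$. A drop of at least $2M$ is therefore guaranteed whenever $v^{T}AA^{T}v\geq 8M$, using $\log(2)\leq 1$. A hedged NumPy check on random hypothetical data follows; feasibility of $A$ plays no role in this identity.

```python
import numpy as np


def F(A, v):
    """F_A(v) = v^T A A^T v / 2 - sum_m log(v_m)."""
    u = A.T @ v
    return 0.5 * float(u @ u) - float(np.sum(np.log(v)))


rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))      # hypothetical instance
M = A.shape[0]
checked = 0
for _ in range(2000):
    v = rng.uniform(0.01, 5.0, size=M)
    u = A.T @ v
    q = float(u @ u)                          # v^T A A^T v
    drop = F(A, v) - F(A, v / 2)
    # exact identity: drop = 3q/8 - M log 2
    assert abs(drop - (3 * q / 8 - M * np.log(2))) < 1e-9 * max(1.0, q)
    if q >= 8 * M:
        assert drop >= 2 * M                  # the guaranteed decrease
        checked += 1
print(checked > 0)   # True: many samples fall in the regime q >= 8M
```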

3.3.2 The cost of negativity

Let us now prove lemma 3 by showing that one can get an improvement of the order of $\frac{1}{8M\Gamma(A)\times\chi(A)^{T}\chi(A)+1}$ as soon as $\exists k\in\{1,...,M\},\ A_{k}A^{T}v\leq 0$. Thus, such a $k$ cannot exist when this decrease is impossible because $F_{A}(v)$ is too close to the optimum (which a fortiori ensures $F_{A}(v)-F_{A}^{*}\leq 2M$).

So, let us assume $\exists k\in\{1,...,M\},\ A_{k}A^{T}v\leq 0$ and let us introduce $w=v+t\mathbf{1}_{k}$ with $t\geq 0$, i.e. $w_{m}=v_{m}$ if $m\neq k$ and $w_{k}=v_{k}+t$.

$F_{A}(w)=\frac{1}{2}(v+t\mathbf{1}_{k})^{T}AA^{T}(v+t\mathbf{1}_{k})-\underset{m}{\sum}\log(v_{m})+\log(v_{k})-\log(v_{k}+t)=F_{A}(v)+tA_{k}A^{T}v+\frac{1}{2}t^{2}A_{k}A_{k}^{T}-\log(v_{k}+t)+\log(v_{k})$. But $A_{k}A^{T}v\leq 0$, so $F_{A}(w)\leq F_{A}(v)+\frac{1}{2}t^{2}A_{k}A_{k}^{T}-\log(v_{k}+t)+\log(v_{k})$, and it is clear that, for $0<t\ll 1$, $F_{A}(w)<F_{A}(v)$ (because, at first order, the right-hand side decreases like $-\frac{t}{v_{k}}$).

Precisely, one can define $\Phi(t)=F_{A}(v)+\frac{1}{2}t^{2}A_{k}A_{k}^{T}-\log(v_{k}+t)+\log(v_{k})$, so that $F_{A}(w)\leq\Phi(t)$. Then, $\Phi^{\prime}(t)=A_{k}A_{k}^{T}t-\frac{1}{t+v_{k}}$, $\Phi^{\prime\prime}(t)=A_{k}A_{k}^{T}+\frac{1}{(t+v_{k})^{2}}$ and $\Phi^{\prime\prime\prime}(t)=-\frac{2}{(t+v_{k})^{3}}$. As $\Phi^{\prime\prime\prime}(t)\leq 0$ and $t\geq 0$, Taylor's theorem with Lagrange remainder gives $\Phi(t)\leq\Phi(0)+t\Phi^{\prime}(0)+\frac{t^{2}}{2}\Phi^{\prime\prime}(0)$, i.e.

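$$F_{A}(w)\leq\Phi(t)\leq F_{A}(v)-\frac{t}{v_{k}}+\frac{t^{2}}{2}\left(A_{k}A_{k}^{T}+\frac{1}{v_{k}^{2}}\right)$$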

In particular, for $t=\frac{v_{k}}{v_{k}^{2}\times A_{k}A_{k}^{T}+1}$, $F_{A}(w)\leq F_{A}(v)-\frac{1}{2}\frac{1}{v_{k}^{2}\times A_{k}A_{k}^{T}+1}$. But this is impossible if $F_{A}(v)$ is already closer to $F_{A}^{*}$ than this value (using the definition of $\Gamma(A)$ and the sub-lemma to upper bound $A_{k}A_{k}^{T}$ and $v_{k}^{2}$).
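The one-coordinate improvement can be reproduced directly. In this NumPy sketch, the helper `coordinate_step` and the random data are hypothetical. For every row $k$ with $A_{k}A^{T}v\leq 0$, it increases $v_{k}$ by $t=\frac{v_{k}}{v_{k}^{2}A_{k}A_{k}^{T}+1}$. It then checks the guaranteed decrease $\frac{1}{2}\frac{1}{v_{k}^{2}A_{k}A_{k}^{T}+1}$.

```python
import numpy as np


def F(A, v):
    """F_A(v) = v^T A A^T v / 2 - sum_m log(v_m)."""
    u = A.T @ v
    return 0.5 * float(u @ u) - float(np.sum(np.log(v)))


def coordinate_step(A, v, k):
    """Move v_k by t = v_k / (v_k^2 A_k A_k^T + 1); return (w, guaranteed decrease)."""
    a = float(A[k] @ A[k])            # A_k A_k^T
    denom = v[k] ** 2 * a + 1.0
    w = v.copy()
    w[k] += v[k] / denom
    return w, 0.5 / denom


rng = np.random.default_rng(2)
hits = 0
for _ in range(500):
    A = rng.normal(size=(6, 3))
    v = rng.uniform(0.05, 3.0, size=6)
    margins = A @ (A.T @ v)           # entries A_k A^T v
    for k in np.flatnonzero(margins <= 0):
        w, gain = coordinate_step(A, v, k)
        assert F(A, w) <= F(A, v) - gain + 1e-9
        hits += 1
print(hits > 0)   # True: random v usually leaves some rows with A_k A^T v <= 0
```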

3.4 Effect of Newton Descent

The theory underlying lemma 4 is that $F_{A}$ is self-concordant. This property allows one to prove that Newton descent starting from $v_{0}$ approximates $F_{A}^{*}$ with precision $\varepsilon$ after at most $O(F_{A}(v_{0})-F_{A}^{*}+\log\log(\frac{1}{\varepsilon}))$ steps.

However, this proof, which can be found in [4, 2], is relatively long. It can be shortened here, as one only needs to prove that reaching precision $\varepsilon$ takes at most $O(F_{A}(v_{0})-F_{A}^{*}+\log(\frac{1}{\varepsilon}))$ steps. Indeed, the required $\varepsilon$ is only $\frac{1}{8M\Gamma(A)\times\chi(A)^{T}\chi(A)+1}$. Its $\log$ is basically $O(\log(\Gamma(A))+\log(\chi(A)^{T}\chi(A)))$, which is negligible compared with the bound $1+\frac{M}{2}\log(\Gamma(A))+M\log(M)+\frac{M}{2}\log(\chi(A)^{T}\chi(A))$ on $F_{A}(v_{0})-F_{A}^{*}$.

3.4.1 self concordance

$\forall v\in]0,\infty[^{M}$ and $w\in\mathbb{R}^{M}$, there exist $a_{v,w}<0<b_{v,w}$ such that $f_{v,w}(t)=F_{A}(v+tw)=\frac{(v+tw)^{T}AA^{T}(v+tw)}{2}-\underset{m\in\{1,...,M\}}{\sum}\log(v_{m}+tw_{m})$ is well defined on $]a_{v,w},b_{v,w}[$.

Now $f_{v,w}(t)=\frac{v^{T}AA^{T}v}{2}+t\times(v^{T}AA^{T}w)+t^{2}\frac{w^{T}AA^{T}w}{2}-\underset{m\in\{1,...,M\}}{\sum}\left(\log(1+t\frac{w_{m}}{v_{m}})+\log(v_{m})\right)=F_{A}(v)+t\times(v^{T}AA^{T}w)+t^{2}\frac{w^{T}AA^{T}w}{2}-\underset{m\in\{1,...,M\}}{\sum}\log(1+t\frac{w_{m}}{v_{m}})$.

Then, writing $c_{m}=\frac{w_{m}}{v_{m}}$, $f^{\prime}_{v,w}(t)=v^{T}AA^{T}w+t\times(w^{T}AA^{T}w)-\underset{m\in\{1,...,M\}}{\sum}\frac{c_{m}}{1+c_{m}t}$.

And $f^{\prime\prime}_{v,w}(t)=w^{T}AA^{T}w+\underset{m\in\{1,...,M\}}{\sum}\frac{c^{2}_{m}}{(1+c_{m}t)^{2}}$.

In particular, $f^{\prime\prime}_{v,w}(0)=w^{T}AA^{T}w+\underset{m\in\{1,...,M\}}{\sum}c^{2}_{m}$ and thus, as $w^{T}AA^{T}w\geq 0$, $\forall m\in\{1,...,M\}$, $|c_{m}|\leq\sqrt{f^{\prime\prime}_{v,w}(0)}$.

Thus, for $t\geq 0$, as $1+c_{m}t\leq 1+\sqrt{f^{\prime\prime}_{v,w}(0)}t$, one can observe that $f^{\prime\prime}_{v,w}(t)=w^{T}AA^{T}w+\underset{m\in\{1,...,M\}}{\sum}\frac{c^{2}_{m}}{(1+c_{m}t)^{2}}\geq w^{T}AA^{T}w+\underset{m\in\{1,...,M\}}{\sum}\frac{c^{2}_{m}}{(1+\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}\geq\frac{w^{T}AA^{T}w+\underset{m\in\{1,...,M\}}{\sum}c^{2}_{m}}{(1+\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}=\frac{f^{\prime\prime}_{v,w}(0)}{(1+\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}$.

On the other hand, for $0\leq t<\frac{1}{\sqrt{f^{\prime\prime}_{v,w}(0)}}$, as $1+c_{m}t\geq 1-\sqrt{f^{\prime\prime}_{v,w}(0)}t>0$, $f^{\prime\prime}_{v,w}(t)=w^{T}AA^{T}w+\underset{m\in\{1,...,M\}}{\sum}\frac{c^{2}_{m}}{(1+c_{m}t)^{2}}\leq w^{T}AA^{T}w+\underset{m\in\{1,...,M\}}{\sum}\frac{c^{2}_{m}}{(1-\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}\leq\frac{w^{T}AA^{T}w+\underset{m\in\{1,...,M\}}{\sum}c^{2}_{m}}{(1-\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}=\frac{f^{\prime\prime}_{v,w}(0)}{(1-\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}$. So,

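$$\frac{f^{\prime\prime}_{v,w}(0)}{(1+\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}\leq f^{\prime\prime}_{v,w}(t)\leq\frac{f^{\prime\prime}_{v,w}(0)}{(1-\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}$$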

(This property is shared by all self-concordant functions, but it is proven directly here.)
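The two-sided bound on $f^{\prime\prime}_{v,w}$ can be checked along random lines. In this NumPy sketch, the random instance is hypothetical, and `f2` is a hypothetical helper implementing the closed form of $f^{\prime\prime}_{v,w}(t)$ derived above. The parameter $t$ is restricted to $[0,\frac{1}{\sqrt{f^{\prime\prime}_{v,w}(0)}}[$, where the upper bound is finite and $v+tw$ stays positive.

```python
import numpy as np


def f2(A, v, w, t):
    """f''_{v,w}(t) = w^T A A^T w + sum_m c_m^2 / (1 + c_m t)^2, with c = w / v."""
    c = w / v
    u = A.T @ w
    return float(u @ u) + float(np.sum(c**2 / (1.0 + c * t) ** 2))


rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3))           # hypothetical instance
for _ in range(200):
    v = rng.uniform(0.1, 5.0, size=5)
    w = rng.normal(size=5)
    s = np.sqrt(f2(A, v, w, 0.0))     # sqrt(f''_{v,w}(0))
    for t in np.linspace(0.0, 0.99 / s, 50):   # stay inside [0, 1/s)
        val = f2(A, v, w, t)
        assert val >= s**2 / (1 + s * t) ** 2 * (1 - 1e-9)   # lower bound
        assert val <= s**2 / (1 - s * t) ** 2 * (1 + 1e-9)   # upper bound
print("bounds hold on all sampled lines")
```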

3.4.2 Newton decrement

Independently of 3.4.1, a required lemma is the following. Let $G$ be any function from a subset of $\mathbb{R}^{M}$ to $\mathbb{R}$ which is twice differentiable, with a positive definite Hessian at a point $\zeta$. Then, for any non-null vector $\omega\in\mathbb{R}^{M}$, the following inequality holds: $\frac{\omega^{T}(\nabla_{\zeta}G)}{\sqrt{\omega^{T}(\nabla_{\zeta}^{2}G)\omega}}\leq\sqrt{(\nabla_{\zeta}G)^{T}(\nabla_{\zeta}^{2}G)^{-1}(\nabla_{\zeta}G)}$.

Indeed, let $\Psi(\omega,t)=-t\times\omega^{T}(\nabla_{\zeta}G)+\frac{t^{2}}{2}\times\omega^{T}(\nabla_{\zeta}^{2}G)\omega$. The minimum of this function with respect to $t$ is reached for $t=\frac{\omega^{T}(\nabla_{\zeta}G)}{\omega^{T}(\nabla_{\zeta}^{2}G)\omega}$, with value $-\frac{(\omega^{T}(\nabla_{\zeta}G))^{2}}{2\omega^{T}(\nabla_{\zeta}^{2}G)\omega}$.

But the global minimum with respect to both $t$ and $\omega$ (which is thus even lower) is reached for $t=1$ and $\omega=(\nabla_{\zeta}^{2}G)^{-1}(\nabla_{\zeta}G)$, with value $-\frac{1}{2}(\nabla_{\zeta}G)^{T}(\nabla_{\zeta}^{2}G)^{-1}(\nabla_{\zeta}G)$. Hence $\frac{(\omega^{T}(\nabla_{\zeta}G))^{2}}{\omega^{T}(\nabla_{\zeta}^{2}G)\omega}\leq(\nabla_{\zeta}G)^{T}(\nabla_{\zeta}^{2}G)^{-1}(\nabla_{\zeta}G)$, which gives the inequality.

Notation: from now on, $\sqrt{(\nabla_{v}F_{A})^{T}(\nabla_{v}^{2}F_{A})^{-1}(\nabla_{v}F_{A})}$ will be written $\lambda(v)$ (the standard notation for the Newton decrement).

Applying this inequality to $G=F_{A}$ at $\zeta=v$ with $\omega=-w$ gives, $\forall v\in]0,\infty[^{M}$ and non-null $w\in\mathbb{R}^{M}$, $-\frac{f^{\prime}_{v,w}(0)}{\sqrt{f^{\prime\prime}_{v,w}(0)}}\leq\lambda(v)=\sqrt{(\nabla_{v}F_{A})^{T}(\nabla_{v}^{2}F_{A})^{-1}(\nabla_{v}F_{A})}$. This holds because $f^{\prime}_{v,w}(0)=w^{T}(\nabla_{v}F_{A})$ and $f^{\prime\prime}_{v,w}(0)=w^{T}(\nabla_{v}^{2}F_{A})w$, by the Taylor expansion of $f_{v,w}(t)=F_{A}(v+tw)$. Note that $\nabla_{v}^{2}F_{A}=AA^{T}+Diag(v_{1},...,v_{M})^{-2}$, i.e. a positive semidefinite matrix plus a positive definite one. Thus, $\nabla_{v}^{2}F_{A}$ is never singular. Moreover, whenever $F_{A}(v)-F_{A}^{*}\leq 2M$ (so that $v_{m}^{2}\leq 8M\chi(A)^{T}\chi(A)$ by the sub-lemma), its smallest eigenvalue is at least $\frac{1}{8M\chi(A)^{T}\chi(A)}$.
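The Newton decrement and the inequality above can be illustrated as follows. This NumPy sketch uses a hypothetical helper `newton_decrement` and random hypothetical data. It solves a linear system instead of forming $(\nabla_{v}^{2}F_{A})^{-1}$. It then checks $-\frac{f^{\prime}_{v,w}(0)}{\sqrt{f^{\prime\prime}_{v,w}(0)}}\leq\lambda(v)$ for random directions, and that equality is reached for the Newton direction.

```python
import numpy as np


def newton_decrement(A, v):
    """Return (lambda(v), gradient, Hessian) of F_A at v."""
    g = A @ (A.T @ v) - 1.0 / v
    H = A @ A.T + np.diag(1.0 / v**2)
    d = np.linalg.solve(H, g)          # H^{-1} g without forming the inverse
    return float(np.sqrt(g @ d)), g, H


rng = np.random.default_rng(4)
A = rng.normal(size=(6, 4))            # hypothetical instance
v = rng.uniform(0.1, 3.0, size=6)
lam, g, H = newton_decrement(A, v)

for _ in range(1000):
    w = rng.normal(size=6)
    # For f(t) = F_A(v + t w): f'(0) = w^T g and f''(0) = w^T H w.
    assert -(w @ g) / np.sqrt(w @ H @ w) <= lam * (1 + 1e-9)

w = -np.linalg.solve(H, g)             # Newton direction attains the bound
ratio = -(w @ g) / np.sqrt(w @ H @ w)
print(abs(ratio - lam) <= 1e-9 * max(1.0, lam))   # True
```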

3.4.3 Effect of a Newton step

By integrating twice the upper bound of 3.4.1, i.e. $f^{\prime\prime}_{v,w}(t)\leq\frac{f^{\prime\prime}_{v,w}(0)}{(1-\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}$, one finds that, for $0\leq t<\frac{1}{\sqrt{f^{\prime\prime}_{v,w}(0)}}$,

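$$f_{v,w}(t)\leq f_{v,w}(0)+tf^{\prime}_{v,w}(0)-t\sqrt{f^{\prime\prime}_{v,w}(0)}-\log\left(1-t\sqrt{f^{\prime\prime}_{v,w}(0)}\right)$$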

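To make the right-hand side small, one considers the Newton direction $w=-(\nabla_{v}^{2}F_{A})^{-1}(\nabla_{v}F_{A})$. For this direction, $f^{\prime}_{v,w}(0)=-\lambda(v)^{2}$ and $\sqrt{f^{\prime\prime}_{v,w}(0)}=\lambda(v)$ (see 3.4.2), leading to, for $0\leq t<\frac{1}{\lambda(v)}$,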

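$$F_{A}\left(v-t\times(\nabla_{v}^{2}F_{A})^{-1}(\nabla_{v}F_{A})\right)\leq F_{A}(v)-t\lambda(v)^{2}-\lambda(v)t-\log(1-\lambda(v)t)$$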

In particular, for $t=\frac{1}{1+\lambda(v)}$,

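$$F_{A}\left(v-\frac{1}{1+\lambda(v)}\times(\nabla_{v}^{2}F_{A})^{-1}(\nabla_{v}F_{A})\right)\leq F_{A}(v)-\lambda(v)+\log(1+\lambda(v))$$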

Notation: $\frac{1}{1+\lambda(v)}\times(\nabla_{v}^{2}F_{A})^{-1}(\nabla_{v}F_{A})$ will now be written $\mathcal{N}(v)$.

So, $\forall v\in]0,\infty[^{M}$, one damped Newton step decreases $F_{A}(v)$ by at least $\lambda(v)-\log(1+\lambda(v))$, which is positive when $\lambda(v)>0$. In particular, as $u\mapsto u-\log(1+u)$ is increasing on $[0,\infty[$, if $\lambda(v)\geq c$ for some constant $c>0$, then the decrease is at least the constant $c-\log(1+c)$.
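A single damped Newton step $v\leftarrow v-\mathcal{N}(v)$ and its guaranteed decrease $\lambda(v)-\log(1+\lambda(v))$ can be sketched as follows. The sketch assumes NumPy, and `damped_newton_step` and the random data are hypothetical. The step length $\frac{1}{1+\lambda(v)}$ is below $\frac{1}{\lambda(v)}$. Together with the bound $|c_{m}|\leq\sqrt{f^{\prime\prime}_{v,w}(0)}$ of 3.4.1, this keeps the new iterate in $]0,\infty[^{M}$.

```python
import numpy as np


def F(A, v):
    """F_A(v) = v^T A A^T v / 2 - sum_m log(v_m)."""
    u = A.T @ v
    return 0.5 * float(u @ u) - float(np.sum(np.log(v)))


def damped_newton_step(A, v):
    """Return (v - N(v), lambda(v)) with N(v) = H^{-1} g / (1 + lambda(v))."""
    g = A @ (A.T @ v) - 1.0 / v
    H = A @ A.T + np.diag(1.0 / v**2)
    d = np.linalg.solve(H, g)
    lam = float(np.sqrt(g @ d))
    return v - d / (1.0 + lam), lam


rng = np.random.default_rng(5)
for _ in range(500):
    A = rng.normal(size=(6, 3))        # hypothetical instance
    v = rng.uniform(0.01, 10.0, size=6)
    v_new, lam = damped_newton_step(A, v)
    assert np.all(v_new > 0)           # the damped step never leaves the domain
    bound = F(A, v) - lam + np.log1p(lam)
    assert F(A, v_new) <= bound + 1e-8 * (1 + abs(F(A, v)))
print("decrease bound verified on random instances")
```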

3.4.4 Optimality gap

As seen in 3.4.3, if $\lambda(v)\geq c$ for some constant $c>0$, then a decrease of at least $c-\log(1+c)$ is possible. Thus, the Newton method starting from $v_{0}=\frac{1}{\sqrt{\mathbf{1}^{T}AA^{T}\mathbf{1}}}\mathbf{1}$ will eventually find $v$ such that $\lambda(v)<c$ (for instance $c=\frac{1}{2}<1$) after at most $O(F_{A}(v_{0})-F_{A}^{*})=O(M\log(\Gamma(A))+M\log(\chi(A)^{T}\chi(A)))$ steps. (Otherwise, one would construct a point $\rho$ such that $F_{A}(\rho)<F_{A}^{*}$, which is a contradiction.)

Now, by integrating twice the lower bound of 3.4.1, i.e. $f^{\prime\prime}_{v,w}(t)\geq\frac{f^{\prime\prime}_{v,w}(0)}{(1+\sqrt{f^{\prime\prime}_{v,w}(0)}t)^{2}}$, one finds that, for $t\geq 0$ such that $v+tw\in]0,\infty[^{M}$,

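$$f_{v,w}(t)\geq f_{v,w}(0)+tf^{\prime}_{v,w}(0)+t\sqrt{f^{\prime\prime}_{v,w}(0)}-\log\left(1+t\sqrt{f^{\prime\prime}_{v,w}(0)}\right)$$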

This bound is not useful if $f^{\prime}_{v,w}(0)+\sqrt{f^{\prime\prime}_{v,w}(0)}<0$ (because it just tells that $f_{v,w}(t)$ is higher than something which goes to $-\infty$ as $t\rightarrow\infty$). But if $f^{\prime}_{v,w}(0)+\sqrt{f^{\prime\prime}_{v,w}(0)}>0$ with $f^{\prime}_{v,w}(0)<0$, then the right-hand side has a non-trivial minimum at $t^{*}=\frac{-f^{\prime}_{v,w}(0)}{f^{\prime\prime}_{v,w}(0)+\sqrt{f^{\prime\prime}_{v,w}(0)}f^{\prime}_{v,w}(0)}$, leading to

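$$f_{v,w}(t)\geq f_{v,w}(0)-\frac{f^{\prime}_{v,w}(0)}{\sqrt{f^{\prime\prime}_{v,w}(0)}}+\log\left(1+\frac{f^{\prime}_{v,w}(0)}{\sqrt{f^{\prime\prime}_{v,w}(0)}}\right)$$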

The condition $f^{\prime}_{v,w}(0)+\sqrt{f^{\prime\prime}_{v,w}(0)}>0$, i.e. $-\frac{f^{\prime}_{v,w}(0)}{\sqrt{f^{\prime\prime}_{v,w}(0)}}<1$, holds for every $w$ as soon as $\lambda(v)<1$, by 3.4.2.

Now, the function $\psi(u)=u+\log(1-u)$ verifies $\psi^{\prime}(u)=1-\frac{1}{1-u}\leq 0$ for $u\in[0,1[$. Writing $u=-\frac{f^{\prime}_{v,w}(0)}{\sqrt{f^{\prime\prime}_{v,w}(0)}}$, the right-hand side equals $f_{v,w}(0)+\psi(u)$, so it decreases when $u$ increases. As $u\leq\lambda(v)$ by 3.4.2, in particular,

$$f_{v,w}(t)\geq f_{v,w}(0)+\lambda(v)+\log(1-\lambda(v))$$

As this holds for all $w$ and $t$, it holds in particular for the $w,t$ such that $v+tw$ is the minimizer of $F_{A}$:

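$$\forall v\in]0,\infty[^{M},\ \lambda(v)<1\Rightarrow F_{A}^{*}\geq F_{A}(v)+\lambda(v)+\log(1-\lambda(v))$$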

3.4.5 Convergence

Let us consider $\alpha(u)=-u+\log(1+u)+\frac{u^{2}}{8}$.

$\alpha^{\prime}(u)=-1+\frac{1}{u+1}+\frac{u}{4}$ and $\alpha^{\prime\prime}(u)=-\frac{1}{(u+1)^{2}}+\frac{1}{4}$. $\alpha^{\prime\prime}(u)\leq 0$ for $u\in[0,1]$, so $\alpha^{\prime}$ is non-increasing on $[0,1]$. Moreover, $\alpha^{\prime}(0)=0$, so $\alpha^{\prime}(u)\leq 0$ for $u\in[0,1]$, so $\alpha$ is non-increasing on $[0,1]$, and $\alpha(0)=0$. So $\alpha(u)\leq 0$ for $u\in[0,1]$, i.e. $\forall u\in[0,1],\ -u+\log(1+u)\leq-\frac{u^{2}}{8}\leq 0$.

So, by 3.4.3, $\forall v\in]0,\infty[^{M}$,

$$\lambda(v)\leq 1\Rightarrow F_{A}(v-\mathcal{N}(v))\leq F_{A}(v)-\lambda(v)+\log(1+\lambda(v))\leq F_{A}(v)-\frac{\lambda(v)^{2}}{8}$$

On the other hand, let us consider $\beta(u)=u+\log(1-u)+2u^{2}$.

$\beta^{\prime}(u)=1-\frac{1}{1-u}+4u$ , $\beta^{\prime\prime}(u)=4-\frac{1}{(1-u)^{2}}$ . So, $\forall u\in[0,\frac{1}{2}],\beta^{\prime\prime}(u)\geq 0$ , so $\beta^{\prime}$ is increasing for $u\in[0,\frac{1}{2}]$ . But, $\beta^{\prime}(0)=0$ , so $\beta^{\prime}(u)\geq 0$ for $u\in[0,\frac{1}{2}]$ , so $\beta$ is increasing for $u\in[0,\frac{1}{2}]$ . But, $\beta(0)=0$ . So, $\forall u\in[0,\frac{1}{2}],\beta(u)=u+\log(1-u)+2u^{2}\geq 0$ . So, $\forall u\in[0,\frac{1}{2}],u+\log(1-u)\geq-2u^{2}$ .
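The two scalar inequalities can be checked on a grid. This is a numerical illustration assuming NumPy, not a proof. The sketch also evaluates the two constants used in the convergence argument of this section: the guaranteed decrease $\frac{1}{2}-\log(\frac{3}{2})$ when $\lambda(v)>\frac{1}{2}$, and $\log(\frac{16}{15})$ for the contraction by $\frac{15}{16}$.

```python
import numpy as np

u = np.linspace(0.0, 1.0, 10_001)
alpha = -u + np.log1p(u) + u**2 / 8            # expected <= 0 on [0, 1]
assert np.all(alpha <= 1e-15)

u2 = np.linspace(0.0, 0.5, 5_001)
beta = u2 + np.log1p(-u2) + 2 * u2**2          # expected >= 0 on [0, 1/2]
assert np.all(beta >= -1e-15)

# Per-step constants: fixed decrease when lambda(v) > 1/2,
# and log of the inverse contraction factor when lambda(v) <= 1/2.
big_step = 0.5 - np.log(1.5)
print(round(big_step, 4), round(np.log(16 / 15), 4))   # 0.0945 0.0645
```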

So the inequality of 3.4.4 becomes

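$$\forall v\in]0,\infty[^{M},\ \lambda(v)\leq\frac{1}{2}\Rightarrow F_{A}^{*}\geq F_{A}(v)-2\lambda(v)^{2}$$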

Equivalently,

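$$\forall v\in]0,\infty[^{M},\ \lambda(v)\leq\frac{1}{2}\Rightarrow\lambda(v)^{2}\geq\frac{F_{A}(v)-F_{A}^{*}}{2}$$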

So, for $\lambda(v)\leq\frac{1}{2}$, a Newton step decreases $F_{A}(v)-F_{A}^{*}$ by at least $\frac{\lambda(v)^{2}}{8}$, and $\frac{\lambda(v)^{2}}{8}\geq\frac{F_{A}(v)-F_{A}^{*}}{16}$. It means that, for $\lambda(v)\leq\frac{1}{2}$, $F_{A}(v)-F_{A}^{*}$ is decreased by at least $\frac{F_{A}(v)-F_{A}^{*}}{16}$.

So, when performing one Newton step $v\leftarrow v-\mathcal{N}(v)$:

•

either $\lambda(v)>\frac{1}{2}$, and $F_{A}(v-\mathcal{N}(v))-F_{A}^{*}\leq F_{A}(v)-F_{A}^{*}-\frac{1}{2}+\log(\frac{3}{2})$ (as $u\mapsto-u+\log(1+u)$ is decreasing)

•

or $\lambda(v)\leq\frac{1}{2}$, and $F_{A}(v-\mathcal{N}(v))-F_{A}^{*}\leq\frac{15}{16}(F_{A}(v)-F_{A}^{*})$

Thus, the maximal number of steps required to reach $F_{A}(v)-F_{A}^{*}\leq\varepsilon$ from $F_{A}(v_{0})-F_{A}^{*}$ is at most $\frac{F_{A}(v_{0})-F_{A}^{*}}{\frac{1}{2}-\log(\frac{3}{2})}+\frac{\log(\frac{1}{\varepsilon})}{\log(\frac{16}{15})}$. Combined with the other lemmas, this essentially proves lemma 4.
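Putting the pieces together, the damped Newton descent can be sketched as a small solver. This NumPy version is an illustration under assumptions, not the report's implementation. The name `solve_feasibility`, the `max_steps` cap and the example matrices are hypothetical. Instead of testing the gap criterion of lemma 3, which needs the unknown $F_{A}^{*}$, it simply stops as soon as $\theta=A^{T}v$ satisfies $A\theta>\mathbf{0}$. By lemma 3 this happens no later than the gap criterion.

```python
import numpy as np


def solve_feasibility(A, max_steps=10_000):
    """Damped Newton descent on F_A from v0; returns (theta, steps) with A theta > 0.

    Assumes A is in Omega (so 1^T A A^T 1 > 0 and F_A has a minimum); otherwise
    a RuntimeError is the expected outcome.
    """
    M = A.shape[0]
    s = A.T @ np.ones(M)
    v = np.ones(M) / np.sqrt(s @ s)        # v0 = 1 / sqrt(1^T A A^T 1) * 1
    for step in range(max_steps):
        theta = A.T @ v
        if np.all(A @ theta > 0):          # A (A^T v) > 0: theta solves the instance
            return theta, step
        g = A @ theta - 1.0 / v            # gradient of F_A
        H = A @ A.T + np.diag(1.0 / v**2)  # Hessian of F_A
        d = np.linalg.solve(H, g)
        lam = float(np.sqrt(g @ d))        # Newton decrement lambda(v)
        v = v - d / (1.0 + lam)            # damped step v <- v - N(v)
    raise RuntimeError("no strictly feasible theta found within max_steps")


# Hypothetical strictly feasible instance (theta = (0.5, 1) is one solution).
A = np.array([[1.0, 2.0], [-1.0, 1.0], [0.5, -0.2]])
theta, steps = solve_feasibility(A)
print(np.all(A @ theta > 0))   # True
```

Checking $A\theta>\mathbf{0}$ directly costs one matrix-vector product per step. This is negligible next to the $O(M^{\gamma})$ linear solve.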

3.4.6 Corollary

As $\chi(A)$ can be linked to a linear system involving only $A$ and coefficients equal to 0 or 1, Cramer's rule allows one to write $\chi(A)^{T}\chi(A)$ with sub-determinants of $A$. So, if $A$ requires $L$ bits to be written in binary, then $\log(\chi(A)^{T}\chi(A))=O(L)$. This is also trivially the case for $\log(\Gamma(A))$.

Thus, the complexity of Newton descent to solve the linear feasibility problem for $A\in\Omega$ of binary size $L$ is $O(ML)$ Newton steps, each costing $O(M^{\gamma})$. This results in an $O(M^{\gamma+1}L)$ complexity, $M$ times higher than the current state of the art but still polynomial (and still better than the ellipsoid or Karmarkar methods).

Other potential interests

Currently, there are multiple variations of [5] (with complexity $O(M^{\gamma+1}L)$) to solve linear feasibility queries:

•

Minimizing $F_{A}(v)=\frac{v^{T}AA^{T}v}{2}-\underset{m\in\{1,...,M\}}{\sum}\log(v_{m})$ leads to $A(A^{T}v)>\mathbf{0}$ as proven in this technical report.

•

But minimizing $G_{A}(x)=\underset{m\in\{1,...,M\}}{\sum}\left(\delta A_{m}x-\log(A_{m}x+1)\right)$ also solves this problem. The proof is currently even shorter, but it has the drawback of requiring the value of $\delta$. The key idea of the proof is to consider $\kappa(A)$, a minimizer of $\mathbf{1}^{T}Ax$ over $\{x,Ax\geq\mathbf{1}\}$. When $\delta\leq\frac{\log(2)}{2\mathbf{1}^{T}A\kappa(A)}$ (i.e. $\delta=O(2^{-L})$), adding $\kappa(A)$ to any $x$ increases the linear part by at most $\frac{\log(2)}{2}$, but it increases every $A_{m}x$ by at least 1. In particular, if some $A_{k}x\leq 0$, then $A_{k}(x+\kappa(A))+1\geq 2\times(A_{k}x+1)$, decreasing the overall function value by at least $\log(2)$. So, while $\neg(Ax>\mathbf{0})$, $G_{A}(x+\kappa(A))\leq G_{A}(x)-\frac{\log(2)}{2}$. Thus, $\forall x,\ G_{A}(x)-G^{*}_{A}<\frac{\log(2)}{2}\Rightarrow Ax>\mathbf{0}$. (Interestingly, the minimization of $G_{A}$ never encountered $\lambda_{G}$ smaller than $\frac{\sqrt{\log(2)}}{2}$.)

•

And finally, it is also possible to consider the minimization of $J_{A}(x,t)=\Xi\times\sqrt{\chi(A)^{T}\chi(A)}\times t+t^{2}+x^{T}x-\underset{m\in\{1,...,M\}}{\sum}\log(A_{m}x+t)$. Both the algorithm and the proof are less straightforward. Coarsely, if $t\geq 0$, $J_{A}$ cannot be lower than $-M\log(\sigma(A))$, where $\sigma(A)$ is related to the largest eigenvalue of $A$. But $J_{A}\left(\frac{\chi(A)}{\sqrt{\chi(A)^{T}\chi(A)}},-\frac{1}{2\sqrt{\chi(A)^{T}\chi(A)}}\right)\leq-\Xi+M\log(\chi(A)^{T}\chi(A))$. Thus, for $\Xi\geq M\log(\chi(A)^{T}\chi(A))+M\log(\sigma(A))$, the optimal solution of $J_{A}$ has $t\leq 0$, i.e. its $x$ part is a solution to $Ax>\mathbf{0}$.

Importantly, minimizing $F_{A}(v)$ also has the advantage that the effect of a ceiling of $v$ is easily computed. This allows a simple implementation of the minimization of $F_{A}$ with a frozen denominator, which can be estimated using $\Gamma(A)$ only, at least as long as $\lambda(v)\geq\frac{1}{2}$.

Finally, another interesting point is that, even if minimizing $F_{A}$, $G_{A}$ or $J_{A}$ solves the same problem with the same complexity, the three minimization dynamics may not behave the same. In particular, $F_{A}$ does not seem to be just the dual of $G_{A}$, or the same function with just another regularization. Preliminary numerical experiments suggest that $F_{A}$ and $G_{A}$ have critically different dynamics (see https://hal.science/hal-02399129v18). This may be a potential way to bypass the recent negative result on linear programming solvers [1].

Bibliography

[1] Xavier Allamigeon, Stéphane Gaubert, and Nicolas Vandame. No self-concordant barrier interior point method is strongly polynomial. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 515–528, 2022.
[2] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[3] Michael B. Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. Journal of the ACM (JACM), 68(1):1–39, 2021.
[4] Arkadi Nemirovski. Interior point polynomial time methods in convex programming. Lecture notes, 42(16):3215–3224, 2004.
[5] James Renegar. A polynomial-time algorithm, based on Newton's method, for linear programming. Mathematical Programming, 40(1):59–93, 1988.
[6] Jan van den Brand. A deterministic linear program solver in current matrix multiplication time. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 259–278. SIAM, 2020.