Euclidean Contractivity of Neural Networks with Symmetric Weights

Veronica Centorrino, Anand Gokhale, Alexander Davydov, Giovanni Russo, and Francesco Bullo

arXiv:2302.13452 · math.OC · May 16, 2023

TL;DR

This paper analyzes the stability of certain neural network models with symmetric weights using contraction theory, providing conditions for Euclidean contractivity and applying results to quadratic optimization.

Contribution

It introduces new algebraic results and sufficient conditions for Euclidean contractivity in neural networks with symmetric weights, including non-smooth activations.

Findings

01

The contraction rates are log-optimal for almost all symmetric synaptic matrices.

02

Provided stability conditions for Hopfield and firing-rate networks.

03

Applied the contraction analysis to propose a firing-rate neural network that solves a quadratic optimization problem with box constraints.

Abstract

This paper investigates stability conditions of continuous-time Hopfield and firing-rate neural networks by leveraging contraction theory. First, we present a number of useful general algebraic results on matrix polytopes and products of symmetric matrices. Then, we give sufficient conditions for strong and weak Euclidean contractivity, i.e., contractivity with respect to the $ℓ_{2}$ norm, of both models with symmetric weights and (possibly) non-smooth activation functions. Our contraction analysis leads to contraction rates which are log-optimal in almost all symmetric synaptic matrices. Finally, we use our results to propose a firing-rate neural network model to solve a quadratic optimization problem with box constraints.

Equations

$$\mu(A):=\lim_{h\to 0^{+}}\frac{\|I_{n}+hA\|-1}{h}.$$

$$\mu_{p,Q_{1}Q_{2}}(A)=\mu_{p,Q_{1}}(Q_{2}AQ_{2}^{-1}).$$

$$\operatorname{\mathsf{osL}}(f_{t})=\sup_{x\in C}\mu(Df(t,x)),$$

$$\operatorname{\mathsf{osL}}_{2,Q^{1/2}}(f_{t})=\sup_{x,y\in C,\,x\neq y}\frac{(x-y)^{\top}Q(f(x)-f(y))}{\|x-y\|_{2,Q^{1/2}}^{2}}.$$

$$\operatorname{\mathsf{osL}}(f_{t})\leq-c,\quad\text{for all }t\in\mathbb{R}_{\geq 0},$$

$$\mu(Df(t,x))\leq-c,\quad\text{for all }x\in C\text{ and }t\in\mathbb{R}_{\geq 0}.$$

$\dot{x}_{\textup{F}}$

$\dot{x}_{\textup{H}}$

$$W=U\Lambda U^{\top},$$

$$\theta_{b}(z):=2b\big(1+\sqrt{1-z/b}\big),\quad\forall z\in{]{-}\infty,b]}.$$

$Q_{\textup{F},b}$

$Q_{\textup{H},b}$

$$g_{b}(z):=2b\,\frac{1+\sqrt{1-z/b}}{z},\quad\forall z\in{]{-}\infty,b]}\setminus\{0\}.$$

$$\mathcal{P}=\Big\{\sum_{j=1}^{m}\beta_{j}A_{j}\;\big|\;\beta_{j}\geq 0,\sum_{j=1}^{m}\beta_{j}=1\Big\}$$

$$\max_{A\in\mathcal{P}}\alpha(A)=\max_{j\in\{1,\dots,m\}}\mu(A_{j});$$

$$\max_{A\in\mathcal{P}}\alpha(A)\leq\max_{j\in\{1,\dots,m\}}\mu(A_{j})\leq\max_{A\in\mathcal{P}}\alpha(A)+\varepsilon.$$

$$\mathcal{P}_{\textup{F}}=\Big\{\sum_{j=1}^{2^{n}}\beta_{j}A_{j}\;\big|\;\beta_{j}\geq 0,\sum_{j=1}^{2^{n}}\beta_{j}=1\Big\}.$$

$$\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{F},\alpha(W)}}([d]W)=\max_{d\in[0,1]^{n}}\alpha([d]W)=\alpha(W).$$

$$\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{H},\alpha(W)}}(W[d])=\max_{d\in[0,1]^{n}}\alpha(W[d])=\alpha(W);$$

$$\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{F},\varepsilon}}([d]W)\leq\max_{d\in[0,1]^{n}}\alpha([d]W)+\varepsilon=\varepsilon;$$

$\max_{d\in[0,1]^{n}}\mu_{2,(-W)^{1/2}}([d]W)$

$\max_{d\in[0,1]^{n}}\mu_{2,(-W)^{1/2}}(W[d])$

$$0\leq\frac{\phi(x)-\phi(y)}{x-y}\leq 1,\quad\text{for all }x,y\in\mathbb{R},\;x\neq y.$$

$$\operatorname{\mathsf{osL}}_{2,Q_{\textup{F},\alpha(W)}}(f_{\textup{F}})\leq-1+\alpha(W),$$

$$\operatorname{\mathsf{osL}}_{2,Q_{\textup{F},\varepsilon}}(f_{\textup{F}})\leq-1+\varepsilon,$$

$$\operatorname{\mathsf{osL}}_{2,(-W)^{1/2}}(f_{\textup{F}})\leq-1.$$

$$\mu_{2,Q_{\textup{F},\alpha(W)}}(Df_{\textup{F}}(x))\leq\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{F},\alpha(W)}}(-I_{n}+[d]W)=-1+\alpha(W),$$

$$\operatorname{\mathsf{osL}}_{2,Q_{\textup{H},\alpha(W)}}(f_{\textup{H}})\leq-1+\alpha(W),$$

$$\operatorname{\mathsf{osL}}_{2,(-W)^{1/2}}(f_{\textup{H}})\leq-1.$$

$\mu_{2,Q_{\textup{H},\alpha(W)}}(Df_{\textup{H}}(x))$

Taxonomy

Topics: Neural Networks and Applications · Control and Stability of Dynamical Systems · Model Reduction and Neural Networks

Full text

Veronica Centorrino$^{a}$, Anand Gokhale$^{b}$, Alexander Davydov$^{b}$, Giovanni Russo$^{c}$, and Francesco Bullo$^{b}$

This work was in part supported by AFOSR project FA9550-21-1-0203. The authors thank Dr. Leo Kozachkov for insightful comments.

$^{a}$ Veronica Centorrino is with Scuola Superiore Meridionale, University of Naples Federico II, Italy. [email protected].

$^{b}$ Anand Gokhale, Alexander Davydov, and Francesco Bullo are with the Center for Control, Dynamical Systems, and Computation, UC Santa Barbara, Santa Barbara, CA 93106 USA. [email protected], [email protected], [email protected].

$^{c}$ Giovanni Russo is with the Department of Information and Electric Engineering and Applied Mathematics, University of Salerno, Italy. [email protected].

I Introduction

Continuous-time recurrent neural networks (RNNs) are dynamical models widely studied in computational neuroscience and machine learning. Recent interest has focused on establishing the contractivity properties of RNNs. Contracting dynamics are robustly stable, feature computationally friendly methods for equilibrium computation, and enjoy many other properties. Motivated by optimization [18, 2] and neuroscientific applications [16], [8, Chapter 17], this paper focuses on symmetric synaptic interactions.

While a comprehensive contractivity analysis with respect to $\ell_{1}$ and $\ell_{\infty}$ norms was recently presented in [5], the corresponding analysis with respect to weighted Euclidean norms is not complete yet. A recent breakthrough in this direction was obtained by [11]; this work extends and complements these results (a detailed comparison is offered below).

Two common models of RNNs are the firing-rate neural network (FNN) and the Hopfield neural network (HNN); the main difference between them is the order in which the activation function acts. Under mild assumptions, FNNs are positive systems and, arguably, more biologically plausible. HNNs are relevant in optimization and machine learning [18, 2, 16, 21]. For certain synaptic matrices and initial conditions, the FNN and the HNN are known to be equivalent via an appropriate change of coordinates and input transformation [14]. However, this partial correspondence is not completely understood and, as we show below, the contractivity properties of the two models do not exactly coincide.

Related literature

RNNs naturally emerge when modelling neural processes [8]. Critical questions when studying RNNs concern conditions that guarantee stability and robustness of the network. For example, sufficient conditions for the stability of HNNs are given in [7], based on Lyapunov diagonally stable matrices. Stability and robustness can be established simultaneously using contraction theory. Indeed, contracting systems exhibit highly ordered transient and asymptotic behaviors that are convenient in the context of RNNs. For example: (i) initial conditions are exponentially forgotten [13]; (ii) time-invariant dynamics admit a unique globally exponentially stable equilibrium [13]; (iii) contraction ensures entrainment to periodic inputs [17]; (iv) contracting systems enjoy highly robust behavior, such as input-to-state stability [20]; and (v) efficient numerical algorithms can be devised for numerical integration and fixed point computation of contracting systems [10]. Recently, non-Euclidean contractivity of RNNs was studied in [5] and in [4], where stability properties of HNNs and FNNs with dynamic synapses undergoing Hebbian learning are proposed. Euclidean contractivity is studied in [12], to analyze the stability of RNNs with dynamic synapses, and in [11], where a number of contractivity conditions are proposed. Finally, the design of norms minimizing the logarithmic norm is reviewed in [3, Section 2.7].

Contributions

Our main results are a set of sufficient conditions characterizing strong and weak infinitesimal contractivity properties (see Section II for the definitions) of FNNs and HNNs with symmetric weights and possibly non-smooth activation functions. We also establish a lower bound on the contraction rate and, remarkably, demonstrate that the bound is log-optimal in almost all symmetric weight matrices. One of the main benefits of our approach to the study of FNNs and HNNs is that a single condition ensures global exponential convergence, along with all the other useful properties of contracting systems. The main results leverage a number of general algebraic results, which are interesting per se and are also a contribution of this paper. With these algebraic results, we: (i) determine a weighted $\ell_{2}$ norm for matrix polytopes which is log-optimal for almost all synaptic matrices; (ii) give a lower bound on the spectral abscissa of matrix polytopes; (iii) provide optimal and log-optimal norms for the product of symmetric matrices. Finally, we leverage our sufficient conditions for contractivity to propose an FNN solving certain quadratic optimization problems with box constraints.

Our results for strong infinitesimal contractivity of the FNN and HNN models with symmetric weights are based on and generalize [11, Theorem 2]. Specifically, (i) we provide the explicit expression of the matrix weights for which the models are contracting; the matrices we find are different for the two models, highlighting the importance of choosing the appropriate model based on the properties being studied; (ii) we address the weak contractivity case, i.e., when the contraction rate is $0$, making the analysis applicable to, e.g., systems that enjoy conservation or invariance properties; (iii) we handle weakly increasing and (iv) locally Lipschitz activation functions, allowing us to consider common activation functions such as the rectified linear unit (ReLU) and soft thresholding functions.

II Mathematical Preliminaries

We denote by $(\cdot)_{+}\colon\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}$ the function $(z)_{+}=z$ if $z>0$ , $(z)_{+}=0$ if $z\leq 0$ . Given $x\in\mathbb{R}^{n}$ , we define $[x]\in\mathbb{R}^{n\times n}$ to be the diagonal matrix with diagonal entries equal to $x$ . Vector inequalities of the form $x\leq(\geq)~{}y$ are entrywise. We let $\mbox{1}_{n}$ , $\mbox{0}_{n}\in\mathbb{R}^{n}$ be the all-ones and all-zeros vectors, respectively, $I_{n}$ be the $n\times n$ identity matrix, and $\mathbb{S}^{n}$ be the set of real symmetric $n\times n$ matrices. For $A\in\mathbb{R}^{n\times n}$ , let $\operatorname{spec}(A)$ , $\rho(A):=\max\{\left|\lambda\right|\;|\;\lambda\in\operatorname{spec}(A)\}$ and $\alpha(A):=\max\{\Re(\lambda)\;|\;\lambda\in\operatorname{spec}(A)\}$ denote the spectrum, spectral radius and the spectral abscissa of $A$ , respectively; here $\Re(\lambda)$ denotes the real part of $\lambda$ . For $A\in\mathbb{S}^{n}$ , let $\lambda_{\textup{min}}(A)$ and $\lambda_{\textup{max}}(A)$ denote its minimum and maximum eigenvalue, respectively. Given $A,B\in\mathbb{S}^{n}$ , we write $A\preceq B$ (resp. $A\prec B$ ) if $B-A$ is positive semidefinite (resp. definite). The Moore–Penrose inverse of $A\in\mathbb{R}^{n\times n}$ is the unique matrix ${A}^{\dagger}\in\mathbb{R}^{n\times n}$ such that $A{A}^{\dagger}A=A$ , ${A}^{\dagger}A{A}^{\dagger}={A}^{\dagger}$ , with $A{A}^{\dagger}$ , ${A}^{\dagger}A\in\mathbb{S}^{n}$ . Finally, whenever it is clear from the context, we omit to specify the dependence of functions on time $t$ .

II-A Norms and induced norms

Let $\|\cdot\|$ denote both a norm on $\mathbb{R}^{n}$ and its corresponding induced matrix norm on $\mathbb{R}^{n\times n}$ . Given $A\in\mathbb{R}^{n\times n}$ the logarithmic norm (log-norm) induced by $\|\cdot\|$ is

$$\mu(A):=\lim_{h\to 0^{+}}\frac{\|I_{n}+hA\|-1}{h}.$$

Specifically, the Euclidean vector norm, matrix norm, and log-norm are, respectively: ${\displaystyle\|x\|_{2}=\sqrt{x^{\top}x}}$ , $\allowbreak\displaystyle{\|A\|_{2}=\sqrt{\lambda_{\textup{max}}(A^{\top}A)}}$ , and $\displaystyle\mu_{2}(A)=\frac{1}{2}\lambda_{\textup{max}}\left(A{+}A^{\top}\right)$ .

For an $\ell_{p}$ norm, $p\in[1,\infty]$ , and for an invertible matrix $Q\in\mathbb{R}^{n\times n}$ , the $Q$ -weighted $\ell_{p}$ norm is defined as $\|x\|_{p,Q}:=\|Qx\|_{p}$ . The corresponding log-norm is $\mu_{p,Q}(A)=\mu_{p}(QAQ^{-1})$ . Specifically, the weighted Euclidean vector norm, matrix norm, and log-norm are, respectively: $\displaystyle\|x\|_{2,Q}=\|Qx\|_{2}$ , $\displaystyle\|A\|_{2,Q^{1/2}}=\sqrt{\lambda_{\textup{max}}(Q^{-1}A^{\top}QA)}$ , and $\displaystyle{\mu_{2,Q^{1/2}}(A)=\frac{1}{2}\lambda_{\textup{max}}\left(QAQ^{-1}{+}A^{\top}\right)}$ .

For two invertible matrices $Q_{1}$ , $Q_{2}\in\mathbb{R}^{n\times n}$ , it holds

$$\mu_{p,Q_{1}Q_{2}}(A)=\mu_{p,Q_{1}}(Q_{2}AQ_{2}^{-1}).$$
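The definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `mu2`, `mu2_weighted` and `mu_difference_quotient` and the matrices `A`, `Q1`, `Q2` are illustrative choices, not taken from the paper. It computes the Euclidean log-norm from the symmetric part of $A$, the $Q$-weighted log-norm through the similarity $QAQ^{-1}$, and compares the former with the difference quotient appearing in the limit definition.

```python
import numpy as np

def mu2(A):
    """Euclidean log-norm: largest eigenvalue of the symmetric part of A."""
    return float(np.linalg.eigvalsh((A + A.T) / 2.0)[-1])

def mu2_weighted(A, Q):
    """Log-norm induced by the weighted norm ||x||_{2,Q} = ||Q x||_2 (Q invertible)."""
    return mu2(Q @ A @ np.linalg.inv(Q))

def mu_difference_quotient(A, h=1e-7):
    """Quotient (||I + hA||_2 - 1) / h from the limit definition, at a small h > 0."""
    n = A.shape[0]
    return float((np.linalg.norm(np.eye(n) + h * A, 2) - 1.0) / h)

# Illustrative data (not from the paper).
A = np.array([[-2.0, 1.0], [0.0, -1.0]])
Q1 = np.diag([1.0, 2.0])
Q2 = np.array([[1.0, 0.5], [0.0, 1.0]])

print("mu_2(A)               =", mu2(A))
print("difference quotient   =", mu_difference_quotient(A))
# The two values below should agree, illustrating the product rule for weights.
print("mu_{2,Q1 Q2}(A)       =", mu2_weighted(A, Q1 @ Q2))
print("mu_{2,Q1}(Q2 A Q2^-1) =", mu2_weighted(Q2 @ A @ np.linalg.inv(Q2), Q1))
```

The difference quotient only approximates the limit, so agreement with `mu2` is expected up to an error of order `h`.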

Given $f\colon\mathbb{R}_{\geq 0}\times C\rightarrow\mathbb{R}^{n}$ , with $C\subseteq\mathbb{R}^{n}$ open and connected, we denote by $\operatorname{\mathsf{osL}}(f_{t})$ the one-sided Lipschitz constant of $f_{t}:=f(t,\cdot)$ . For continuously differentiable $f_{t}$ and convex set $C$ it holds

$$\operatorname{\mathsf{osL}}(f_{t})=\sup_{x\in C}\mu(Df(t,x)),$$

where $Df(t,x):=\partial f(t,x)/\partial x$ is the Jacobian of $f$ with respect to $x$ . We write $\operatorname{\mathsf{osL}}_{p,Q}(f_{t})$ to specify that the one-sided Lipschitz constant is computed with respect to a $Q$ -weighted $\ell_{p}$ norm. Specifically, for the weighted Euclidean norm we have:

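$$\operatorname{\mathsf{osL}}_{2,Q^{1/2}}(f_{t})=\sup_{x,y\in C,\,x\neq y}\frac{(x-y)^{\top}Q(f(x)-f(y))}{\|x-y\|_{2,Q^{1/2}}^{2}}.$$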

We refer to [3] for a recent review of those tools.

II-B Contraction theory for dynamical systems

We start with the following

Definition 1.

Given a norm, a function $f\colon\mathbb{R}_{\geq 0}\times C\rightarrow\mathbb{R}^{n}$, with $C\subseteq\mathbb{R}^{n}$ $f$-invariant, open and convex, and a constant $c>0$ ($c=0$), referred to as the contraction rate, $f$ is strongly (weakly) infinitesimally contracting on $C$ if

$$\operatorname{\mathsf{osL}}(f_{t})\leq-c,\quad\text{for all }t\in\mathbb{R}_{\geq 0},$$

or, equivalently for differentiable vector fields, if

$$\mu(Df(t,x))\leq-c,\quad\text{for all }x\in C\text{ and }t\in\mathbb{R}_{\geq 0}.\tag{2}$$

One of the main benefits of contraction theory is that a single condition ensures global exponential convergence, along with other useful properties, as highlighted in the Introduction.
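For concreteness, the convergence property mentioned above can be written as the following standard estimate. It is a sketch in the notation of Definition 1, where $x(t)$ and $y(t)$ denote any two solutions of $\dot{x}=f(t,x)$ that remain in $C$ (these symbols are introduced here for illustration only) and $D^{+}$ is the upper right Dini derivative.

```latex
% x(t), y(t): two solutions of \dot{x} = f(t,x) in C; c: contraction rate of Definition 1.
D^{+}\,\|x(t)-y(t)\| \;\leq\; \operatorname{osL}(f_{t})\,\|x(t)-y(t)\| \;\leq\; -c\,\|x(t)-y(t)\|
\quad\Longrightarrow\quad
\|x(t)-y(t)\| \;\leq\; e^{-ct}\,\|x(0)-y(0)\|, \qquad t\geq 0.
```

For $c>0$ the distance between any two trajectories decays exponentially, while for $c=0$ (weak contractivity) it is only guaranteed not to increase.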

The next result [5, Theorem 16] allows using condition (2) for locally Lipschitz functions, for which, by Rademacher’s theorem, $Df(t,x)$ exists almost everywhere (a.e.) in $C$.

Theorem 1.

Consider a norm and a function $f\colon\mathbb{R}_{\geq 0}\times C\rightarrow\mathbb{R}^{n}$ that is locally Lipschitz on an open and convex set $C\subset\mathbb{R}^{n}$. Then, for every $c\in\mathbb{R}$, the following statements are equivalent:

(i)

$\operatorname{\mathsf{osL}}(f_{t})\leq c$, for all $t\in\mathbb{R}_{\geq 0}$;

(ii)

$\mu(Df(t,x))\leq c$, for a.e. $x\in C$ and $t\in\mathbb{R}_{\geq 0}$.

II-C Hopfield and firing-rate continuous-time neural networks

We are interested in the following continuous-time FNN and HNN models defined, respectively, as:

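$$\dot{x}_{\textup{F}}=-x_{\textup{F}}+\Phi(Wx_{\textup{F}}+u_{\textup{F}})=:f_{\textup{F}}(x_{\textup{F}}),\tag{3}$$

$$\dot{x}_{\textup{H}}=-x_{\textup{H}}+W\Phi(x_{\textup{H}})+u_{\textup{H}}=:f_{\textup{H}}(x_{\textup{H}}),\tag{4}$$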

where: $x_{\textup{F}}$ , $x_{\textup{H}}\in\mathbb{R}^{n}$ are neural activation vectors, $\Phi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ is a nonlinear and diagonal activation function, i.e., for $x\in\mathbb{R}^{n}$ , $(\Phi(x))_{i}=\phi(x_{i})$ , where $\phi\colon\mathbb{R}\rightarrow\mathbb{R}$ . $W\in\mathbb{R}^{n\times n}$ is the synaptic matrix, with $W_{ij}\in\mathbb{R}$ being the synaptic weight from neuron $j$ to neuron $i$ . Finally, $u_{\textup{F}}$ , $u_{\textup{H}}\in\mathbb{R}^{n}$ are the external stimuli in the FNN and HNN, respectively. The models (3) and (4) assume homogeneous dissipation rates; we leave the heterogeneous case to future work.

Remark 2.

When the activation function is non-negative the positive orthant is forward-invariant for $f_{\textup{F}}$ in (3) and $x_{\textup{F}}$ is interpreted as a firing-rate. Instead, in (4) $x_{\textup{H}}$ is sign indefinite and is interpreted as a membrane potential.
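To make the two models concrete, here is a small simulation sketch in plain Python. It assumes the standard forms $\dot{x}_{\textup{F}}=-x_{\textup{F}}+\Phi(Wx_{\textup{F}}+u_{\textup{F}})$ and $\dot{x}_{\textup{H}}=-x_{\textup{H}}+W\Phi(x_{\textup{H}})+u_{\textup{H}}$ with a ReLU activation; the helper names, the forward-Euler scheme, the step size and the numerical data are illustrative assumptions, not part of the paper. It integrates two trajectories from different initial conditions and compares their Euclidean distance before and after.

```python
import math

def relu(z):
    return max(z, 0.0)

def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def f_fnn(x, W, u, phi=relu):
    """Assumed FNN vector field: -x + Phi(W x + u)."""
    Wx = matvec(W, x)
    return [-x_i + phi(wx_i + u_i) for x_i, wx_i, u_i in zip(x, Wx, u)]

def f_hnn(x, W, u, phi=relu):
    """Assumed HNN vector field: -x + W Phi(x) + u."""
    WPhi = matvec(W, [phi(x_i) for x_i in x])
    return [-x_i + w_i + u_i for x_i, w_i, u_i in zip(x, WPhi, u)]

def euler(f, x0, W, u, dt=0.01, steps=2000):
    """Forward-Euler integration (an illustrative choice of integrator)."""
    x = list(x0)
    for _ in range(steps):
        dx = f(x, W, u)
        x = [x_i + dt * dx_i for x_i, dx_i in zip(x, dx)]
    return x

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Illustrative symmetric weights: the largest eigenvalue is about 0.685 < 1.
W = [[0.5, -0.3], [-0.3, 0.2]]
u = [0.4, -0.1]
xa, xb = [2.0, -1.0], [-1.5, 3.0]

for name, f in (("FNN", f_fnn), ("HNN", f_hnn)):
    d_final = dist(euler(f, xa, W, u), euler(f, xb, W, u))
    print(name, "initial distance:", dist(xa, xb), "final distance:", d_final)
```

In this example the two trajectories of each model approach each other, which is the behavior expected from a contracting system; the plain Euclidean distance is used here only for display.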

III Main Results

This section presents the main results of the paper. Namely, we study Euclidean contractivity properties of continuous-time RNNs with symmetric weights.

First, we give algebraic results on weighted $\ell_{2}$ norms of certain matrix polytopes. Then, we use those results to give sufficient conditions for the strong infinitesimal contractivity of the FNN and the HNN with symmetric weights with respect to weighted Euclidean norms.

Assumption 1 (Symmetric synaptic weights).

The synaptic matrix $W\in\mathbb{R}^{n\times n}$ is symmetric.

Under Assumption 1, the eigenvalues of $W$ are real, $\alpha(W)=\lambda_{\textup{max}}(W)$ and $W\preceq\alpha(W)I_{n}$ . Moreover, $W$ can be decomposed as

$$W=U\Lambda U^{\top},\tag{5}$$

where $U\in\mathbb{R}^{n\times n}$ is the orthogonal matrix whose columns are the eigenvectors of $W$ , and $\Lambda=[\lambda]\in\mathbb{R}^{n\times n}$ is diagonal with $\lambda\in\mathbb{R}^{n}$ being the vector of the eigenvalues of $W$ .

Given $b>0$ , we define ${\theta_{b}\colon]{-}\infty,b]\rightarrow[2b,+\infty[}$ by

$$\theta_{b}(z):=2b\big(1+\sqrt{1-z/b}\big),\quad\forall z\in{]{-}\infty,b]}.\tag{6}$$

We illustrate $\theta_{b}(\cdot)$ in Figure 1. For our derivations, it is useful to introduce the shorthand notation ${\theta_{b}(\Lambda):=[(\theta_{b}(\lambda_{1}),\dots,\theta_{b}(\lambda_{n}))]}$ . Also, we introduce $Q_{\textup{F},b}\in\mathbb{R}^{n\times n}$

$$Q_{\textup{F},b}:=U\theta_{b}(\Lambda)U^{\top},\tag{7}$$

and, when $W$ is invertible, $Q_{\textup{H},b}\in\mathbb{R}^{n\times n}$ is defined as

$$Q_{\textup{H},b}:=Q_{\textup{F},b}W^{-1}.\tag{8}$$

Remark 3.

The matrix $Q_{\textup{H},b}$ defined in (8) can be written as $Q_{\textup{H},b}=Ug_{b}(\Lambda)U^{\textsf{T}}$ , where we use the notation $g_{b}(\Lambda):=[g_{b}(\lambda_{1}),\dots,g_{b}(\lambda_{n})]$ , with $g_{b}(\cdot)$ defined by

$$g_{b}(z):=2b\,\frac{1+\sqrt{1-z/b}}{z},\quad\forall z\in{]{-}\infty,b]}\setminus\{0\}.$$

III-A Results on the Euclidean log-norm of matrix polytopes

First, we give the following definition for polytopes.

Definition 2 (Log-optimal and log- $\varepsilon$ -optimal norms for matrix polytopes).

Given $A_{1},\dots,A_{m}\in\mathbb{R}^{n\times n}$ , consider the polytope

$$\mathcal{P}=\Big\{\sum_{j=1}^{m}\beta_{j}A_{j}\;\big|\;\beta_{j}\geq 0,\sum_{j=1}^{m}\beta_{j}=1\Big\}$$

and a scalar $\varepsilon>0$ . We say that the norm $\|\cdot\|$ is

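(i)

logarithmically optimal (log-optimal) for $\mathcal{P}$ if

$$\max_{A\in\mathcal{P}}\alpha(A)=\max_{j\in\{1,\dots,m\}}\mu(A_{j});$$

(ii)

logarithmically $\varepsilon$-optimal (log-$\varepsilon$-optimal) for $\mathcal{P}$ if

$$\max_{A\in\mathcal{P}}\alpha(A)\leq\max_{j\in\{1,\dots,m\}}\mu(A_{j})\leq\max_{A\in\mathcal{P}}\alpha(A)+\varepsilon.$$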

We are specifically interested in the matrix polytopes defined as ${\mathcal{P}_{\textup{F}}:=\{[d]W\;|\;d\in[0,1]^{n}\}}$ and ${\mathcal{P}_{\textup{H}}:=\{W[d]\;|\;d\in[0,1]^{n}\}}$ . Namely, in Theorem 5 we give algebraic results on the Euclidean log-norm of matrices in $\mathcal{P}_{\textup{F}}$ and $\mathcal{P}_{\textup{H}}$ (the proof is in Section IV, together with a number of instrumental results).

Remark 4.

It is always possible to rewrite $\mathcal{P}_{\textup{F}}$ and $\mathcal{P}_{\textup{H}}$ in the form of Definition 2. In fact, let $A_{1},\dots,A_{2^{n}}\in\mathbb{R}^{n\times n}$ be the $2^{n}$ vertices defined by $A_{j}=[v_{j}]W$, where $v_{j}\in\{0,1\}^{n}$ is a binary vector with entries either $0$ or $1$ (note that there are $2^{n}$ such binary vectors). Then the set $\{\sum_{j=1}^{2^{n}}\beta_{j}A_{j}\;|\;\beta_{j}\geq 0,\sum_{j=1}^{2^{n}}\beta_{j}=1\}$ is exactly the set $\mathcal{P}_{\textup{F}}:=\{[d]W\;|\;d\in[0,1]^{n}\}$. To prove this, note that the vertices of the convex set $[0,1]^{n}$ are the $2^{n}$ vectors $v_{j}$. Therefore, given $d\in[0,1]^{n}$, there exist $\beta_{j}\geq 0$, $j=1,\dots,2^{n}$, with $\sum_{j=1}^{2^{n}}\beta_{j}=1$ such that $[d]=\sum_{j=1}^{2^{n}}\beta_{j}[v_{j}]$. Thus,

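$$\mathcal{P}_{\textup{F}}=\Big\{\sum_{j=1}^{2^{n}}\beta_{j}A_{j}\;\big|\;\beta_{j}\geq 0,\sum_{j=1}^{2^{n}}\beta_{j}=1\Big\}.$$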

The same reasoning holds for $\mathcal{P}_{\textup{H}}$ .
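The convex-combination step in Remark 4 can be made explicit. The sketch below uses one possible choice of coefficients, the product formula $\beta_{v}=\prod_{i}\big(d_{i}\text{ if }v_{i}=1\text{ else }1-d_{i}\big)$; this particular formula and the function name `vertex_weights` are illustrative and not necessarily the construction the authors have in mind.

```python
from itertools import product

def vertex_weights(d):
    """Weights beta_v >= 0 over the vertices v of [0,1]^n such that
    sum_v beta_v = 1 and sum_v beta_v * v = d (product-formula choice)."""
    weights = {}
    for v in product((0, 1), repeat=len(d)):
        beta = 1.0
        for d_i, v_i in zip(d, v):
            beta *= d_i if v_i == 1 else 1.0 - d_i
        weights[v] = beta
    return weights

d = [0.25, 0.5, 1.0]
beta = vertex_weights(d)

# Recombine the vertices with the weights; this should reproduce d entrywise.
recombined = [sum(b * v[i] for v, b in beta.items()) for i in range(len(d))]
print("number of vertices:", len(beta))
print("sum of weights    :", sum(beta.values()))
print("recombined d      :", recombined)
```

Since $[d]=\sum_{j}\beta_{j}[v_{j}]$ is linear in $d$, multiplying by $W$ on the right (or on the left) gives the corresponding representation of $[d]W$ (or $W[d]$).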

Theorem 5 (Euclidean log-norm of matrix polytopes).

Given a symmetric synaptic matrix $W$ (Assumption 1), the following statements hold:

(i)

if $\alpha(W)>0$, then $\|\cdot\|_{2,Q_{\textup{F},\alpha(W)}}$, with $Q_{\textup{F},\alpha(W)}\in\mathbb{R}^{n\times n}$ defined in (7), is log-optimal for $\mathcal{P}_{\textup{F}}$, i.e.,

$$\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{F},\alpha(W)}}([d]W)=\max_{d\in[0,1]^{n}}\alpha([d]W)=\alpha(W).$$

In addition, if $W$ is invertible, then $\|\cdot\|_{2,Q_{\textup{H},\alpha(W)}}$, with $Q_{\textup{H},\alpha(W)}\in\mathbb{R}^{n\times n}$ defined in (8), is log-optimal for $\mathcal{P}_{\textup{H}}$, i.e.,

$$\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{H},\alpha(W)}}(W[d])=\max_{d\in[0,1]^{n}}\alpha(W[d])=\alpha(W);$$

(ii)

if $\alpha(W)=0$, then for each $\varepsilon>0$ the norm $\|\cdot\|_{2,Q_{\textup{F},\varepsilon}}$, with $Q_{\textup{F},\varepsilon}\in\mathbb{R}^{n\times n}$ defined in (7), is log-$\varepsilon$-optimal for $\mathcal{P}_{\textup{F}}$, i.e.,

$$\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{F},\varepsilon}}([d]W)\leq\max_{d\in[0,1]^{n}}\alpha([d]W)+\varepsilon=\varepsilon;$$

(iii)

if $\alpha(W)<0$ , then $\|\cdot\|_{2,(-W)^{1/2}}$ is log-optimal for $\mathcal{P}_{\textup{F}}$ and $\mathcal{P}_{\textup{H}}$ , i.e.,

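$$\max_{d\in[0,1]^{n}}\mu_{2,(-W)^{1/2}}([d]W)=\max_{d\in[0,1]^{n}}\alpha([d]W)=0,\qquad\max_{d\in[0,1]^{n}}\mu_{2,(-W)^{1/2}}(W[d])=\max_{d\in[0,1]^{n}}\alpha(W[d])=0.$$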

Remark 6.

Theorem 5 applies to polytopes of the form $aI_{n}{+}[d]W$ and of the form $aI_{n}{+}W[d]$ , for all $a\in\mathbb{R}$ . This follows from the log-norm translation property, i.e., for all $A\in\mathbb{R}^{n\times n}$ $\mu(A+aI_{n})=\mu(A)+a$ .
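Part (i) of Theorem 5 and the translation property of Remark 6 can be probed numerically. The following is a sketch under stated assumptions: it takes the weight to be $Q_{\textup{F},b}=U\theta_{b}(\Lambda)U^{\top}$ with $b=\alpha(W)$ and the weighted norm to be $\|x\|_{2,Q}=\|Qx\|_{2}$, and it uses an arbitrary illustrative matrix `W`; the helper names are hypothetical and NumPy is assumed to be available. Because the log-norm is convex in its matrix argument, the maximum over the polytope is attained at one of the $2^{n}$ vertices $[v]W$.

```python
import itertools
import numpy as np

def theta(z, b):
    """theta_b(z) = 2b (1 + sqrt(1 - z/b)) for z <= b; clipped to guard against round-off."""
    return 2.0 * b * (1.0 + np.sqrt(np.maximum(1.0 - z / b, 0.0)))

def q_f(W, b):
    """ASSUMPTION: Q_{F,b} = U theta_b(Lambda) U^T, where W = U Lambda U^T."""
    lam, U = np.linalg.eigh(W)
    return U @ np.diag(theta(lam, b)) @ U.T

def mu2_weighted(A, Q):
    """Log-norm of A induced by the weighted norm ||x||_{2,Q} = ||Q x||_2."""
    M = Q @ A @ np.linalg.inv(Q)
    return float(np.linalg.eigvalsh((M + M.T) / 2.0)[-1])

# Illustrative symmetric matrix with alpha(W) > 0 (not from the paper).
W = np.array([[0.5, -0.3, 0.1],
              [-0.3, 0.2, 0.4],
              [0.1, 0.4, -1.0]])
n = W.shape[0]
alpha = float(np.linalg.eigvalsh(W)[-1])
Q = q_f(W, alpha)

# Maximum of the weighted log-norm over the vertices [v] W, v in {0,1}^n.
vertex_max = max(mu2_weighted(np.diag(v) @ W, Q)
                 for v in itertools.product((0.0, 1.0), repeat=n))

# Spot-check random points d in [0,1]^n as well.
rng = np.random.default_rng(0)
sample_max = max(mu2_weighted(np.diag(d) @ W, Q)
                 for d in rng.uniform(0.0, 1.0, size=(200, n)))

# Translation property: mu(A + a I) = mu(A) + a, here with A = W and a = -1.
a = -1.0
shifted = mu2_weighted(a * np.eye(n) + W, Q)

print("alpha(W)            :", alpha)
print("max over vertices   :", vertex_max)
print("max over samples    :", sample_max)
print("mu(-I + W), alpha-1 :", shifted, alpha + a)
```

For this example the vertex maximum coincides with $\alpha(W)$ up to round-off, and no sampled point exceeds it.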

III-B Contractivity of recurrent neural networks

Next, we consider the neural network dynamics for the FNN in (3) and for the HNN in (4).

Assumption 2 (Slope-restricted activation function).

The activation function $\phi\colon\mathbb{R}\rightarrow\mathbb{R}$ is Lipschitz and slope restricted in $[0,1]$ , i.e.,

$$0\leq\frac{\phi(x)-\phi(y)}{x-y}\leq 1,\quad\text{for all }x,y\in\mathbb{R},\;x\neq y.$$

Assumption 2 ensures that $\phi^{\prime}(x)\in[0,1]$ for almost all $x\in\mathbb{R}$. Many common activation functions, including ReLU and sigmoid, satisfy Assumption 2, possibly after rescaling. In fact, Assumption 2 can be relaxed for larger classes of coupling by restricting the slope to $[0,\bar{d}]$, where $\bar{d}>0$. By defining $[d]:=D\Phi/{\bar{d}}$ and $W:=\bar{d}W$, our results below still hold in this general case, with $\alpha(W)$ replaced by $\alpha(\bar{d}\cdot W)={\bar{d}}\cdot\alpha(W)$. We assume $\bar{d}=1$ to simplify the notation.
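As an illustration of Assumption 2, the sketch below samples pairs $x\neq y$ and tests whether the difference quotient of a scalar activation stays in $[0,\bar{d}]$. The function names, the sampling interval $[-5,5]$ and the tolerance are illustrative assumptions; a sampling test of this kind can only refute the slope restriction, it cannot prove it.

```python
import math
import random

def relu(z):
    return max(z, 0.0)

def soft_threshold(z, lam=1.0):
    """Soft thresholding: shrink z toward zero by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def is_slope_restricted(phi, lo=0.0, hi=1.0, samples=10_000, seed=0, tol=1e-9):
    """Return False as soon as a sampled pair violates
    lo <= (phi(x) - phi(y)) / (x - y) <= hi; True if no violation is found."""
    rng = random.Random(seed)
    for _ in range(samples):
        x, y = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        if x == y:
            continue
        slope = (phi(x) - phi(y)) / (x - y)
        if slope < lo - tol or slope > hi + tol:
            return False
    return True

def scaled_tanh(z):
    """tanh scaled by 2: slopes lie in [0, 2], so the bound d_bar = 2 is needed."""
    return 2.0 * math.tanh(z)

for name, phi in (("relu", relu), ("soft_threshold", soft_threshold),
                  ("sigmoid", sigmoid), ("tanh", math.tanh), ("2*tanh", scaled_tanh)):
    print(name, is_slope_restricted(phi))
print("2*tanh with d_bar = 2:", is_slope_restricted(scaled_tanh, hi=2.0))
```

The last line corresponds to the relaxation to slopes in $[0,\bar{d}]$ described above, with $\bar{d}=2$.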

III-B1 Contractivity of firing rate neural networks

We now provide an upper bound on the $\ell_{2}$ one-sided Lipschitz constant and sufficient conditions for the Euclidean contractivity of FNNs with symmetric weights.

Theorem 7 (Euclidean one-sided Lipschitz constant of the FNN).

Consider the FNN (3) satisfying Assumptions 1, 2:

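(i)

if $\alpha(W)>0$, then

$$\operatorname{\mathsf{osL}}_{2,Q_{\textup{F},\alpha(W)}}(f_{\textup{F}})\leq-1+\alpha(W),$$

with $Q_{\textup{F},\alpha(W)}\in\mathbb{R}^{n\times n}$ defined in (7);

(ii)

if $\alpha(W)=0$, then

$$\operatorname{\mathsf{osL}}_{2,Q_{\textup{F},\varepsilon}}(f_{\textup{F}})\leq-1+\varepsilon,$$

with ${Q_{\textup{F},\varepsilon}\in\mathbb{R}^{n\times n}}$ defined in (7);

(iii)

if $\alpha(W)<0$, then

$$\operatorname{\mathsf{osL}}_{2,(-W)^{1/2}}(f_{\textup{F}})\leq-1.$$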

Proof.

Regarding part (i) note that for almost all $x\in\mathbb{R}^{n}$ we have

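$$\mu_{2,Q_{\textup{F},\alpha(W)}}(Df_{\textup{F}}(x))\leq\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{F},\alpha(W)}}(-I_{n}+[d]W)=-1+\alpha(W),$$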

where the last equality follows by the log-norm translation property and part (i) in Theorem 5. The proof follows by applying Theorem 1. Parts (ii) and (iii) can be proved similarly, using parts (ii) and (iii) in Theorem 5.∎

Remark 8.

Under further assumptions on the synaptic matrix and the activation function, some inequalities in Theorem 7 are tight – see Appendix B.

The next result follows from Theorem 7.

Corollary 9 (Euclidean contractivity of the FNN).

Under the same assumptions and notations as in Theorem 7,

(i)

if $\alpha(W)=1$, then the FNN is weakly infinitesimally contracting with respect to $\|\cdot\|_{2,Q_{\textup{F},\alpha(W)}}$;

(ii)

if ${0<\alpha(W)<1}$, then the FNN is strongly infinitesimally contracting with rate ${1-\alpha(W)>0}$ with respect to ${\|\cdot\|_{2,Q_{\textup{F},\alpha(W)}}}$;

(iii)

if $\alpha(W)=0$, then for any $0<\varepsilon<1$ the FNN is strongly infinitesimally contracting with rate $1-\varepsilon>0$ with respect to $\|\cdot\|_{2,Q_{\textup{F},\varepsilon}}$;

(iv)

if $\alpha(W)<0$, then the FNN is strongly infinitesimally contracting with rate $1$ with respect to ${\|\cdot\|_{2,(-W)^{1/2}}}$.
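The case analysis of Corollary 9 can be summarized as a small lookup. This is only a sketch: the function name `fnn_contractivity`, its return format and the string labels for the norms are hypothetical, and the spectral abscissa $\alpha(W)$ is assumed to be supplied by the caller. Exact floating-point comparisons with $0$ and $1$ are used for simplicity.

```python
def fnn_contractivity(alpha_w, eps=0.5):
    """Map alpha(W) to (kind, rate, norm label) following the cases of Corollary 9.

    Returns None when alpha(W) > 1, a case not covered by the corollary.
    `eps` is the free parameter of case (iii) and must lie in (0, 1).
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if alpha_w > 1.0:
        return None
    if alpha_w == 1.0:                        # case (i): weak contractivity
        return ("weak", 0.0, "2,Q_F,alpha(W)")
    if alpha_w > 0.0:                         # case (ii)
        return ("strong", 1.0 - alpha_w, "2,Q_F,alpha(W)")
    if alpha_w == 0.0:                        # case (iii)
        return ("strong", 1.0 - eps, "2,Q_F,eps")
    return ("strong", 1.0, "2,(-W)^(1/2)")    # case (iv): alpha(W) < 0

for a in (1.5, 1.0, 0.25, 0.0, -2.0):
    print(a, fnn_contractivity(a))
```

The lookup makes visible that the guaranteed rate is $1-\alpha(W)$ for $0<\alpha(W)<1$ and saturates at $1$ once $\alpha(W)<0$.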

III-B2 Contractivity of Hopfield neural networks

We first provide an upper bound on the Euclidean one-sided Lipschitz constant and sufficient conditions for the $\ell_{2}$ contractivity of HNNs with non-singular symmetric synaptic matrix. Then, we give sufficient conditions for the $\ell_{2}$ contractivity with singular symmetric synapses. This latter result is proven in Section IV: differently from our analysis on FNNs, it requires a distinct mathematical approach.

Theorem 10 (Euclidean one-sided Lipschitz constant of the HNN with non-singular symmetric weights).

Consider the HNN (4) satisfying Assumptions 1, 2 with non-singular weight matrix $W$ ,

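(i)

if $\alpha(W)>0$, then

$$\operatorname{\mathsf{osL}}_{2,Q_{\textup{H},\alpha(W)}}(f_{\textup{H}})\leq-1+\alpha(W),$$

with $Q_{\textup{H},\alpha(W)}\in\mathbb{R}^{n\times n}$ defined in (8);

(ii)

if $\alpha(W)<0$, then

$$\operatorname{\mathsf{osL}}_{2,(-W)^{1/2}}(f_{\textup{H}})\leq-1.$$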

Proof.

Regarding part (i), note that for almost all $x\in\mathbb{R}^{n}$ we have

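$$\mu_{2,Q_{\textup{H},\alpha(W)}}(Df_{\textup{H}}(x))\leq\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{H},\alpha(W)}}(-I_{n}+W[d])=-1+\alpha(W),$$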

where the last equality follows by the log-norm translation property and part (i) in Theorem 5. The proof then follows by applying Theorem 1. Part (ii) can be proved similarly, using part (iii) in Theorem 5. ∎

Remark 11.

Following the same reasoning as in Appendix B, under the same assumptions of Theorem 10, if the activation function satisfies $\inf_{x\in\mathbb{R}}\phi^{\prime}(x)=0$ , and $\sup_{x\in\mathbb{R}}\phi^{\prime}(x)=1$ , then the inequalities in Theorem 10 are tight.

Corollary 12 (Euclidean contractivity of the HNN with non-singular symmetric weights).

Under the same assumptions and notations as in Theorem 10,

(i)

if $\alpha(W)=1$, then the HNN is weakly infinitesimally contracting with respect to ${\|\cdot\|_{2,Q_{\textup{H},\alpha(W)}}}$;

(ii)

if $0<\alpha(W)<1$, then the HNN is strongly infinitesimally contracting with rate $1-\alpha(W)>0$ with respect to ${\|\cdot\|_{2,Q_{\textup{H},\alpha(W)}}}$;

(iii)

if $\alpha(W)<0$, then the HNN is strongly infinitesimally contracting with rate $1$ with respect to ${\|\cdot\|_{2,(-W)^{1/2}}}$.

Finally, we give sufficient infinitesimal contractivity conditions of the HNN with singular symmetric synapses (see Section IV for the proof).

Theorem 13 (Contractivity of the HNN with singular symmetric weights).

Consider the HNN (4) satisfying Assumptions 1, 2 with $W$ having kernel $\mathcal{K}\neq\{\mbox{0}_{n}\}$ , and such that $\alpha(W)<1$ . Then, for each $\varepsilon\in{]0,1-\alpha(W)[}$ the HNN is strongly infinitesimally contracting with rate $|1{-}\alpha(W){-}\varepsilon|$ .

Remark 14.

If $W=0$, then the FNN (3) and the HNN (4) are contracting with rate 1. As a consequence of Corollaries 9, 12 and Theorem 13, when coupling is added to the networks, they remain (strongly) contracting as long as ${\alpha(W)}<1$. Note that the entries of $W$ are allowed to be large, and so is the activation function; this permits different types of coupling as long as the matrix $W-I_{n}$ is Hurwitz, i.e., $\alpha(W)<1$.

IV Proofs and Additional Results

We now present additional algebraic results on matrix polytopes and symmetric matrices, and the proofs of Theorems 5 and 13. First, we give a technical result for the spectral abscissa of matrix polytopes.

Lemma 15 (Lower bound on spectral abscissa of polytope of matrices).

For any $W\in\mathbb{R}^{n\times{n}}$ , we have

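$$\max_{d\in[0,1]^{n}}\alpha([d]W)\geq\alpha(W)_{+},\tag{10}$$

$$\max_{d\in[0,1]^{n}}\alpha(W[d])\geq\alpha(W)_{+}.\tag{11}$$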

Proof.

First, note that the spectral abscissa is a continuous function and that the set $\mathcal{P}_{\textup{F}}$ is compact, hence the maximum is well defined. To prove (10) we compute:

[TABLE]

The same calculation applies to prove inequality (11). ∎

We now give the proof of Theorem 5. To enhance clarity, we prove its parts case by case. Lemma 16 and parts (i) and (ii) in Theorem 5 are based upon and extend the treatment in [11, Theorem 2]; see our statement of contributions.

Lemma 16 (Splitting upper-bounded symmetric matrices).

Consider $W$ satisfying Assumption 1. Assume $W\preceq bI_{n}$, for some $b>0$, and let $\theta_{b}(\cdot)$ and $Q_{\textup{F},b}$ be defined in (6) and (7), respectively. Then,

$$W=Q_{\textup{F},b}-\frac{1}{4b}Q_{\textup{F},b}^{2}.\tag{12}$$

Proof.

By definition of the function $\theta_{b}(\cdot)$ , for all $\lambda_{i}\leq b$ , $i\in\{\,1,\dots,n\,\}$ , it holds

$$\lambda_{i}=\theta_{b}(\lambda_{i})-\frac{1}{4b}\theta_{b}(\lambda_{i})^{2}.\tag{13}$$

In fact, we have

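$$\theta_{b}(\lambda_{i})-\frac{1}{4b}\theta_{b}(\lambda_{i})^{2}=2b\Big(1+\sqrt{1-\lambda_{i}/b}\Big)-b\Big(1+\sqrt{1-\lambda_{i}/b}\Big)^{2}=b\Big(1+\sqrt{1-\lambda_{i}/b}\Big)\Big(1-\sqrt{1-\lambda_{i}/b}\Big)=b\Big(1-\big(1-\lambda_{i}/b\big)\Big)=\lambda_{i}.$$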

Equation (13) implies $\Lambda=\theta_{b}(\Lambda){-}\frac{1}{4{b}}\theta_{b}(\Lambda)^{2}$ . Equality (12) follows by multiplying by $U$ and $U^{\top}$ to the left and to the right, respectively, with $U$ defined in (5). ∎
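The scalar identity behind Lemma 16, namely $z=\theta_{b}(z)-\frac{1}{4b}\theta_{b}(z)^{2}$ for all $z\leq b$, is easy to check numerically. The sketch below is illustrative only; the function name `theta` and the sample values of $b$ and $z$ are arbitrary choices.

```python
import math

def theta(z, b):
    """theta_b(z) = 2b (1 + sqrt(1 - z/b)), defined for b > 0 and z <= b."""
    if b <= 0.0 or z > b:
        raise ValueError("theta_b(z) requires b > 0 and z <= b")
    return 2.0 * b * (1.0 + math.sqrt(1.0 - z / b))

b = 0.8
for z in (-10.0, -1.0, 0.0, 0.3, 0.8):
    t = theta(z, b)
    # The last column should reproduce z, and t should be at least 2b.
    print(z, t, t - t * t / (4.0 * b))
```

Applying the identity to each eigenvalue $\lambda_{i}\leq b$ and conjugating by $U$ gives the matrix statement of the lemma.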

First, we prove part (i), i.e., the log-optimality of the norm $\|\cdot\|_{2,Q_{\textup{F},\alpha(W)}}$ and, when $W$ is invertible, of $\|\cdot\|_{2,Q_{\textup{H},\alpha(W)}}$ for multiplicatively-scaled matrices with positive maximum eigenvalue.

Proof of part (i).

First, we prove that $\|\cdot\|_{2,Q_{\textup{F},\alpha(W)}}$ is log-optimal for $\mathcal{P}_{\textup{F}}$ and $\displaystyle\max_{d\in[0,1]^{n}}\alpha([d]W)=\alpha(W)$ . To this purpose, define

[TABLE]

Lemma 16 implies $W=Q_{\textup{F},\alpha(W)}-P$ . Next, pick $d\in\mathbb{R}^{n}$ satisfying $\mbox{0}_{n}<d\leq\mbox{1}_{n}$ , so that $[d]$ is diagonal and invertible. Then

[TABLE]

Since $P[d]P\succ 0$ , we can apply the Schur complement to this LMI to conclude that

[TABLE]

Setting $y=(y_{1},y_{1})$ for arbitrary $y_{1}\in\mathbb{R}^{n}$ , the inequality (15) implies

[TABLE]

In summary, we have established that the weak LMI (14) (independent of $d$ ) implies the weak LMI (16) for all $0<d\leq\mbox{1}_{n}$ . Here, by weak LMI, we mean to state that the linear matrix inequality is not strict. It is known [9, Theorem 6.3.5] that the eigenvalues of a symmetric matrix are continuous functions of the matrix entries. Therefore, the LMI (16) holds also for $\mbox{0}_{n}\leq d\leq\mbox{1}_{n}$ . Finally, note that the LMI (16) is equivalent to the condition ${\mu_{2,Q_{\textup{F},\alpha(W)}}([d]W)\leq\alpha(W)}$ for all ${d\in[0,1]^{n}}$ , therefore

$$\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{F},\alpha(W)}}([d]W)\leq\alpha(W).$$

Moreover, it is well known [6] that for every log-norm $\mu$ and every matrix $A$ it holds $\alpha(A)\leq\mu(A)$ . Specifically in our case:

$$\max_{d\in[0,1]^{n}}\alpha([d]W)\leq\max_{d\in[0,1]^{n}}\mu_{2,Q_{\textup{F},\alpha(W)}}([d]W).$$

The proof then follows from (10), after noticing that in this case $\alpha(W)_{+}=\alpha(W)$ .

Next, assume that $W$ is invertible. We need to prove that $\|\cdot\|_{2,Q_{\textup{H},\alpha(W)}}$ is log-optimal for $\mathcal{P}_{\textup{H}}$ and that it holds $\displaystyle\max_{d\in[0,1]^{n}}\alpha(W[d])=\alpha(W)$ . We have

[TABLE]

where the last equality follows from the log-optimality of $\|\cdot\|_{2,Q_{\textup{F},\alpha(W)}}$ for $\mathcal{P}_{\textup{F}}$ . The proof again follows from (10). ∎

The proof of part (ii) of Theorem 5, i.e., the log-optimality of the weighted $\ell_{2}$ norm $\|\cdot\|_{2,Q_{\textup{F},\varepsilon}}$ for multiplicatively-scaled negative semidefinite matrices, follows the same reasoning as that of part (i) by considering $\varepsilon>0$ instead of $\alpha(W)$ . Hence, we omit it here for brevity.

Finally, we prove part (iii), i.e., the log-optimality of $\|\cdot\|_{2,(-W)^{1/2}}$ for multiplicatively-scaled negative definite matrices. To do so, we give the following algebraic result.

Lemma 17 (Optimal norms for products of symmetric matrices).

Let $A_{1}=SQ\in\mathbb{R}^{n\times n}$ and $A_{2}=QS\in\mathbb{R}^{n\times n}$ where $S$ , $Q\in\mathbb{S}^{n}$ , with $Q\succ 0$ . Then, for each $i\in\{\,1,2\,\}$ ,

(i) $\operatorname{spec}(A_{i})$ is real and has the same number of negative, zero, and positive eigenvalues as $S$ ;

(ii) the norm $\|\cdot\|_{2,Q^{1/2}}$ is optimal for the matrix $A_{i}$ , i.e., $\|A_{i}\|_{2,Q^{1/2}}=\rho(A_{i})$ ;

(iii) the norm $\|\cdot\|_{2,Q^{1/2}}$ is log-optimal for $A_{i}$ , i.e., $\mu_{2,Q^{1/2}}(A_{i})=\alpha(A_{i})$ .

Proof.

Let $i=1$ . $A_{1}$ is similar to $Q^{1/2}SQ^{1/2}\in\mathbb{S}^{n}$ , hence $\operatorname{spec}(A_{1})$ is real. Part (i) then follows from Sylvester’s law of inertia, noting that $Q^{1/2}SQ^{1/2}$ is congruent to $S$ . Regarding part (ii), we compute

[TABLE]

where the last equality follows from the fact that $(SQ)^{2}$ has the same eigenvectors as $SQ$ and real eigenvalues equal to the square of the real eigenvalues of $SQ$ . Finally, to prove part (iii) we compute

[TABLE]

This concludes the proof of part (iii). The proof for $i=2$ is a straightforward adaptation. ∎
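Part (iii) of Lemma 17 can be sanity-checked numerically. The sketch below is illustrative only and restricts to $2\times 2$ matrices with a diagonal $Q$ so that it needs nothing beyond the standard library. It assumes the weighted-norm convention $\|x\|_{2,R}=\|Rx\|_{2}$ , so that $\mu_{2,R}(A)$ is the largest eigenvalue of the symmetric part of $RAR^{-1}$ ; the helper names `sym_eigs_2x2` and `lemma17_check` are hypothetical.

```python
import math

def sym_eigs_2x2(a, b, c):
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def lemma17_check(S, q):
    """Return (alpha(SQ), mu_{2,Q^{1/2}}(SQ)) for symmetric 2x2 S and Q = diag(q), q > 0."""
    (s11, s12), (_, s22) = S
    q1, q2 = q
    # A = S Q scales the columns of S by the diagonal entries of Q.
    a11, a12, a21, a22 = s11 * q1, s12 * q2, s12 * q1, s22 * q2
    # Spectral abscissa from the characteristic polynomial; the discriminant
    # is nonnegative because spec(A) is real (part (i)), up to rounding.
    tr, det = a11 + a22, a11 * a22 - a12 * a21
    disc = max(tr * tr - 4.0 * det, 0.0)
    alpha = 0.5 * (tr + math.sqrt(disc))
    # Q^{1/2} A Q^{-1/2} = Q^{1/2} S Q^{1/2} is symmetric, so the weighted
    # log-norm is simply its largest eigenvalue.
    r1, r2 = math.sqrt(q1), math.sqrt(q2)
    _, mu = sym_eigs_2x2(r1 * s11 * r1, r1 * s12 * r2, r2 * s22 * r2)
    return alpha, mu

alpha, mu = lemma17_check(S=[[1.0, 2.0], [2.0, -3.0]], q=(0.5, 4.0))
print(abs(alpha - mu) < 1e-9)
```

The two returned values agree up to rounding for any symmetric $S$ and positive $q$ , which is exactly the log-optimality statement for $A_{1}=SQ$ in this small case.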

Proof of part (iii).

Pick $d\in\mathbb{R}^{n}$ satisfying $\mbox{0}_{n}\leq d\leq\mbox{1}_{n}$ and consider the matrices $[d]W$ and $W[d]$ . Lemma 17 with ${S:=[-d]}$ and ${Q:=-W\succ 0}$ , implies that the spectrum of the product matrices ${[d]W=[-d](-W)}$ and ${W[d]=(-W)[-d]}$ is real and has the same number of negative, zero, positive eigenvalues as ${[-d]}$ . Therefore,

[TABLE]

Maximizing over $d\in[0,1]^{n}$ we get part (iii). ∎

Finally, we give the proof of Theorem 13.

Proof of Theorem 13.

Let $r$ be the number of non-zero eigenvalues of $W\in\mathbb{R}^{n\times n}$ . Without loss of generality, we reorder the elements in $\lambda\in\mathbb{R}^{n}$ and $U\in\mathbb{R}^{n\times n}$ , so that $\lambda=(\lambda_{1},\dots,\lambda_{r},0,\dots,0)$ and $U=[u_{1},\dots,u_{r},u_{r+1},\dots,u_{n}]$ , where $u_{i}\in\mathbb{R}^{n}$ is the eigenvector of $\lambda_{i}\in\mathbb{R}$ .

Next, let $\mathcal{K}^{*}:=\operatorname{\mathsf{span}}\{\,u_{1},\dots,u_{r}\,\}$ , $n_{\parallel}:=\dim(\mathcal{K}^{*})$ , ${\mathcal{K}:=\operatorname{\mathsf{span}}\{\,u_{r+1},\dots,u_{n}\,\}}$ , $n_{\perp}:=\dim(\mathcal{K})$ , and define $U_{\parallel}:=[u_{1},\dots,u_{r}]\in\mathbb{R}^{n\times n_{\parallel}}$ , $U_{\perp}:=[u_{r+1},\dots,u_{n}]\in\mathbb{R}^{n\times n_{\perp}}$ , so that $U=[U_{\parallel}\quad U_{\perp}]$ .

We have $\mathbb{R}^{n}=\{\,x\in\mathbb{R}^{n}\ |\ x\in\mathcal{K}^{*}\,\}\oplus\{\,x\in\mathbb{R}^{n}\ |\ x\in\mathcal{K}\,\}$ . Therefore, given $x\in\mathbb{R}^{n}$ we can always define ${x_{\parallel}=U_{\parallel}^{\top}x\in\mathcal{K}^{*}}$ and $x_{\perp}=U_{\perp}^{\top}x\in\mathcal{K}$ . We note that $U^{\textsf{T}}U=I_{n}$ implies $U_{\parallel}^{\textsf{T}}U_{\parallel}=I_{n_{\parallel}}$ , $U_{\perp}^{\textsf{T}}U_{\perp}=I_{n_{\perp}}$ , $U_{\perp}^{\textsf{T}}U_{\parallel}=\mbox{0}_{n_{\perp}\times n_{\parallel}}$ , and $U_{\parallel}^{\textsf{T}}U_{\perp}=\mbox{0}_{n_{\parallel}\times n_{\perp}}$ . Also,

[TABLE]

Moreover, we have

[TABLE]

In fact, from Corollary 9 we know:

[TABLE]

By multiplying by $U_{\parallel}^{\top}$ and $U_{\parallel}$ to the left and to the right, respectively, we get

[TABLE]

Thus, $\mu_{2,\theta_{\parallel}}(-I_{n_{\parallel}}{+}U_{\parallel}^{\top}[d]U_{\parallel}\Lambda_{\parallel})\leq-1+\alpha(W)$ . Next, by multiplying (4) by $U_{\perp}^{\top}$ and $U_{\parallel}^{\top}$ we obtain the interconnected system:

[TABLE]

thus,

[TABLE]

Equation (21) is always contracting with respect to any norm in the subspace $\mathcal{K}$ with $\operatorname{\mathsf{osL}}(f_{\textup{H}}^{\perp})={-}1$ , since $\mu(Df_{\textup{H}}^{\perp})=\mu(-I_{n_{\perp}})=-1$ . For system (22) we define $Q_{\textup{H},\alpha(W)}:=Q_{\textup{F},\alpha(W)}{W}^{\dagger}=U\theta_{\alpha{(W)}}\Lambda^{\dagger}U^{\textsf{T}}$ , where ${W}^{\dagger}=U\Lambda^{\dagger}U^{\textsf{T}}$ , with

[TABLE]

Next, we note that the matrix $Q_{\textup{H}\parallel}:=U_{\parallel}^{\top}Q_{\textup{F},\alpha(W)}{W}^{\dagger}U_{\parallel}=\theta_{\parallel}\Lambda_{\parallel}^{-1}$ and that $Df_{\textup{H}}^{\parallel}={-}I_{n_{\parallel}}{+}\Lambda_{\parallel}U_{\parallel}^{\top}[d]U_{\parallel}$ . Thus, we have

[TABLE]

Thus system (22) is strongly infinitesimally contracting in $\mathcal{K}^{*}$ with respect to $\|\cdot\|_{Q_{\textup{H}\parallel}}$ with rate $1-\alpha(W)$ .

Finally, we note that at fixed $x_{\parallel}$ and $t$ , the map $x_{\perp}\to f_{\parallel}$ is Lipschitz with constant ${\textup{L}_{\parallel\perp}:=\alpha(W)}$ . In fact, ${\forall x_{\perp}^{1},x_{\perp}^{2}\in\mathcal{K}}$ , we get

[TABLE]

We can now construct the gain matrix (30)

[TABLE]

The eigenvalues of $\Gamma$ are $\lambda_{1}=-1,\lambda_{2}={-}1{+}\alpha(W)$ . The fact that $\mathcal{K}\neq\{\mbox{0}_{n}\}$ implies $\alpha(W)\geq 0$ . In turn, since by assumption $\alpha(W)<1$ , we have $\lambda_{2}\in{[-1,0[}$ . Thus $\Gamma$ is Hurwitz and $\alpha(\Gamma)={-}1+\alpha(W)$ . By applying Theorem 21, for each $\varepsilon\in{]0,1-\alpha(W)[}$ we have that the HNN is strongly infinitesimally contracting with rate $|\alpha(\Gamma)+\varepsilon|$ . This concludes the proof. ∎

V Using Euclidean contractivity to solve quadratic optimization problems

We now apply the previous results to propose a firing-rate neural network solving certain quadratic optimization problems with box constraints. By utilizing Corollary 9, we ensure global exponential convergence of our dynamics, along with all the other properties of contracting systems.

Given $A=A^{\top}\succ 0$ , an input $u\in\mathbb{R}^{n}$ , and $\mu\leq\nu\in\mathbb{R}^{n}$ the quadratic optimization problem with box constraints is

[TABLE]

Note that $J_{A,u}(\cdot)$ is strongly convex and the constraints are convex, thus (24) admits a unique global optimal solution.

We propose the following FNN model to solve (24). Given a single-layered neural network of $n$ neurons, the state ${x\in\mathbb{R}^{n}}$ evolves according to

[TABLE]

with output $y=x$ . The activation function $\operatorname{sat}_{\mu,\nu}({\cdot})\colon\mathbb{R}^{n}\rightarrow[\mu,\nu]:=[\mu_{1},\nu_{1}]\times\dots\times[\mu_{n},\nu_{n}]$ , illustrated in Figure 2, is defined as $(\operatorname{sat}_{\mu,\nu}({x}))_{i}=\operatorname{sat}_{\mu_{i},\nu_{i}}({x_{i}})$ , where $\operatorname{sat}_{\mu_{i},\nu_{i}}({\cdot})\colon\mathbb{R}\rightarrow[\mu_{i},\nu_{i}]$ is

[TABLE]

To simplify the notation, whenever it is clear from the context, we use the same symbol for both the scalar and vector forms of the saturation function.

Remark 18.

The function $\operatorname{sat}_{\mu_{i},\nu_{i}}({\cdot})$ satisfies Assumption 2. Almost everywhere, its derivative is ${\partial\operatorname{sat}_{a,b}({\cdot})\colon\mathbb{R}\setminus\{\,a,b\,\}\to\{0,1\}}$ defined by

[TABLE]

Next, we use Corollary 9 to give sufficient conditions for the strong infinitesimal contractivity of (25). Then, we show that the equilibrium of (25) is the optimal solution of (24).

Lemma 19 (Strong infinitesimal contractivity).

Let ${A=A^{\top}\succ 0}$ in (25). The FNN (25) is strongly infinitesimally contracting with rate $c>0$ with respect to the norm $\|\cdot\|_{2,P}$ , where

(i) if $\lambda_{\textup{min}}(A)<1$ , then $c=\lambda_{\textup{min}}(A)$ and ${P=Q_{\textup{F},1{-}\lambda_{\textup{min}}(A)}}$ , with $Q_{\textup{F},1{-}\lambda_{\textup{min}}(A)}$ defined in (7);

(ii) if $\lambda_{\textup{min}}(A)=1$ , then for any $0<\varepsilon<1$ , $c=1-\varepsilon>0$ and $P=Q_{\textup{F},\varepsilon}$ , with $Q_{\textup{F},\varepsilon}$ defined in (7);

(iii) if $\lambda_{\textup{min}}(A)>1$ , then $c=1$ and $P=(A-I_{n})^{1/2}$ .

Proof.

The thesis follows by applying Corollary 9 noticing that $A\succ 0$ implies $W=I_{n}{-}A\prec I_{n}$ , thus $\alpha(W)=1{-}\lambda_{\textup{min}}(A)<1$ , and $\operatorname{sat}_{\mu,\nu}({\cdot})$ satisfies Assumption 2. ∎
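The case distinction of Lemma 19 can be made concrete with a short sketch. It is a hedged illustration for $2\times 2$ matrices only; the function name `lemma19_rate`, the default `eps`, and the floating-point tolerance used to detect $\lambda_{\textup{min}}(A)=1$ are assumptions of this example, not part of the lemma.

```python
import math

def lemma19_rate(A, eps=0.5):
    """Return (case, rate) from Lemma 19 for a symmetric positive definite 2x2 A."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    (a11, a12), (_, a22) = A
    # Smallest eigenvalue of the symmetric matrix [[a11, a12], [a12, a22]].
    lam_min = 0.5 * (a11 + a22) - math.hypot(0.5 * (a11 - a22), a12)
    if lam_min <= 0.0:
        raise ValueError("A must be positive definite")
    if math.isclose(lam_min, 1.0):  # tolerance-based test, an assumption
        return "ii", 1.0 - eps      # rate 1 - eps for any eps in (0, 1)
    if lam_min < 1.0:
        return "i", lam_min         # rate lambda_min(A)
    return "iii", 1.0               # rate 1

print(lemma19_rate([[0.5, 0.0], [0.0, 2.0]]))
print(lemma19_rate([[2.0, 1.0], [1.0, 2.0]]))
print(lemma19_rate([[3.0, 0.0], [0.0, 4.0]]))
```

In every case the rate never exceeds $1$ , the dissipation rate of the FNN, and it degrades only when $\lambda_{\textup{min}}(A)$ drops below $1$ .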

An immediate consequence of Lemma 19 is that (25) admits a unique equilibrium point. Next, we prove that this equilibrium point is the optimal solution of (24).

Lemma 20.

The vector $x^{*}\in\mathbb{R}^{n}$ is the global minimum for (24) if and only if $x^{*}$ is the equilibrium point of (25).

Proof.

Let $x^{*}\in\mathbb{R}^{n}$ be a global minimum for (24), thus $x^{*}\in[\mu,\nu]$ . Then it follows from the KKT conditions that, for all $i\in\{\,1,\dots,n\,\}$ ,

[TABLE]

Note that $x^{*}$ is an equilibrium of (25) if, for all $i$ , we have

[TABLE]

If $x^{*}_{i}=\mu_{i}$ , let $z^{\star}:=\left.(Ax^{*})_{i}\right|_{x^{*}_{i}=\mu_{i}}{-}u_{i}$ . By definition of $\operatorname{sat}_{\mu_{i},\nu_{i}}({\cdot})$ it holds ${-}\mu_{i}+\operatorname{sat}_{\mu_{i},\nu_{i}}({\mu_{i}{-}z^{\star}})\geq 0$ . Moreover, since $\operatorname{sat}_{\mu_{i},\nu_{i}}({\cdot})$ is monotonically non-decreasing, the KKT conditions (27) give the reverse inequality. Thus $x^{*}_{i}=\mu_{i}$ satisfies (28). Similarly, one can prove that (28) holds for $\mu_{i}<x^{*}_{i}<\nu_{i}$ and for $x^{*}_{i}=\nu_{i}$ .

Vice versa, let $x^{*}\in\mathbb{R}^{n}$ be an equilibrium of (25), i.e., (28) holds. If $x^{*}_{i}\leq\mu_{i}$ , then (28) implies $x^{*}_{i}=\operatorname{sat}_{\mu_{i},\nu_{i}}({\mu_{i}{-}z^{\star}}).$ By definition of $\operatorname{sat}_{\mu_{i},\nu_{i}}({\cdot})$ we get $x^{*}_{i}\in[\mu_{i},\nu_{i}]$ , thus $x^{*}_{i}=\mu_{i}$ , and $\mu_{i}{-}z^{\star}\leq\mu_{i}$ , which implies $z^{\star}\geq 0$ . Similarly, if $\mu_{i}<x^{*}_{i}<\nu_{i}$ , then $z^{\star}=0$ , while if $x^{*}_{i}\geq\nu_{i}$ , then $x^{*}_{i}=\nu_{i}$ and $z^{\star}\leq 0$ . This ends the proof since we have shown that the KKT conditions (27) hold for all $i$ . ∎
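The equivalence in Lemma 20 can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes that (25) has the form $\dot{x}=-x+\operatorname{sat}_{\mu,\nu}((I_{n}-A)x+u)$ , which is consistent with $W=I_{n}-A$ in the proof of Lemma 19, that $\operatorname{sat}$ is the usual componentwise clamp, and that a forward-Euler discretization with a hypothetical step size `h` is accurate enough. All function names are illustrative.

```python
def sat(z, lo, hi):
    """Scalar saturation: clamp z to the interval [lo, hi] (assumed form)."""
    return min(max(z, lo), hi)

def fnn_step(x, A, u, mu, nu, h):
    """One forward-Euler step of x' = -x + sat((I - A) x + u)."""
    n = len(x)
    new_x = []
    for i in range(n):
        # Pre-activation (W x + u)_i with W = I_n - A.
        pre = x[i] - sum(A[i][j] * x[j] for j in range(n)) + u[i]
        new_x.append(x[i] + h * (-x[i] + sat(pre, mu[i], nu[i])))
    return new_x

def solve_box_qp(A, u, mu, nu, h=0.1, steps=500):
    """Integrate the FNN from the origin and return the final state."""
    x = [0.0] * len(u)
    for _ in range(steps):
        x = fnn_step(x, A, u, mu, nu, h)
    return x

def equilibrium_residual(x, A, u, mu, nu):
    """Largest violation of the equilibrium condition -x + sat((I - A) x + u) = 0."""
    n = len(x)
    worst = 0.0
    for i in range(n):
        pre = x[i] - sum(A[i][j] * x[j] for j in range(n)) + u[i]
        worst = max(worst, abs(-x[i] + sat(pre, mu[i], nu[i])))
    return worst

A = [[2.0, 1.0], [1.0, 2.0]]
u = [3.0, 1.5]
mu, nu = [0.0, 0.0], [1.0, 1.0]
x_star = solve_box_qp(A, u, mu, nu)
print(x_star, equilibrium_residual(x_star, A, u, mu, nu))
```

For this instance the KKT conditions can be solved by hand: the first coordinate sits at its upper bound, $x^{*}_{1}=1$ with $(Ax^{*})_{1}-u_{1}=-0.75\leq 0$ , and the second is interior, $x^{*}_{2}=0.25$ with $(Ax^{*})_{2}-u_{2}=0$ . The simulated state approaches this point and the equilibrium residual tends to zero.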

VI Conclusion

We presented sharp conditions for strong and weak Euclidean contractivity of Hopfield and firing-rate neural networks with symmetric weights together with a number of general algebraic results. Specifically, we analyzed the Euclidean log-norm of matrix polytopes, proposing norms that are log-optimal for almost all matrices, and provided optimal and log-optimal norms for the product of symmetric matrices. We considered networks with (possibly) non-smooth activation functions, which allows us to consider common activation functions such as ReLU and the soft thresholding function. Finally, to demonstrate the practical implications of our results, we proposed a FNN to solve quadratic optimization problems with box constraints.

As future work, it would be useful to (i) extend our results to arbitrary synaptic matrices (as opposed to only symmetric) and heterogeneous dissipation matrices, (ii) establish higher-order contractivity properties [19] and consider stochastic models [1], and (iii) apply these results to neuroscience and machine learning problems. For example, we plan to study sparse reconstruction networks (inspired by [16]) and implicit learning models (e.g., see [15]).

Appendix A Interconnected systems

In this section, we briefly review the theory of contracting interconnected systems that we used to prove Theorem 13. We refer to [3] for a recent and more detailed review.

Given $r$ positive integers $n_{1},\dots,n_{r}$ such that $n_{1}+\dots+n_{r}=n$ , consider the decomposition $\mathbb{R}^{n}=\mathbb{R}^{n_{1}}\times\dots\times\mathbb{R}^{n_{r}}$ , a local norm $\|\cdot\|_{i}$ on $\mathbb{R}^{n_{i}}$ , for each $i\in\{1,\dots,r\}$ , with associated log-norm $\mu_{i}(\cdot)$ . Consider the interconnection of $r$ dynamical systems

[TABLE]

where $x_{i}\in\mathbb{R}^{n_{i}}$ , and $x_{-i}\in\mathbb{R}^{n-n_{i}}$ denotes the vector $x$ without the component $x_{i}$ . We recall the following result, which will be useful for our analysis.

Theorem 21 (Contractivity of interconnected system).

Consider the interconnected system in (29). Assume

( $A$ 1) (contractivity-at-each-node) at fixed $x_{-i}$ and $t$ , each function $x_{i}\to f_{i}(t,x_{i},x_{-i})$ is strongly infinitesimally contracting with rate $c_{i}$ with respect to $\|\cdot\|_{i}$ ;

( $A$ 2) (Lipschitz interconnections) at fixed $x_{i}$ and $t$ , each function $x_{-i}\to f_{i}(t,x_{i},x_{-i})$ is Lipschitz with Lipschitz constant $\gamma_{ij}\in\mathbb{R}_{\geq 0}$ .

Define the gain matrix

[TABLE]

If $\Gamma$ is Hurwitz, then the interconnected system is strongly infinitesimally contracting with respect to $\|\cdot\|_{\eta}$ and with rate $|\alpha(\Gamma)+\varepsilon|$ , where $\eta\in\mathbb{R}^{n}_{>0}$ , $\|\cdot\|_{\eta}^{2}:=\sum_{i=1}^{r}\eta_{i}\|x_{i}\|_{i}^{2}$ , and $\varepsilon>0$ .
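The Hurwitz test of Theorem 21 is easy to carry out for two interconnected subsystems. The sketch below assumes the usual form of the gain matrix, with $-c_{i}$ on the diagonal and the Lipschitz constants $\gamma_{ij}$ off the diagonal; the function names are hypothetical. The example values mirror the proof of Theorem 13 under the additional assumption that the $\perp$ subsystem does not depend on $x_{\parallel}$ , so that the corresponding gain is zero.

```python
import math

def gain_matrix(c1, c2, gamma12, gamma21):
    """Assumed 2x2 gain matrix: -c_i on the diagonal, gamma_ij off the diagonal."""
    return [[-c1, gamma12], [gamma21, -c2]]

def spectral_abscissa_2x2(G):
    """Largest real part of the eigenvalues of a real 2x2 matrix."""
    (a, b), (c, d) = G
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4.0 * det
    if disc < 0.0:
        # Complex-conjugate pair: both eigenvalues have real part tr / 2.
        return 0.5 * tr
    return 0.5 * (tr + math.sqrt(disc))

# Values in the spirit of the proof of Theorem 13 with alpha(W) = 0.4:
# rates c_perp = 1 and c_par = 1 - alpha(W), interconnection gain alpha(W).
alpha_W = 0.4
Gamma = gain_matrix(c1=1.0, c2=1.0 - alpha_W, gamma12=0.0, gamma21=alpha_W)
abscissa = spectral_abscissa_2x2(Gamma)
print(abscissa < 0.0, abs(abscissa - (-1.0 + alpha_W)) < 1e-9)
```

Because this $\Gamma$ is triangular, its eigenvalues are the diagonal entries $-1$ and $-1+\alpha(W)$ , so it is Hurwitz whenever $\alpha(W)<1$ .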

Appendix B Justification for Remark 8

Lemma 22.

Given the FNN (3) with symmetric (Assumption 1) and invertible synaptic matrix $W$ , Lipschitz and slope restricted in $[0,1]$ (Assumption 2) activation function $\phi$ satisfying $\inf_{x\in\mathbb{R}}\phi^{\prime}(x)=0$ and $\sup_{x\in\mathbb{R}}\phi^{\prime}(x)=1$ ,

(i) if $\alpha(W)>0$ , then

[TABLE]

with $Q_{\textup{F},\alpha(W)}\in\mathbb{R}^{n\times n}$ defined in (7);

(ii) if $\alpha(W)<0$ , then

[TABLE]

Proof.

The proof of both parts follows by applying Theorem 7 and noticing that, under the above assumptions, for any log-norm $\mu$ the reverse inequality holds:

[TABLE]

To prove (31), let $h\colon\mathbb{R}\setminus\Omega_{\phi}\rightarrow[0,1]$ be the function defined by $h(x)=\phi^{\prime}(x)$ where $\Omega_{\phi}$ is the measure zero set of points in $\mathbb{R}$ where $\phi$ is not differentiable. It is well-known that for any closed and bounded set $S\subset\mathbb{R}$ , $S\supseteq\{\inf(S),\sup(S)\}$ . Then, since $h$ is bounded, the closure of $\operatorname{Im}(h)$ satisfies

[TABLE]

Letting $\Omega_{\Phi}$ be the measure zero points in $\mathbb{R}^{n}$ where $\Phi$ is not differentiable, we compute

[TABLE]

We justify the above (in)equalities as follows. Equality (33) holds because $W$ is invertible. Inequality (36) holds because of the condition (32). Finally, equality (37) follows because $\mu$ is a convex function of its argument and the maximum value of a convex function over a polytope occurs at one of its vertices.

In particular, for the respective choice of norm in parts (i) and (ii), the result is proved in view of Theorem 5 and the translation property for log-norms. ∎

Bibliography


[1] Z. Aminzare. Stochastic logarithmic Lipschitz constants: A tool to analyze contractivity of stochastic differential equations. IEEE Control Systems Letters, 6:2311–2316, 2022. doi:10.1109/LCSYS.2022.3148945.
[2] A. Bouzerdoum and T. R. Pattison. Neural network for quadratic optimization with bound constraints. IEEE Transactions on Neural Networks, 4(2):293–304, 1993. doi:10.1109/72.207617.
[3] F. Bullo. Contraction Theory for Dynamical Systems. Kindle Direct Publishing, 1.1 edition, 2023, ISBN 979-8836646806. URL: http://motion.me.ucsb.edu/book-ctds.
[4] V. Centorrino, F. Bullo, and G. Russo. Modelling and contractivity of neural-synaptic networks with Hebbian learning. Automatica, July 2022. Submitted. doi:10.48550/arXiv.2204.05382.
[5] A. Davydov, A. V. Proskurnikov, and F. Bullo. Non-Euclidean contraction analysis of continuous-time neural networks. IEEE Transactions on Automatic Control, September 2022. Submitted. doi:10.48550/arXiv.2110.08298.
[6] C. A. Desoer and M. Vidyasagar. Feedback Systems: Input-Output Properties. Academic Press, 1975, ISBN 978-0-12-212050-3. doi:10.1137/1.9780898719055.
[7] M. Forti and A. Tesi. New conditions for global stability of neural networks with application to linear and quadratic programming problems. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 42(7):354–366, 1995. doi:10.1109/81.401145.
[8] W. Gerstner, W. M. Kistler, R. Naud, and L. Paninski. Neuronal Dynamics: From Single Neurons To Networks and Models of Cognition. Cambridge University Press, 2014, ISBN 9781107635197. URL: https://neuronaldynamics.epfl.ch.
