Hessian-information geometric formulation of a class of deterministic neural network models

Shin-itiro Goto

arXiv:1904.12734·math-ph·May 16, 2019


TL;DR

This paper introduces a geometric framework for deterministic neural network models using Hessian and information geometry, linking phase space properties to differential operators on manifolds, with explicit calculations for sigmoid activations.

Contribution

It formulates neural network dynamics within a Hessian geometric framework, connecting phase space compressibility to Laplace operators on Hessian manifolds, and explicitly analyzes sigmoid functions.

Findings

01

Phase space compressibility expressed via Laplace operator on Hessian manifolds.

02

Explicit calculation of compressibility for sigmoid activation functions.

03

Utilization of dual coordinates in information geometry for neural network analysis.

Abstract

In this paper a class of dynamical systems describing deterministic neural network models is formulated from the viewpoint of differential geometry. This class includes the Hopfield model and gradient systems, and is such that the so-called activation functions induce information and Hessian geometries. In this formulation, it is shown that the phase space compressibility of a dynamical system belonging to this class is written in terms of the Laplace operator defined on Hessian manifolds, where phase space compressibility is associated with a volume-form of a manifold, and expresses how such a volume-form is compressed along the vector field of a dynamical system. Since the sigmoid function, as an activation function, plays a role in the study of neural network models, such compressibility is explicitly calculated for this case. Throughout this paper, the so-called dual coordinates…

Equations

V_{a} = \frac{\partial \Psi}{\partial U^{a}} .

\frac{\mathrm{d}}{\mathrm{d}t} U^{a} = F^{a}(V), \qquad F^{a}(V) = - \frac{\partial {\cal H}}{\partial V_{a}},

\frac{\mathrm{d}{\cal H}}{\mathrm{d}t} = - \sum_{a} \sum_{b} \frac{\mathrm{d}U^{a}}{\mathrm{d}t} \frac{\partial^{2} \Psi}{\partial U^{a} \partial U^{b}} \frac{\mathrm{d}U^{b}}{\mathrm{d}t} < 0.

\frac{\mathrm{d}}{\mathrm{d}t} U^{a} = - \sum_{b} \delta^{ab} V_{b} = - \sum_{b} \delta^{ab} \frac{\partial \Psi}{\partial U^{b}} .

\frac{\mathrm{d}}{\mathrm{d}t} U^{a} = \sum_{b} J^{ab} V_{b} - \frac{1}{R_{a}} U^{a} + I_{\mathrm{ext}}^{a}, \qquad V_{a} = \Upsilon(U^{a}), \qquad \Upsilon(U^{a}) := \frac{\partial \Psi}{\partial U^{a}} = \frac{\mathrm{d}\psi}{\mathrm{d}U^{a}}, \qquad \Psi(U) = \sum_{a} \psi(U^{a}),

{\cal H}(V) = - \sum_{a} \left[ \frac{1}{2} \sum_{b} J^{ab} V_{a} V_{b} - \int_{0}^{V_{a}} \frac{1}{R_{a}} \Upsilon^{-1}(V^{\prime}) \, \mathrm{d}V^{\prime} + V_{a} I_{\mathrm{ext}}^{a} \right],

\frac{\partial {\cal H}}{\partial V_{a}} = - \left[ \sum_{b} J^{ab} V_{b} - \frac{1}{R_{a}} U^{a} + I_{\mathrm{ext}}^{a} \right] = - \frac{\mathrm{d}}{\mathrm{d}t} U^{a}

\frac{\mathrm{d}{\cal H}(V)}{\mathrm{d}t} = \sum_{a} \frac{\partial {\cal H}}{\partial V_{a}} \frac{\mathrm{d}V_{a}}{\mathrm{d}t} = \sum_{a} \frac{\partial {\cal H}}{\partial V_{a}} \frac{\mathrm{d}\Upsilon}{\mathrm{d}U^{a}} \frac{\mathrm{d}U^{a}}{\mathrm{d}t} = - \sum_{a} \frac{\mathrm{d}U^{a}}{\mathrm{d}t} \frac{\mathrm{d}\Upsilon}{\mathrm{d}U^{a}} \frac{\mathrm{d}U^{a}}{\mathrm{d}t} < 0.

g^{\Psi} = \sum_{a} \sum_{b} g_{ab}^{\Psi} \, \mathrm{d}U^{a} \otimes \mathrm{d}U^{b}, \quad \mbox{where} \quad g_{ab}^{\Psi} = \frac{\partial^{2} \Psi}{\partial U^{a} \partial U^{b}} .

g_{ab}^{\Psi} = \frac{\partial V_{a}}{\partial U^{b}} = \frac{\partial^{2} \Psi}{\partial U^{a} \partial U^{b}},

g_{\Psi}^{ab} = \frac{\partial U^{a}}{\partial V_{b}} = \frac{\partial^{2} \Psi^{*}}{\partial V_{a} \partial V_{b}},

\Psi^{*}(V) := \left[ \sum_{a} U^{a} V_{a} - \Psi(U) \right]_{U = U(V)} .

g^{\Psi} = \sum_{a} \sum_{b} g_{ab}^{\Psi} \, \mathrm{d}U^{a} \otimes \mathrm{d}U^{b} = \sum_{a} \sum_{b} g_{\Psi}^{ab} \, \mathrm{d}V_{a} \otimes \mathrm{d}V_{b} .

g^{\Psi} \left( \frac{\partial}{\partial U^{a}}, \frac{\partial}{\partial V_{b}} \right) = \delta_{a}^{b} .

X [ g^{\Psi}(Y, Z) ] = g^{\Psi}(\nabla_{X}^{\Psi} Y, Z) + g^{\Psi}(Y, \nabla_{X}^{\Psi *} Z),

g^{\Psi}(X_{{\cal H}}, -) = - \mathrm{d}{\cal H},

X_{{\cal H}} = \sum_{a} \dot{U}^{a} \frac{\partial}{\partial U^{a}}, \quad \mbox{with} \quad \dot{U}^{a} = \frac{\mathrm{d}U^{a}}{\mathrm{d}t} .

g^{\Psi}(X_{{\cal H}}, -) = \sum_{a} \sum_{b} g_{ab}^{\Psi} \dot{U}^{a} \, \mathrm{d}U^{b}, \quad \mbox{and} \quad \mathrm{d}{\cal H} = \sum_{a} \sum_{b} \frac{\partial {\cal H}}{\partial V_{a}} \frac{\mathrm{d}V_{a}}{\mathrm{d}U^{b}} \, \mathrm{d}U^{b} = \sum_{a} \sum_{b} g_{ab}^{\Psi} \frac{\partial {\cal H}}{\partial V_{a}} \, \mathrm{d}U^{b},

\sum_{b} g_{ab}^{\Psi} \frac{\mathrm{d}U^{b}}{\mathrm{d}t} = - \sum_{b} g_{ab}^{\Psi} \frac{\partial {\cal H}}{\partial V_{b}} .

\frac{\mathrm{d}U^{a}}{\mathrm{d}t} = - \frac{\partial {\cal H}}{\partial V_{a}} .

X = \sum_{a=1}^{n} \dot{q}^{a} \frac{\partial}{\partial q^{a}}, \qquad \dot{q}^{a} = - \check{\kappa}^{a} q^{a}, \qquad a \in \{1, \ldots, n\}

{\cal L}_{X} \Omega = [ ({\cal L}_{X} \mathrm{d}q^{1}) \wedge \cdots \wedge \mathrm{d}q^{n} ] + \cdots + [ \mathrm{d}q^{1} \wedge \cdots \wedge ({\cal L}_{X} \mathrm{d}q^{n}) ],

{\cal L}_{X} \Omega = - \left( \sum_{a=1}^{n} \check{\kappa}^{a} \right) \Omega,

\kappa_{\Psi}^{{\cal H}} = - \star_{\Psi}^{-1} \mathrm{d} \star_{\Psi} \mathrm{d}{\cal H} .

\imath_{X} \star 1 = \star \widetilde{X},

\kappa_{\Psi}^{{\cal H}} = \star_{\Psi}^{-1} ( \kappa_{\Psi}^{{\cal H}} \star_{\Psi} 1 ) = \star_{\Psi}^{-1} ( {\cal L}_{X_{{\cal H}}} \star_{\Psi} 1 ) = \star_{\Psi}^{-1} ( \mathrm{d} \, \imath_{X_{{\cal H}}} \star_{\Psi} 1 ) = \star_{\Psi}^{-1} [ \mathrm{d} ( - \star_{\Psi} \mathrm{d}{\cal H} ) ] = - \star_{\Psi}^{-1} \mathrm{d} \star_{\Psi} \mathrm{d}{\cal H} .

\mathrm{d}\widetilde{X}_{{\cal H}} = 0.

\int_{{\cal M}} \kappa_{\Psi}^{{\cal H}} \star_{\Psi} 1 = 0.

\frac{\mathrm{d}{\cal H}}{\mathrm{d}t} = X_{{\cal H}} {\cal H} = \mathrm{d}{\cal H}(X_{{\cal H}}) = - g^{\Psi}(X_{{\cal H}}, X_{{\cal H}}) < 0,

X_{{\cal H}} = \sum_{a} \dot{V}_{a} \frac{\partial}{\partial V_{a}},


Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference

Full text

Hessian-information geometric formulation of a class of deterministic neural network models

Shin-itiro GOTO,

The Institute of Statistical Mathematics,

10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan

Abstract

In this paper a class of dynamical systems describing deterministic neural network models is formulated from the viewpoint of differential geometry. This class includes the Hopfield model and gradient systems, and is such that the so-called activation functions induce information and Hessian geometries. In this formulation, it is shown that the phase space compressibility of a dynamical system belonging to this class is written in terms of the Laplace operator defined on Hessian manifolds, where phase space compressibility is associated with a volume-form of a manifold, and expresses how such a volume-form is compressed along the vector field of a dynamical system. Since the sigmoid function, as an activation function, plays a role in the study of neural network models, such compressibility is explicitly calculated for this case. Throughout this paper, the so-called dual coordinates known in information geometry are explicitly used.

1 Introduction

Neural network models play a fundamental role in brain science, where they are used to clarify the functions of the human brain [1, 2]. They can be viewed as a class of dynamical systems, either probabilistic or deterministic, and can be studied mathematically. Considerable effort is being devoted to understanding the dynamics of genuine neural network models and their variants. Although models involving probability seem natural, the analysis of deterministic models is expected to provide a simpler perspective owing to the simplicity of the models. Aside from its purely academic interest, such analysis has implications for mathematical engineering. In particular, neural network models are used intensively in machine learning systems [3].

Various approaches exist to analyze dynamical systems. One such approach is to employ differential geometry [4, 5], since various geometric notions can be introduced and systematically applied to dynamical systems. Among these, one option is information geometry, which is a geometrization of statistics for parametric models with which cumulant generating functions are associated [6, 7]. An essential role of information geometry is to bridge convex analysis and Riemannian geometry, where the tools of convex analysis include the Legendre transform. Hessian geometry is a differential geometry built on convex functions, and it need not involve probability distribution functions [8]. Thus, if systems involving convex functions have nothing to do with probability, Hessian geometry is expected to be a key to developing a geometrization of those systems. Examples of such geometrization include Hamiltonian dynamical systems and electric circuit models [9, 10].

For deterministic neural network models involving convex functions, convex analysis and its related geometries, Hessian and information geometries, should be expected to play a role. To see this explicitly, one should focus on a particular class of dynamical systems. A candidate for such a class is the one proposed by Cohen and Grossberg [11]. This class includes the Hopfield model [12], and it has been studied intensively in the literature since it has Lyapunov functions [13]. On the other hand, one of the keys in the study of neural network models is the choice of the so-called activation function. However, the use of convex analysis and of the differential geometry based on the activation functions of this model has not received attention. Thus a formulation based on Hessian and information geometries in conformity with a given convex activation function is expected to prove fruitful.

In this paper a class of dynamical systems including the Hopfield model for describing a neural network model is formulated in terms of Hessian and information geometries. In this formulation a coordinate-free description of the class of dynamical systems is given, from which the quantity called phase space compressibility is shown to be written as the negative of the Laplace operator acting on a Lyapunov function. This quantity expresses how much a volume-form is compressed along the vector field associated with a dynamical system, and is a measure of how fast the flow in the phase space of the dynamical system converges to an attracting set.

This paper is organized as follows. In Section 2, a class of dynamical systems, which we call the generalized Hopfield model, is introduced. After a coordinate-free description of the class is given, phase space compressibility is calculated. In Section 3, explicit expressions of phase space compressibility for some examples are given. Finally, Section 4 summarizes this paper and discusses future work.

In this paper mathematical objects are assumed to be smooth and real.

2 Generalized Hopfield model

In this section a generalized Hopfield model is introduced, and then its geometric formulation is given.

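Let ${\cal M}$ be an $n$ -dimensional manifold, $U$ a set of local coordinates on ${\cal M}$ with $U=\{U^{\,1},\ldots,U^{\,n}\}$ , and $\Psi$ a strictly convex function on ${\cal M}$ . Strict convexity of $\Psi$ is written as $(\partial^{2}\Psi/\partial\,U^{\,a}\partial U^{\,b})\succ 0$ in some convex domain of ${\cal M}$ , where $(A_{\,ab})\succ 0$ denotes that the matrix $(A_{\,ab})$ is positive definite. Introduce $V$ with $V=\{V_{\,1},\ldots,V_{\,n}\}$ such that

$$V_{a}=\frac{\partial\Psi}{\partial U^{a}}.\qquad(1)$$

Consider the dynamical system of the form

$$\frac{\mathrm{d}}{\mathrm{d}t}U^{a}=F^{a}(V),\qquad F^{a}(V)=-\frac{\partial{\cal H}}{\partial V_{a}},\qquad(2)$$

where $t\in I\subseteq\mathbb{R}$ plays the role of time, and ${\cal H}$ is a Lyapunov function:

$$\frac{\mathrm{d}{\cal H}}{\mathrm{d}t}=-\sum_{a}\sum_{b}\frac{\mathrm{d}U^{a}}{\mathrm{d}t}\frac{\partial^{2}\Psi}{\partial U^{a}\partial U^{b}}\frac{\mathrm{d}U^{b}}{\mathrm{d}t}<0.\qquad(3)$$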

This system is termed in this paper as follows.

Definition 2.1.

The dynamical system (2) together with (1) and (3) is referred to as the generalized Hopfield model.

The following are some examples.

Example 1.

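Choosing ${\cal H}$ to be ${\cal H}(V)=\sum_{a}\sum_{b}\delta^{\,ab}\,V_{\,a}V_{\,b}/2$ , with $\delta^{\,ab}$ being the Kronecker delta, which equals unity for $a=b$ and vanishes otherwise, one has a gradient system:

$$\frac{\mathrm{d}}{\mathrm{d}t}U^{a}=-\sum_{b}\delta^{ab}V_{b}=-\sum_{b}\delta^{ab}\frac{\partial\Psi}{\partial U^{b}}.$$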

Example 2.

A neural network model having a Lyapunov function found in the literature is of the form of the set of differential equations

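$$\frac{\mathrm{d}}{\mathrm{d}t}U^{a}=\sum_{b}J^{ab}V_{b}-\frac{1}{R_{a}}U^{a}+I_{\mathrm{ext}}^{a},\qquad V_{a}=\Upsilon(U^{a}),\qquad\Upsilon(U^{a}):=\frac{\partial\Psi}{\partial U^{a}}=\frac{\mathrm{d}\psi}{\mathrm{d}U^{a}},\qquad\Psi(U)=\sum_{a}\psi(U^{a}),$$

where $\{R_{\,a}\}$ and $\{I_{\,\mathrm{ext}}^{\,a}\}$ are sets of constants, $\{J^{\,ab}\}$ constants satisfying $J^{\,ba}=J^{\,ab}$ , and $\psi$ is a strictly convex function (see Refs. [11, 12]). Recall that the sum of strictly convex functions is also a strictly convex function. Note that all the constants $\{C_{\,i}\}$ appearing in Ref. [12] have been set to unity in this paper. A Lyapunov function is known to exist in this model. Choose ${\cal H}$ to be

$${\cal H}(V)=-\sum_{a}\left[\frac{1}{2}\sum_{b}J^{ab}V_{a}V_{b}-\int_{0}^{V_{a}}\frac{1}{R_{a}}\Upsilon^{-1}(V^{\prime})\,\mathrm{d}V^{\prime}+V_{a}I_{\mathrm{ext}}^{a}\right],\qquad(4)$$

where we have assumed that $\Upsilon^{-1}$ exists. Then it follows from

$$\frac{\partial{\cal H}}{\partial V_{a}}=-\left[\sum_{b}J^{ab}V_{b}-\frac{1}{R_{a}}U^{a}+I_{\mathrm{ext}}^{a}\right]=-\frac{\mathrm{d}}{\mathrm{d}t}U^{a}$$

that

$$\frac{\mathrm{d}{\cal H}(V)}{\mathrm{d}t}=\sum_{a}\frac{\partial{\cal H}}{\partial V_{a}}\frac{\mathrm{d}V_{a}}{\mathrm{d}t}=\sum_{a}\frac{\partial{\cal H}}{\partial V_{a}}\frac{\mathrm{d}\Upsilon}{\mathrm{d}U^{a}}\frac{\mathrm{d}U^{a}}{\mathrm{d}t}=-\sum_{a}\frac{\mathrm{d}U^{a}}{\mathrm{d}t}\frac{\mathrm{d}\Upsilon}{\mathrm{d}U^{a}}\frac{\mathrm{d}U^{a}}{\mathrm{d}t}<0.$$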

Thus the generalized Hopfield model includes the Hopfield model.
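The Lyapunov property in Example 2 can be checked numerically. The following is a minimal sketch, assuming the logistic sigmoid as the activation $\Upsilon$ (so that $\Upsilon^{-1}(v)=\ln(v/(1-v))$ and the integral in ${\cal H}$ has the closed form $V\ln V+(1-V)\ln(1-V)$), a forward-Euler time step, and hypothetical values for $J^{\,ab}$, $R_{\,a}$, the external inputs and the initial state. None of these numerical choices come from the paper.

```python
import numpy as np

def sigmoid(u):
    """Activation V_a = Upsilon(U^a), here assumed to be the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-u))

def lyapunov(V, J, R, I_ext):
    """H(V) of the Hopfield model for the sigmoid activation.

    Uses Upsilon^{-1}(v) = log(v / (1 - v)), whose integral from 0 to V
    equals V log V + (1 - V) log(1 - V).
    """
    integral = V * np.log(V) + (1.0 - V) * np.log(1.0 - V)
    return -0.5 * V @ J @ V + np.sum(integral / R) - V @ I_ext

def simulate(U0, J, R, I_ext, dt=1e-3, steps=5000):
    """Forward-Euler integration of dU/dt = J V - U / R + I_ext."""
    U = np.array(U0, dtype=float)
    energies = np.empty(steps)
    for k in range(steps):
        V = sigmoid(U)
        energies[k] = lyapunov(V, J, R, I_ext)
        U = U + dt * (J @ V - U / R + I_ext)
    return U, energies

# Hypothetical parameters, chosen only for illustration.
J = np.array([[0.0, 1.0, -0.5],
              [1.0, 0.0, 0.3],
              [-0.5, 0.3, 0.0]])  # symmetric couplings, J^{ab} = J^{ba}
R = np.array([1.0, 1.0, 1.0])
I_EXT = np.array([0.1, -0.2, 0.05])
U0 = np.array([0.5, -1.0, 2.0])

if __name__ == "__main__":
    U_final, energies = simulate(U0, J, R, I_EXT)
    print("H at start:", energies[0], "H at end:", energies[-1])
    print("largest one-step change of H:", np.diff(energies).max())
```

With a sufficiently small step the recorded values of ${\cal H}$ should decrease along the trajectory, which mirrors the inequality above; a larger step could break this because forward Euler only approximates the flow.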

2.1 Geometric description

A geometric description of the generalized Hopfield model is given in this subsection. First, it is shown how the strictly convex function $\Psi$ induces a Hessian manifold. A Hessian manifold is a triplet $({\cal M},\nabla^{\,\Psi},g^{\,\Psi})$ , where $\nabla^{\,\Psi}$ is a flat connection, and $g^{\,\Psi}$ the Riemannian metric tensor field satisfying $g^{\,\Psi}=\nabla^{\,\Psi}\mathrm{d}\Psi$ . The connection is such that $\nabla_{\partial/\partial U^{\,a}}^{\,\Psi}(\partial/\partial U^{\,b})=0$ for a local coordinate set $U$ .

Given $\Psi$ , define the Riemannian metric tensor field as

$$g^{\Psi}=\sum_{a}\sum_{b}g_{ab}^{\Psi}\,\mathrm{d}U^{a}\otimes\mathrm{d}U^{b},\quad\mbox{where}\quad g_{ab}^{\Psi}=\frac{\partial^{2}\Psi}{\partial U^{a}\partial U^{b}}.\qquad(5)$$

As in (1), the function $\Psi$ of $U$ induces $V$ .

With (1) and (5), one has

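$$g_{ab}^{\Psi}=\frac{\partial V_{a}}{\partial U^{b}}=\frac{\partial^{2}\Psi}{\partial U^{a}\partial U^{b}},$$

which form the matrix $(g_{\,ab}^{\,\Psi})$ . The inverse matrix of $(g_{\,ab}^{\,\Psi})$ , denoted by $(g_{\,\Psi}^{\,ab})$ , is known to be written as [6]

$$g_{\Psi}^{ab}=\frac{\partial U^{a}}{\partial V_{b}}=\frac{\partial^{2}\Psi^{*}}{\partial V_{a}\partial V_{b}},$$

where $\Psi^{\,*}$ is the total Legendre transform of $\Psi$ :

$$\Psi^{*}(V):=\left[\sum_{a}U^{a}V_{a}-\Psi(U)\right]_{U=U(V)}.$$

Combining these, one has

$$g^{\Psi}=\sum_{a}\sum_{b}g_{ab}^{\Psi}\,\mathrm{d}U^{a}\otimes\mathrm{d}U^{b}=\sum_{a}\sum_{b}g_{\Psi}^{ab}\,\mathrm{d}V_{a}\otimes\mathrm{d}V_{b}.$$

The coordinates $U$ and $V$ are dual in the sense of information geometry [6] :

$$g^{\Psi}\left(\frac{\partial}{\partial U^{a}},\frac{\partial}{\partial V_{b}}\right)=\delta_{a}^{b}.$$

From $({\cal M},\nabla^{\,\Psi},g^{\,\Psi})$ , one can uniquely introduce another connection denoted $\nabla^{\,\Psi\,*}$ that satisfies

$$X[g^{\Psi}(Y,Z)]=g^{\Psi}(\nabla_{X}^{\Psi}Y,Z)+g^{\Psi}(Y,\nabla_{X}^{\Psi\,*}Z),$$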

for all vector fields $X,Y$ and $Z$ . Then it turns out that the connection $\nabla^{\,\Psi\,*}$ is such that $\nabla_{\partial/\partial V_{\,a}}^{\,\Psi\,*}{\partial/\partial V_{\,b}}=0$ . A quadruplet $({\cal M},g^{\,\Psi},\nabla^{\,\Psi},\nabla^{\,\Psi\,*})$ is referred to as a dually flat space [6]. Since $({\cal M},\nabla^{\,\Psi},g^{\,\Psi})$ is a type of Riemannian manifold $({\cal M},g^{\,\Psi})$ induced from $\Psi$ , a canonical volume-form is defined. To emphasize how this volume-form is induced, this volume-form is denoted $\star_{\,\Psi}1$ in this paper. Associated with $\star_{\,\Psi}1$ , the Hodge map $\star_{\,\Psi}:{\Gamma\Lambda^{{p}}\,\cal{M}}\to{\Gamma\Lambda^{{n-p}}\,\cal{M}}$ is defined, where ${\Gamma\Lambda^{{p}}\,\cal{M}}$ is the space of $p$ -forms on ${\cal M}$ .

An expression for the generalized Hopfield model is written in terms of $g^{\,\Psi}$ as follows.

Lemma 2.1.

The generalized Hopfield model is written as the components of a vector field $X_{\,{\cal H}}$ on the Riemannian manifold $({\cal M},g^{\,\Psi})$ satisfying

$$g^{\Psi}(X_{{\cal H}},-)=-\,\mathrm{d}{\cal H},\qquad(7)$$

where $g^{\,\Psi}$ has been defined in (5).

Proof.

Write $X_{\,{\cal H}}$ in terms of $\{U^{\,a}\}$ as

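$$X_{{\cal H}}=\sum_{a}\dot{U}^{a}\frac{\partial}{\partial U^{a}},\quad\mbox{with}\quad\dot{U}^{a}=\frac{\mathrm{d}U^{a}}{\mathrm{d}t}.$$

Then, substituting

$$g^{\Psi}(X_{{\cal H}},-)=\sum_{a}\sum_{b}g_{ab}^{\Psi}\dot{U}^{a}\,\mathrm{d}U^{b},\quad\mbox{and}\quad\mathrm{d}{\cal H}=\sum_{a}\sum_{b}\frac{\partial{\cal H}}{\partial V_{a}}\frac{\mathrm{d}V_{a}}{\mathrm{d}U^{b}}\,\mathrm{d}U^{b}=\sum_{a}\sum_{b}g_{ab}^{\Psi}\frac{\partial{\cal H}}{\partial V_{a}}\,\mathrm{d}U^{b},$$

into (7), using the property of the pairing $\mathrm{d}U^{\,b}(\partial/\partial U^{\,a})=\delta_{\,a}^{\,b}$ , and $g_{\,ab}^{\,\Psi}=g_{\,ba}^{\,\Psi}$ , one has

$$\sum_{b}g_{ab}^{\Psi}\frac{\mathrm{d}U^{b}}{\mathrm{d}t}=-\sum_{b}g_{ab}^{\Psi}\frac{\partial{\cal H}}{\partial V_{b}}.$$

Since $g^{\,\Psi}$ is non-degenerate, one has

$$\frac{\mathrm{d}U^{a}}{\mathrm{d}t}=-\frac{\partial{\cal H}}{\partial V_{a}}.$$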

∎

From this Lemma, a generalized Hopfield model is a triplet $({\cal M},\Psi,{\cal H})$ in this geometric setting, and can be viewed as a dynamical system on the dually flat space $({\cal M},g^{\,\Psi},\nabla^{\,\Psi},\nabla^{\,\Psi\,*})$ . In the literature, several dynamical systems have been studied in dually flat spaces [14, 15, 9]. In the so-called statistical manifolds, which are manifolds generalized from dually flat spaces, several dynamical systems theories have been considered [17, 18, 19].

This Lemma will be used to calculate the phase space compressibility. This quantity is associated with a volume-form, and expresses how such a volume-form is compressed along a vector field associated with a dynamical system. If there is an attractor in the phase space of a dynamical system, then, roughly speaking, phase space compressibility measures how fast the flow of the dynamical system converges to an attracting set in the phase space. It is defined as follows:

Definition 2.2.

( Phase space compressibility [16] ) : Let ${\cal M}$ be an $n$ -dimensional manifold, $X$ a vector field on ${\cal M}$ , and $\Omega$ a non-vanishing $n$ -form. Introduce the one-form $\kappa_{\,\Omega}$ such that ${\cal L}_{X}\Omega=\kappa_{\Omega}(X)\Omega$ , where ${\cal L}_{X}$ is the Lie derivative along a vector field $X$ . Then $\kappa_{\,\Omega}(X)$ is referred to as a phase space compressibility.

Examples of phase space compressibility are as follows.

1. (Vector field associated with a linear dynamical system): Consider the vector field $X$ on $\mathbb{R}^{\,n}$ associated with the linear dynamical system,

$$X=\sum_{a=1}^{n}\dot{q}^{a}\frac{\partial}{\partial q^{a}},\qquad\dot{q}^{a}=-\check{\kappa}^{a}q^{a},\qquad a\in\{1,\ldots,n\},$$

where $\check{\kappa}^{\,a}>0$ is constant for each $a$ . Choose $\Omega=\mathrm{d}q^{\,1}\wedge\cdots\wedge\mathrm{d}q^{\,n}$ as a non-vanishing $n$ -form. It follows from

$${\cal L}_{X}\Omega=[({\cal L}_{X}\mathrm{d}q^{1})\wedge\cdots\wedge\mathrm{d}q^{n}]+\cdots+[\mathrm{d}q^{1}\wedge\cdots\wedge({\cal L}_{X}\mathrm{d}q^{n})],$$

and ${\cal L}_{X}\mathrm{d}q^{\,a}=\mathrm{d}(Xq^{\,a})=\mathrm{d}\dot{q}^{\,a}=-\check{\kappa}^{\,a}\mathrm{d}q^{\,a}$ that

$${\cal L}_{X}\Omega=-\left(\sum_{a=1}^{n}\check{\kappa}^{a}\right)\Omega,$$

from which $\kappa_{\Omega}(X)=-\,\sum_{a=1}^{n}\check{\kappa}^{\,a}<0$ .

2. (Hamiltonian vector field): Let $({\cal S},\omega)$ be a $2n$ -dimensional symplectic manifold, $(q,p)$ a Darboux coordinate system so that $\omega=\sum_{a=1}^{n}\mathrm{d}p_{\,a}\wedge\mathrm{d}q^{\,a}$ with $q=\{q^{\,1},\ldots,q^{\,n}\}$ and $p=\{p_{\,1},\ldots,p_{\,n}\}$ , and $H$ a Hamiltonian function. The Hamiltonian vector field $X_{\,H}$ is the vector field satisfying $\imath_{X_{\,H}}\omega=-\,\mathrm{d}H$ . It then follows that ${\cal L}_{X_{\,H}}\omega=0$ . Choose $\Omega=\omega\wedge\cdots\wedge\omega$ as a non-vanishing $2n$ -form on ${\cal S}$ . Since ${\cal L}_{X_{\,H}}\Omega=0$ , one concludes that $\kappa_{\Omega}(X_{\,H})=0$ .
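For the coordinate volume-form $\Omega=\mathrm{d}q^{\,1}\wedge\cdots\wedge\mathrm{d}q^{\,n}$ the phase space compressibility reduces to the ordinary divergence of the vector field, so both examples can be checked with a few lines of code. The sketch below uses central differences; the decay rates and the one-degree-of-freedom Hamiltonian $H=p^{2}/2+q^{4}/4$ are hypothetical choices made only for illustration.

```python
import numpy as np

def divergence(vector_field, point, eps=1e-6):
    """Central-difference divergence, i.e. the compressibility with respect
    to the flat volume-form dq^1 ^ ... ^ dq^n."""
    point = np.asarray(point, dtype=float)
    total = 0.0
    for a in range(point.size):
        step = np.zeros_like(point)
        step[a] = eps
        total += (vector_field(point + step)[a]
                  - vector_field(point - step)[a]) / (2.0 * eps)
    return total

# Hypothetical decay rates for the linear system dq^a/dt = -kappa^a q^a.
KAPPA = np.array([0.5, 1.0, 2.0])

def linear_field(q):
    return -KAPPA * q

def hamiltonian_field(z):
    """Hamiltonian vector field of H = p^2/2 + q^4/4 in coordinates z = (q, p)."""
    q, p = z
    return np.array([p, -q**3])  # dq/dt = dH/dp, dp/dt = -dH/dq

if __name__ == "__main__":
    print("linear system:", divergence(linear_field, [0.3, -1.2, 0.7]))
    print("minus sum of rates:", -KAPPA.sum())
    print("Hamiltonian system:", divergence(hamiltonian_field, [0.4, -0.9]))
```

The linear system returns the negative sum of the rates at every point, and the Hamiltonian field returns zero, in agreement with the two examples.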

Recall that on a Riemannian manifold $({\cal M},g)$ , the (Hodge) Laplace operator acting on a function is defined as $\star^{\,-1}\,\mathrm{d}\,\star\,\mathrm{d}:{\Gamma\Lambda^{{0}}\,\cal{M}}\to{\Gamma\Lambda^{{0}}\,\cal{M}}$ , where $\star$ is the Hodge operator and $\star^{\,-1}$ its inverse on ${\cal M}$ . Then, the main theorem in this paper is as follows.

Theorem 2.1.

( Phase space compressibility for the generalized Hopfield model ) : Let $\star_{\,\Psi}1$ be a canonical volume-form on $({\cal M},g^{\,\Psi})$ , $\star_{\,\Psi}$ the Hodge map, $\star_{\,\Psi}^{\,-1}$ its inverse map, and $X_{\,{\cal H}}$ a vector field satisfying (7). Then, the phase space compressibility $\kappa_{\,\Psi}^{\,{\cal H}}:=\kappa_{\,\star_{\,\Psi}1}(X_{\,{\cal H}})$ is given by the negative of the Laplace operator acting on the function ${\cal H}$ ,

$$\kappa_{\Psi}^{{\cal H}}=-\star_{\Psi}^{-1}\mathrm{d}\star_{\Psi}\mathrm{d}{\cal H}.$$

Proof.

Introduce the notation $\widetilde{X}_{\,{\cal H}}=g^{\,\Psi}(X_{\,{\cal H}},-)$ , which is the metric dual of $X_{\,{\cal H}}$ . To proceed, we use the formula

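$$\imath_{X}\star 1=\star\widetilde{X},$$

for any vector field $X$ on a Riemannian manifold, where $\star$ is a Hodge map and $\imath_{\,X}$ the interior product operator with a vector field $X$ . Then, it follows from (7) and the formula that $\imath_{X_{\,{\cal H}}}\star_{\,\Psi}1=\star_{\,\Psi}\,\widetilde{X}_{\,{\cal H}}=-\star_{\,\Psi}\mathrm{d}{\cal H}$ . Also, from the definition of $\kappa_{\,\Psi}^{\,{\cal H}}$ , it follows that $\kappa_{\,\Psi}^{\,{\cal H}}\star_{\,\Psi}1={\cal L}_{\,X_{\,{\cal H}}}\star_{\,\Psi}1$ .

With these and the Cartan formula ${\cal L}_{X}\beta=(\mathrm{d}\imath_{X}+\imath_{X}\mathrm{d})\beta$ for any $p$ -form $\beta$ with $0\leq p\leq\dim{\cal M}$ , one has that

$$\kappa_{\Psi}^{{\cal H}}=\star_{\Psi}^{-1}(\kappa_{\Psi}^{{\cal H}}\star_{\Psi}1)=\star_{\Psi}^{-1}({\cal L}_{X_{{\cal H}}}\star_{\Psi}1)=\star_{\Psi}^{-1}(\mathrm{d}\,\imath_{X_{{\cal H}}}\star_{\Psi}1)=\star_{\Psi}^{-1}[\mathrm{d}(-\star_{\Psi}\mathrm{d}{\cal H})]=-\star_{\Psi}^{-1}\mathrm{d}\star_{\Psi}\mathrm{d}{\cal H}.$$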

∎
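Theorem 2.1 can be illustrated numerically. The sketch below assumes a two-neuron Hopfield model with $\psi$ the softplus function, so that the metric is diagonal with $g_{\,aa}^{\,\Psi}$ equal to the derivative of the sigmoid, and with hypothetical values of $J^{\,ab}$, $R_{\,a}$ and the external inputs. It computes the Riemannian divergence of $X_{\,{\cal H}}$ and the Laplace operator acting on ${\cal H}$ by independent finite differences in the $U$ coordinates, using the standard coordinate formulas for both.

```python
import numpy as np

# Hypothetical two-neuron parameters, chosen only for illustration.
J = np.array([[0.0, 0.8],
              [0.8, 0.0]])
R = np.array([1.0, 2.0])
I_EXT = np.array([0.1, -0.3])

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sqrt_det_metric(U):
    """sqrt|g| for the diagonal metric g_aa = sigmoid'(U^a)."""
    s = sigmoid(U)
    return np.sqrt(np.prod(s * (1.0 - s)))

def lyapunov(U):
    """Hopfield Lyapunov function H(V) evaluated at V = sigmoid(U)."""
    V = sigmoid(U)
    integral = V * np.log(V) + (1.0 - V) * np.log(1.0 - V)
    return -0.5 * V @ J @ V + np.sum(integral / R) - V @ I_EXT

def vector_field(U):
    """Components dU^a/dt = -dH/dV_a of X_H."""
    return J @ sigmoid(U) - U / R + I_EXT

def partial(f, U, a, eps):
    """Central difference of the scalar function f along coordinate U^a."""
    step = np.zeros_like(U)
    step[a] = eps
    return (f(U + step) - f(U - step)) / (2.0 * eps)

def compressibility(U, eps=1e-4):
    """kappa = (1/sqrt|g|) sum_a d/dU^a ( sqrt|g| X^a )."""
    total = 0.0
    for a in range(U.size):
        total += partial(lambda W: sqrt_det_metric(W) * vector_field(W)[a], U, a, eps)
    return total / sqrt_det_metric(U)

def laplacian_of_lyapunov(U, eps=1e-4, inner_eps=1e-5):
    """(1/sqrt|g|) sum_a d/dU^a ( sqrt|g| g^{aa} dH/dU^a ) for diagonal g."""
    def flux(W, a):
        s = sigmoid(W[a])
        return sqrt_det_metric(W) * partial(lyapunov, W, a, inner_eps) / (s * (1.0 - s))
    total = 0.0
    for a in range(U.size):
        total += partial(lambda W: flux(W, a), U, a, eps)
    return total / sqrt_det_metric(U)

if __name__ == "__main__":
    U = np.array([0.4, -0.7])
    print("kappa      :", compressibility(U))
    print("-Laplace H :", -laplacian_of_lyapunov(U))
```

The two printed numbers should agree up to finite-difference error, since the first is the compressibility of the flow and the second is the right-hand side of the theorem.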

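Some consequences follow from Lemma 2.1 and Theorem 2.1.

•

It follows from (7) that

$$\mathrm{d}\widetilde{X}_{{\cal H}}=0.\qquad(8)$$

•

For the case that $\partial{\cal M}=\emptyset$ , it follows from the Stokes theorem that

$$\int_{{\cal M}}\kappa_{\Psi}^{{\cal H}}\star_{\Psi}1=0.$$

•

The existence of a Lyapunov function can be expressed in terms of the metric tensor field as

$$\frac{\mathrm{d}{\cal H}}{\mathrm{d}t}=X_{{\cal H}}{\cal H}=\mathrm{d}{\cal H}(X_{{\cal H}})=-g^{\Psi}(X_{{\cal H}},X_{{\cal H}})<0,$$

for a non-vanishing vector field $X_{\,{\cal H}}$ .

•

In terms of the coordinate set $V$ ,

$$X_{{\cal H}}=\sum_{a}\dot{V}_{a}\frac{\partial}{\partial V_{a}},$$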

one has

[TABLE]

The phase space compressibility $\kappa_{\,\Psi}^{{\cal H}}$ is also related to the co-derivative and the (Hodge) Laplace operator acting on the one-form $\widetilde{X}_{\,{\cal H}}$ . To state these explicitly, recall that the co-derivative acting on a $p$ -form $\mathrm{d}_{\,\Psi}^{\,\dagger}:{\Gamma\Lambda^{{p}}\,\cal{M}}\to{\Gamma\Lambda^{{p-1}}\,\cal{M}}$ is defined as $\mathrm{d}_{\,\Psi}^{\,\dagger}=\star^{-1}\,\mathrm{d}\,\star$ , and that the Laplace operator acting on a $p$ -form on ${\cal M}$ is defined as $\mathrm{d}\,\mathrm{d}^{\,\dagger}+\mathrm{d}^{\,\dagger}\,\mathrm{d}:{\Gamma\Lambda^{{p}}\,\cal{M}}\to{\Gamma\Lambda^{{p}}\,\cal{M}}$ (see, for example, Ref. [20]). Then, one has the following.

Proposition 2.1.

[TABLE]

where $\mathrm{d}_{\,\Psi}^{\,\dagger}:{\Gamma\Lambda^{{p}}\,\cal{M}}\to{\Gamma\Lambda^{{p-1}}\,\cal{M}}$ is the co-derivative, $\mathrm{d}_{\,\Psi}^{\,\dagger}:=\star_{\Psi}^{-1}\mathrm{d}\star_{\,\Psi}$ .

Proof.

A proof is completed by straightforward calculations. For the first equality, it follows from $\widetilde{X}_{\,{\cal H}}=-\,\mathrm{d}{\cal H}$ that

[TABLE]

The rightmost side of the equation above is written in terms of $\kappa_{\,\Psi}^{\,{\cal H}}$ due to Theorem 2.1 :

[TABLE]

For the second equality it follows from (8) and the proven first equality that

[TABLE]

∎

Before closing this subsection, we discuss how this geometric formulation applies to other dynamical systems. It is known that the Hopfield model belongs to a class of dynamical systems proposed in Ref. [11] (see also [13]). Consider the dynamical system of the form,

[TABLE]

where

•

$A^{\,a}$ is a positive function of $U^{\,a}$ , $A^{\,a}=A^{\,a}(U^{\,a})$ ,

•

$B^{\,a}$ is a function of $U^{\,a}$ , $B^{\,a}=B^{\,a}(U^{\,a})$ ,

•

$C^{\,ab}$ forms a symmetric constant matrix, $C^{\,ab}=C^{\,ba}$ ,

•

$\psi$ is a strictly convex function depending on one variable, so that $\mathrm{d}^{\,2}\psi/\mathrm{d}\xi^{\,2}>0$ .

The class of the form (9) is obtained by restricting the class proposed in Ref. [11]. The Lyapunov function ${\cal H}^{\,\prime}$ was found as

[TABLE]

From this ${\cal H}^{\,\prime}$ , one has

[TABLE]

from which

[TABLE]

Then, (9) can be written as

[TABLE]

due to

[TABLE]

Notice that (10) is not written as (3), unless $A^{\,a}$ is constant.

2.2 Coordinate expression of phase space compressibility

In this subsection the coordinate expression of $\kappa_{\,\Psi}^{\,{\cal H}}$ is given. In particular the case

[TABLE]

is considered.

Recall that the coordinate expression of the Laplace operator on a Riemannian manifold $({\cal N},g)$ with $g$ being a Riemannian metric tensor field

[TABLE]

The Laplacian operator acting on a function $f$ is then

[TABLE]

where $\star$ is the Hodge map associated with $g$ , $|g|$ the determinant of the matrix $(g_{\,ab})$ , and $(g^{\,ab})$ the inverse matrix of $(g_{\,ab})$ .

A coordinate expression of $\kappa_{\,\Psi}^{\,{\cal H}}$ is calculated by applying (12) to Theorem 2.1. Each term in (12) is calculated as follows. Since

[TABLE]

one has

[TABLE]

and

[TABLE]

Combining these terms, one can write

[TABLE]

Since ${\cal H}={\cal H}(V)$ , one has

[TABLE]

and

[TABLE]

Thus, one arrives at

[TABLE]
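As a cross-check of this subsection, the coordinate expression can be re-derived in a few lines. The following is a sketch under the assumptions of Theorem 2.1 and the standard coordinate formula for the Laplace operator; it uses $\sum_{b}g_{\Psi}^{ab}\,\partial{\cal H}/\partial U^{b}=\partial{\cal H}/\partial V_{a}$ and $\partial/\partial U^{a}=\sum_{b}g_{ab}^{\Psi}\,\partial/\partial V_{b}$, and its final form may be arranged differently from (13). The last line specializes to the separable case $\Psi(U)=\sum_{a}\psi(U^{a})$, where primes denote derivatives of $\psi$ evaluated at $U^{a}$.

```latex
\begin{align*}
\kappa_{\Psi}^{{\cal H}}
 &= -\frac{1}{\sqrt{|g^{\Psi}|}}\sum_{a}\sum_{b}
    \frac{\partial}{\partial U^{a}}
    \left(\sqrt{|g^{\Psi}|}\; g_{\Psi}^{ab}\,
    \frac{\partial {\cal H}}{\partial U^{b}}\right)
  = -\frac{1}{\sqrt{|g^{\Psi}|}}\sum_{a}
    \frac{\partial}{\partial U^{a}}
    \left(\sqrt{|g^{\Psi}|}\;
    \frac{\partial {\cal H}}{\partial V_{a}}\right) \\
 &= -\sum_{a}\left[
    \frac{\partial \ln\sqrt{|g^{\Psi}|}}{\partial U^{a}}\,
    \frac{\partial {\cal H}}{\partial V_{a}}
    + \sum_{b} g_{ab}^{\Psi}\,
    \frac{\partial^{2} {\cal H}}{\partial V_{a}\,\partial V_{b}}
    \right] \\
 &= -\sum_{a}\left[
    \frac{\psi'''(U^{a})}{2\,\psi''(U^{a})}\,
    \frac{\partial {\cal H}}{\partial V_{a}}
    + \psi''(U^{a})\,
    \frac{\partial^{2} {\cal H}}{\partial V_{a}^{2}}
    \right]
    \qquad \mbox{if } \Psi(U)=\sum_{a}\psi(U^{a}).
\end{align*}
```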

3 Examples

In this section, after the sigmoid function is introduced, two examples are considered.

For the activation function, the form (11) is often adopted in the literature owing to its simplicity,

[TABLE]

It follows that the matrix $(g_{\,ab}^{\,\Psi})$ is diagonal.

In what follows geometric objects and the phase space compressibility are calculated for this case.

3.1 Sigmoid function

Choose the function $\psi$ as

[TABLE]

This is referred to as the softplus function, and this choice leads to the sigmoid function as shown below. From this $\psi$ , one can derive various quantities. First introduce

[TABLE]

where the right hand side of the equation above is referred to as the sigmoid function. Then,

[TABLE]

and

[TABLE]

The Legendre transform of $\psi$ ,

[TABLE]

is calculated as follows. Solving

[TABLE]

for $x$ , one has

[TABLE]

Then,

[TABLE]

from which

[TABLE]

The components of the metric tensor field are obtained as

[TABLE]
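The relations of this subsection are easy to verify numerically. The sketch below assumes the standard softplus $\psi(x)=\ln(1+e^{x})$ without any scaling factor, for which the derivative is the logistic sigmoid, the inverse map is the logit, and the Legendre transform is $\psi^{*}(y)=y\ln y+(1-y)\ln(1-y)$. If the paper's $\psi$ carries extra constants, the formulas change accordingly.

```python
import numpy as np

def softplus(x):
    """psi(x) = log(1 + exp(x)), evaluated stably."""
    return np.logaddexp(0.0, x)

def sigmoid(x):
    """psi'(x), the sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))

def logit(y):
    """Inverse of the sigmoid: solves y = psi'(x) for x."""
    return np.log(y / (1.0 - y))

def softplus_conjugate(y):
    """Legendre transform psi*(y) = [x y - psi(x)] at x = logit(y)."""
    return y * np.log(y) + (1.0 - y) * np.log(1.0 - y)

def metric_in_U(x):
    """g_aa = psi''(x) = sigmoid(x) (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def metric_in_V(y):
    """g^aa = (psi*)''(y) = 1 / (y (1 - y))."""
    return 1.0 / (y * (1.0 - y))

if __name__ == "__main__":
    x = np.linspace(-3.0, 3.0, 7)
    y = sigmoid(x)
    print("logit inverts sigmoid:", np.allclose(logit(y), x))
    print("Legendre transform   :", np.allclose(softplus_conjugate(y), x * y - softplus(x)))
    print("g_aa * g^aa = 1      :", np.allclose(metric_in_U(x) * metric_in_V(y), 1.0))
```

The last check is the one-dimensional instance of the statement that the Hessian of $\Psi^{\,*}$ is the inverse of the Hessian of $\Psi$.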

3.2 Example 1

The case of Example 1 in Section 2 is considered here, where ${\cal H}(V)=\sum_{a}\sum_{b}\delta^{\,ab}\,V_{\,a}V_{\,b}/2$ has been chosen.

Substituting this ${\cal H}$ and

[TABLE]

into (13), one has

[TABLE]

If $\psi$ is the softplus function of Section 3.1, so that the activation function is the sigmoid function, then

[TABLE]

Notice that

[TABLE]

with $n=\sum_{a}1$ .
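The gradient-system case can be checked numerically as follows. This is a sketch assuming $\psi$ is the standard softplus, so that $V_{\,a}$ is the sigmoid of $U^{\,a}$ and $g_{\,aa}^{\,\Psi}=V_{\,a}(1-V_{\,a})$. The closed form used for comparison, $\kappa=-\sum_{a}V_{\,a}(3-4V_{\,a})/2$, was derived for this sketch from the divergence formula and is not quoted from the paper, so it should be compared against the expression above before being relied upon.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sqrt_det_metric(U):
    """sqrt|g| for the diagonal metric g_aa = V_a (1 - V_a)."""
    V = sigmoid(U)
    return np.sqrt(np.prod(V * (1.0 - V)))

def gradient_field(U):
    """Example 1: dU^a/dt = -V_a = -sigmoid(U^a)."""
    return -sigmoid(U)

def compressibility_fd(U, eps=1e-5):
    """kappa = (1/sqrt|g|) sum_a d/dU^a ( sqrt|g| X^a ), by central differences."""
    total = 0.0
    for a in range(U.size):
        step = np.zeros_like(U)
        step[a] = eps
        plus = sqrt_det_metric(U + step) * gradient_field(U + step)[a]
        minus = sqrt_det_metric(U - step) * gradient_field(U - step)[a]
        total += (plus - minus) / (2.0 * eps)
    return total / sqrt_det_metric(U)

def compressibility_closed_form(U):
    """Closed form derived for this sketch: -sum_a V_a (3 - 4 V_a) / 2."""
    V = sigmoid(U)
    return -np.sum(V * (3.0 - 4.0 * V)) / 2.0

if __name__ == "__main__":
    U = np.array([-1.0, 0.2, 1.5])
    print("finite differences:", compressibility_fd(U))
    print("closed form       :", compressibility_closed_form(U))
```

Each summand depends only on $V_{\,a}\in(0,1)$, so under these assumptions the compressibility is bounded by a multiple of $n$.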

3.3 Example 2

The case of Example 2 is considered here.

Substituting the differentiation of (4),

[TABLE]

into (13), one has

[TABLE]

with $V_{\,a}=\mathrm{d}\psi/\mathrm{d}U^{\,a}$ .

If $\psi$ is the softplus function of Section 3.1, so that the activation function is the sigmoid function, then

[TABLE]

This further reduces by the use of

[TABLE]

to

[TABLE]

For the steady state, $\dot{U}^{\,a}=0$ for all $a$ , the term $\{\cdots\}$ above vanishes. In the steady state, where the self-coupling terms vanish, $J^{\,aa}=0$ for all $a$ , one has

[TABLE]

4 Conclusions

This paper has offered the viewpoint that a class of dynamical systems modeling deterministic neural networks can be described in terms of Hessian and information geometries. In this formulation the phase space compressibility is shown to be equal to the negative of the Laplace operator acting on a Lyapunov function. Some explicit forms of it have also been given.

Several directions for future work follow from this paper. One is to study a class of stochastic dynamical systems, since neural networks are often modeled by stochastic models. It is then of interest to see whether the present approach can be applied to such stochastic models. Another is to consider a relation between the present formulation on a dually flat space and contact Hamiltonian systems in a contact manifold [21, 22]. Since the generalized Hopfield model $({\cal M},\Psi,{\cal H})$ is similar to a Hamiltonian system $({\cal S},\omega,H)$ with ${\cal S}$ some even-dimensional manifold, $\omega$ a symplectic $2$ -form, and $H$ a Hamiltonian function [5], one may explore relations between them.

We believe that the elucidation of these remaining questions, together with the present study, will advance the geometric theory of neural network models and that of dynamical systems.

Acknowledgments

The author is grateful to Hideitsu Hino for support for this research. Also the author is partially supported by JSPS (KAKENHI) grant number 19K03635 and by JST CREST JPMJCR1761.

Bibliography

[1] T. Geszti, Physical models of Neural networks, World Scientific, (1990).
[2] D.J. Amit, Modeling Brain Function, Cambridge University Press, (1989).
[3] C.M. Bishop, Pattern recognition and machine learning, Springer, (2006).
[4] V.I. Arnold, Mathematical Methods of Classical Mechanics, 2nd Ed., Springer, (1997).
[5] A.C. da Silva, Lectures on Symplectic Geometry, 2nd Ed., Springer, (2008).
[6] S. Amari and H. Nagaoka, Methods of information geometry, AMS, Oxford University Press, (2000).
[7] N. Ay et al., Information geometry, Springer, (2017).
[8] H. Shima, The Geometry of Hessian Structures, Springer, (2007).
