Hessian-information geometric formulation of a class of deterministic neural network models
Shin-itiro Goto

TL;DR
This paper introduces a geometric framework for deterministic neural network models using Hessian and information geometry, linking phase space properties to differential operators on manifolds, with explicit calculations for sigmoid activations.
Contribution
It formulates neural network dynamics within a Hessian geometric framework, connecting phase space compressibility to Laplace operators on Hessian manifolds, and explicitly analyzes sigmoid functions.
Findings
Phase space compressibility expressed via Laplace operator on Hessian manifolds.
Explicit calculation of compressibility for sigmoid activation functions.
Utilization of dual coordinates in information geometry for neural network analysis.
Hessian-information geometric formulation of a class of deterministic neural network models
Shin-itiro GOTO,
The Institute of Statistical Mathematics,
10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
Abstract
In this paper a class of dynamical systems describing deterministic neural network models is formulated from the viewpoint of differential geometry. This class includes the Hopfield model and gradient systems, and is such that the so-called activation functions induce information and Hessian geometries. In this formulation, it is shown that the phase space compressibility of a dynamical system belonging to this class is written in terms of the Laplace operator defined on Hessian manifolds, where phase space compressibility is associated with a volume-form of a manifold, and expresses how such a volume-form is compressed along the vector field of a dynamical system. Since the sigmoid function, as an activation function, plays a role in the study of neural network models, such compressibility is explicitly calculated for this case. Throughout this paper, the so-called dual coordinates known in information geometry are explicitly used.
1 Introduction
Neural network models play a fundamental role in brain science, where they are used to clarify the functions of the human brain [1, 2]. They can be viewed as a class of dynamical systems, either probabilistic or deterministic, and can be studied mathematically. Considerable effort is being devoted to understanding the dynamics of neural network models and their variants. Although probabilistic models seem natural, deterministic models are simpler, and their analysis is expected to offer a clearer perspective. Beyond its purely academic interest, such analysis has implications for mathematical engineering. In particular, neural network models are used intensively in machine learning systems [3].
Various approaches exist to analyze dynamical systems. One of them is to employ differential geometry [4, 5], since various geometric notions can be introduced and applied systematically to dynamical systems. Among these, one approach is to apply information geometry, which is a geometrization of statistics for parametric models with which cumulant generating functions are associated [6, 7]. One of the essential roles of information geometry is to bridge convex analysis and Riemannian geometry, where the tools of convex analysis include the Legendre transform. Hessian geometry is a differential geometry built on convex functions, and it does not require probability distribution functions [8]. Thus, if a system involves convex functions but has nothing to do with probability, Hessian geometry is expected to be a key to developing a geometrization of that system. Examples of such geometrization include Hamiltonian dynamical systems and electric circuit models [9, 10].
For deterministic neural network models involving convex functions, convex analysis and its related geometries, Hessian and information geometries, should be expected to play a role. To see this explicitly, one should focus on a specific class of dynamical systems. A candidate for such a class is the one proposed by Cohen and Grossberg [11]. This class includes the Hopfield model [12], and it has been studied intensively in the literature since it has Lyapunov functions [13]. On the other hand, one of the keys in the study of neural network models is the choice of the so-called activation function. However, the use of convex analysis and of the differential geometry based on the activation functions of this model has received little attention. Thus a formulation based on Hessian and information geometries, in conformity with a given convex activation function, is expected to prove fruitful.
In this paper a class of dynamical systems including the Hopfield model for describing a neural network model is formulated in terms of Hessian and information geometries. In this formulation a coordinate-free description of the class of dynamical systems is given, from which the quantity called phase space compressibility is shown to be written as the negative of the Laplace operator acting on a Lyapunov function. This quantity expresses how much a volume-form is compressed along the vector field associated with a dynamical system, and is a measure of how fast the flow in the phase space of the dynamical system converges to an attracting set.
This paper is organized as follows. In Section 2, a class of dynamical systems, which we call the generalized Hopfield model, is introduced. After a coordinate-free description of the class is given, phase space compressibility is calculated. In Section 3, explicit expressions of phase space compressibility for some examples are given. Finally, Section 4 summarizes this paper and discusses future work.
In this paper mathematical objects are assumed to be smooth and real.
2 Generalized Hopfield model
In this section a generalized Hopfield model is introduced, and then its geometric formulation is given.
Let be an -dimensional manifold, a set of local coordinates on with , and a strictly convex function on . Strict convexity of is written as in some convex domain of , where denotes that the matrix is positive definite. Introduce with such that
[TABLE]
Consider the dynamical system of the form
[TABLE]
where plays the role of time, and is a Lyapunov function:
[TABLE]
In this paper this system is named as follows.
Definition 2.1.
The dynamical system (2) together with (1) and (3) is referred to as the generalized Hopfield model.
The following are some examples.
Example 1.
Choosing to be with being the Kronecker delta, which equals unity for and vanishes otherwise, one has a gradient system:
[TABLE]
Example 2.
A neural network model having a Lyapunov function found in the literature is of the form of the set of differential equations
[TABLE]
where and are sets of constants, constants satisfying , and is a strictly convex function (see Refs. [11, 12]). Recall that the sum of strictly convex functions is also a strictly convex function. Note that all the constants appearing in Ref. [12] have been set to unity in this paper. A Lyapunov function is known to exist in this model. Choose to be
[TABLE]
where we have assumed that exists. Then it follows from
[TABLE]
that
[TABLE]
Thus the generalized Hopfield model includes the Hopfield model.
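The decrease of a Lyapunov function along trajectories of a Hopfield-type model can be illustrated numerically. The following is a minimal sketch, not the formulation of this paper: it assumes the standard additive form du_a/dt = -u_a + sum_b W[a][b] sigmoid(u_b) + I[a] with a symmetric weight matrix and all other constants set to unity, which may differ in sign or naming conventions from the displayed equations. The names `velocity`, `lyapunov`, and `simulate`, the random weights, and the forward Euler integrator are all hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def velocity(u, W, I):
    """Assumed model: du_a/dt = -u_a + sum_b W[a][b]*sigmoid(u_b) + I[a]."""
    V = [sigmoid(ub) for ub in u]
    n = len(u)
    return [-u[a] + sum(W[a][b] * V[b] for b in range(n)) + I[a] for a in range(n)]

def lyapunov(u, W, I):
    """Energy in the output variables V_a = sigmoid(u_a).

    The last term is the integral of the inverse sigmoid from 0 to V_a,
    which equals V ln V + (1 - V) ln(1 - V).
    """
    V = [sigmoid(ua) for ua in u]
    n = len(u)
    quad = -0.5 * sum(W[a][b] * V[a] * V[b] for a in range(n) for b in range(n))
    lin = -sum(I[a] * V[a] for a in range(n))
    ent = sum(v * math.log(v) + (1.0 - v) * math.log(1.0 - v) for v in V)
    return quad + lin + ent

def simulate(u0, W, I, dt=0.01, steps=2000):
    """Forward Euler integration; records the energy after every step."""
    u = list(u0)
    energies = [lyapunov(u, W, I)]
    for _ in range(steps):
        f = velocity(u, W, I)
        u = [ua + dt * fa for ua, fa in zip(u, f)]
        energies.append(lyapunov(u, W, I))
    return u, energies

random.seed(0)
n = 4
W = [[0.0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        W[a][b] = W[b][a] = random.uniform(-1.0, 1.0)  # symmetric weights
I = [random.uniform(-0.5, 0.5) for _ in range(n)]
u0 = [random.uniform(-2.0, 2.0) for _ in range(n)]

u_final, energies = simulate(u0, W, I)
largest_increase = max(e1 - e0 for e0, e1 in zip(energies, energies[1:]))
```

For this assumed model the derivative of the energy with respect to V_a equals minus the velocity of u_a, so the energy changes at the rate -sum_a sigmoid'(u_a) (du_a/dt)^2, which is non-positive. With a small step size, `largest_increase` should therefore be zero up to rounding error.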
2.1 Geometric description
A geometric description of the generalized Hopfield model is given in this subsection. First, it is shown how the strictly convex function induces a Hessian manifold. A Hessian manifold is a triplet , where is a flat connection, and the Riemannian metric tensor field satisfying . The connection is such that for a local coordinate set .
Given , define the Riemannian metric tensor field as
[TABLE]
As in (1), the function of induces .
[TABLE]
which form the matrix . The inverse matrix of , denoted by , is known to be written as [6]
[TABLE]
where is the total Legendre transform of :
[TABLE]
Combining these, one has
[TABLE]
The coordinates and are dual in the sense of information geometry [6] :
[TABLE]
From , one can uniquely introduce another connection denoted that satisfies
[TABLE]
for all vector fields and . Then it turns out that the connection is such that . A quadruplet is referred to as a dually flat space [6]. Since is a type of Riemannian manifold induced from , a canonical volume-form is defined. To emphasize how this volume-form is induced, this volume-form is denoted in this paper. Associated with , the Hodge map is defined, where is the space of -forms on .
An expression for the generalized Hopfield model is written in terms of as follows.
Lemma 2.1.
The generalized Hopfield model is written as the components of a vector field on the Riemannian manifold satisfying
[TABLE]
where has been defined in (5).
Proof.
Write in terms of as
[TABLE]
Then, substituting
[TABLE]
into (7), using the property of the pairing , and , one has
[TABLE]
Since is non-degenerate, one has
[TABLE]
∎
From this Lemma, a generalized Hopfield model is a triplet in this geometric setting, and can be viewed as a dynamical system on the dually flat space . In the literature, several dynamical systems have been studied in dually flat spaces [14, 15, 9]. In the so-called statistical manifolds, which are manifolds generalized from dually flat spaces, several dynamical systems theories have been considered [17, 18, 19].
This lemma will be used to calculate the phase space compressibility. This quantity is associated with a volume-form, and expresses how such a volume-form is compressed along a vector field associated with a dynamical system. If a dynamical system has an attractor in phase space, then, roughly speaking, phase space compressibility measures how fast the flow of the dynamical system converges to an attracting set in the phase space. It is defined as follows:
Definition 2.2.
(Phase space compressibility [16]): Let be an -dimensional manifold, a vector field on , and a non-vanishing -form. Introduce the one-form such that , where is the Lie derivative along a vector field . Then is referred to as a phase space compressibility.
Examples of phase space compressibility are as follows.
1. (Vector field associated with a linear dynamical system): Consider the vector field on associated with the linear dynamical system,
[TABLE]
where is constant for each . Choose as a non-vanishing -form. It follows from
[TABLE]
and that
[TABLE]
from which .
2. (Hamiltonian vector field): Let be an -dimensional symplectic manifold, a Darboux coordinate system so that with and , and a Hamiltonian function. The Hamiltonian vector field is the vector field satisfying . It then follows that . Choose as a non-vanishing -form on . Since , one concludes that .
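The two examples above can be checked numerically. The following is a minimal sketch that assumes the standard coordinate volume-form, for which the Lie derivative of the volume-form along a vector field equals the coordinate divergence of the field times the volume-form. The function `divergence`, the constants in `rates`, the diagonal form of the linear system, and the pendulum Hamiltonian are hypothetical choices made for illustration.

```python
import math

def divergence(field, x, h=1e-5):
    """Coordinate divergence sum_a d(field_a)/dx^a by central differences."""
    total = 0.0
    for a in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[a] += h
        xm[a] -= h
        total += (field(xp)[a] - field(xm)[a]) / (2.0 * h)
    return total

# Assumed linear system dx^a/dt = rates[a] * x^a with hypothetical constants.
rates = [-1.0, -0.5, 2.0]

def linear_field(x):
    return [c * xa for c, xa in zip(rates, x)]

# Hamiltonian vector field of the pendulum H(q, p) = p**2 / 2 - cos(q):
# dq/dt = dH/dp = p and dp/dt = -dH/dq = -sin(q).
def hamiltonian_field(z):
    q, p = z
    return [p, -math.sin(q)]

kappa_linear = divergence(linear_field, [0.3, -1.2, 0.7])
kappa_hamiltonian = divergence(hamiltonian_field, [0.4, 1.1])
```

Under these assumptions `kappa_linear` agrees with the sum of the constants in `rates`, and `kappa_hamiltonian` vanishes, in line with the statement that Hamiltonian flows preserve the phase space volume.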
Recall that on a Riemannian manifold , the (Hodge) Laplace operator acting on a function is defined as , where is the Hodge operator and its inverse on . Then, the main theorem in this paper is as follows.
Theorem 2.1.
(Phase space compressibility for the generalized Hopfield model): Let be a canonical volume-form on , the Hodge map, its inverse map, and a vector field satisfying (7). Then, the phase space compressibility is given by the negative of the Laplace operator acting on the function ,
[TABLE]
Proof.
Introduce the notation , which is the metric dual of . To proceed, we use the formula
[TABLE]
for any vector field on a Riemannian manifold, where is a Hodge map and the interior product operator with a vector field . Then, it follows from (7) and the formula that . Also, from the definition of , it follows that .
With these and the Cartan formula for any -form with , one has that
[TABLE]
∎
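The statement of the theorem can be illustrated in one dimension by comparing the measured contraction rate of a small Riemannian length with the negative of the Laplace operator acting on a Lyapunov function. The following is a minimal sketch under several assumptions that are not taken from this paper: the convex function is the softplus function, so the metric component is sigmoid(x)(1 - sigmoid(x)); the Lyapunov function is the hypothetical choice H(x) = x^2/2; the vector field is taken to be the negative metric gradient of H; and the Laplace operator is taken in its Laplace-Beltrami coordinate form. All function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def metric(x):
    """Assumed one-dimensional Hessian metric g = softplus''(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def dH(x):
    """Derivative of the hypothetical Lyapunov function H(x) = x**2 / 2."""
    return x

def X(x):
    """Assumed vector field: negative metric gradient of H."""
    return -dH(x) / metric(x)

def laplacian_H(x, h=1e-4):
    """Laplace-Beltrami operator (1/sqrt(g)) d/dx (sqrt(g) g^{-1} dH/dx)."""
    def flux(y):
        return math.sqrt(metric(y)) * dH(y) / metric(y)
    return (flux(x + h) - flux(x - h)) / (2.0 * h) / math.sqrt(metric(x))

def riemannian_length(a, b):
    """Midpoint approximation of the length of [a, b] measured with sqrt(g)."""
    return math.sqrt(metric(0.5 * (a + b))) * (b - a)

def measured_rate(x0, eps=1e-4, dt=1e-6):
    """Relative rate of change of a small length transported by one Euler step."""
    a, b = x0 - 0.5 * eps, x0 + 0.5 * eps
    length0 = riemannian_length(a, b)
    a1, b1 = a + dt * X(a), b + dt * X(b)
    length1 = riemannian_length(a1, b1)
    return (length1 - length0) / (dt * length0)

x0 = 0.7
kappa_predicted = -laplacian_H(x0)
kappa_measured = measured_rate(x0)
```

Under these assumptions the two numbers agree up to discretization error, and both are negative at the chosen point, meaning that the flow locally compresses the Riemannian volume.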
There are some consequences from Lemma 2.1 and Theorem 2.1.
- It follows from (7) that
[TABLE]
- For the case that , it follows from the Stokes theorem that
[TABLE]
- The existence of a Lyapunov function can be expressed in terms of the metric tensor field as
[TABLE]
for a non-vanishing vector field .
- In terms of the coordinate set ,
[TABLE]
one has
[TABLE]
The phase space compressibility is also related to the co-derivative and the (Hodge) Laplace operator acting on the one-form . To state these explicitly, recall that the co-derivative acting on a -form is defined as , and that the Laplace operator acting on a -form on is defined as (see, for example, Ref. [20]). Then, one has the following.
Proposition 2.1.
[TABLE]
where is the co-derivative, .
Proof.
A proof is completed by straightforward calculations. For the first equality, it follows from that
[TABLE]
The rightmost side of the equation above is written in terms of due to Theorem 2.1:
[TABLE]
For the second equality it follows from (8) and the proven first equality that
[TABLE]
∎
Before closing this subsection, we discuss how this geometric formulation applies to other dynamical systems. The Hopfield model is known to belong to a class of dynamical systems proposed in Ref. [11] (see also Ref. [13]). Consider the dynamical system of the form,
[TABLE]
where
- is a positive function of , ,
- is a function of , ,
- forms a symmetric constant matrix, ,
- is a strictly convex function depending on one variable, so that .
The class of the form (9) is obtained by restricting the class proposed in Ref. [11]. The Lyapunov function was found as
[TABLE]
From this , one has
[TABLE]
from which
[TABLE]
Then, (9) can be written as
[TABLE]
due to
[TABLE]
Notice that (10) is not written as (3), unless is constant.
2.2 Coordinate expression of phase space compressibility
In this subsection the coordinate expression of is given. In particular, we focus on the case
[TABLE]
Recall that the coordinate expression of the Laplace operator on a Riemannian manifold with being a Riemannian metric tensor field
[TABLE]
The Laplacian operator acting on a function is then
[TABLE]
where is the Hodge map associated with , the determinant of the matrix , and the inverse matrix of .
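The coordinate expression of the Laplace operator acting on a function can be sketched numerically. The following is a minimal illustration that assumes the standard Laplace-Beltrami form (1/sqrt(g)) d_a( sqrt(g) g^{ab} d_b f ) and, for simplicity, a diagonal metric. The function `laplace_beltrami` and the polar-coordinate test metric are hypothetical choices made for illustration.

```python
import math

def laplace_beltrami(f, metric_diag, x, h=1e-4):
    """Laplace-Beltrami operator for a diagonal metric, by central differences.

    metric_diag(y) returns the list of diagonal components g_aa at the point y.
    """
    def sqrt_det(y):
        return math.sqrt(math.prod(metric_diag(y)))

    def flux(y, a):
        # sqrt(g) * g^{aa} * df/dx^a evaluated at y
        yp = list(y)
        ym = list(y)
        yp[a] += h
        ym[a] -= h
        df = (f(yp) - f(ym)) / (2.0 * h)
        return sqrt_det(y) * df / metric_diag(y)[a]

    total = 0.0
    for a in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[a] += h
        xm[a] -= h
        total += (flux(xp, a) - flux(xm, a)) / (2.0 * h)
    return total / sqrt_det(x)

# Check on the flat plane in polar coordinates (r, theta): g = diag(1, r**2).
def polar_metric(y):
    return [1.0, y[0] ** 2]

point = [1.5, 0.3]
value_r_squared = laplace_beltrami(lambda y: y[0] ** 2, polar_metric, point)
value_harmonic = laplace_beltrami(lambda y: y[0] * math.cos(y[1]), polar_metric, point)
```

In polar coordinates the function r^2 has Laplacian 4 and the function r cos(theta), which is the Cartesian coordinate x, is harmonic, so the two computed values give a quick consistency check of the formula.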
A coordinate expression of is calculated by applying (12) to Theorem 2.1. Each term in (12) is calculated as follows. Since
[TABLE]
one has
[TABLE]
and
[TABLE]
Combining these terms, one can write
[TABLE]
Since , one has
[TABLE]
and
[TABLE]
Thus, one arrives at
[TABLE]
3 Examples
In this section, after the sigmoid function is introduced, two examples are considered.
An activation function of the form (11) is often considered in the literature because of its simplicity,
[TABLE]
It follows that the matrix is diagonal.
In what follows geometric objects and the phase space compressibility are calculated for this case.
3.1 Sigmoid function
Choose the function as
[TABLE]
This is referred to as the softplus function, and this choice leads to the sigmoid function as shown below. From this , one can derive various quantities. First introduce
[TABLE]
where the right hand side of the equation above is referred to as the sigmoid function. Then,
[TABLE]
and
[TABLE]
The Legendre transform of ,
[TABLE]
is calculated as follows. Solving
[TABLE]
for , one has
[TABLE]
Then,
[TABLE]
from which
[TABLE]
The components of the metric tensor field are obtained as
[TABLE]
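The relations of this subsection can be checked numerically. The following is a minimal sketch that uses standard facts about the softplus function ln(1 + e^x): its derivative is the sigmoid function, its second derivative is sigmoid(x)(1 - sigmoid(x)), and its Legendre transform is p ln p + (1 - p) ln(1 - p) for 0 < p < 1. The function names and the symbol p for the dual coordinate are illustrative and are not the notation of this paper.

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def sigmoid(x):
    """Dual coordinate: the derivative of softplus."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Second derivative of softplus."""
    s = sigmoid(x)
    return s * (1.0 - s)

def softplus_dual(p):
    """Legendre transform of softplus, valid for 0 < p < 1."""
    return p * math.log(p) + (1.0 - p) * math.log(1.0 - p)

def logit(p):
    """Derivative of the dual function, which inverts the sigmoid."""
    return math.log(p / (1.0 - p))

def dual_hessian(p):
    """Second derivative of the dual function."""
    return 1.0 / (p * (1.0 - p))

x = 0.8
p = sigmoid(x)

# Legendre duality: psi(x) + psi*(p) - x * p vanishes at p = psi'(x).
fenchel_gap = softplus(x) + softplus_dual(p) - x * p

# The two Hessians are mutually inverse at corresponding points.
inverse_product = sigmoid_prime(x) * dual_hessian(p)

# Finite-difference check that the derivative of softplus is the sigmoid.
h = 1e-6
softplus_slope = (softplus(x + h) - softplus(x - h)) / (2.0 * h)
```

The vanishing of `fenchel_gap` and the fact that `inverse_product` equals one reflect the duality between the two coordinate systems: the metric components in one coordinate system are the inverse of those in the other.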
3.2 Example 1
The case of Example 1 in Section 2 is considered here, where has been chosen.
Substituting this and
[TABLE]
into (13), one has
[TABLE]
If is the sigmoid function, then
[TABLE]
Notice that
[TABLE]
with .
3.3 Example 2
The case of Example 2 is considered here.
Substituting the differentiation of (4),
[TABLE]
into (13), one has
[TABLE]
with .
If is the sigmoid function, then
[TABLE]
This further reduces by the use of
[TABLE]
to
[TABLE]
For the steady state, for all , the term above vanishes. In the steady state, where the self-coupling terms vanish, for all , one has
[TABLE]
4 Conclusions
This paper has offered the viewpoint that a class of dynamical systems modeling deterministic neural networks can be described in terms of Hessian and information geometries. In this formulation the phase space compressibility is shown to be equal to the negative of the Laplace operator acting on a Lyapunov function. Explicit forms of this compressibility have also been shown for some examples.
Several directions for future work follow from this paper. One is to study a class of stochastic dynamical systems, since neural networks are often modeled stochastically, and it is of interest to see whether the present approach can be applied to such models. Another is to consider the relation between the present formulation on a dually flat space and contact Hamiltonian systems in a contact manifold [21, 22]. Since the generalized Hopfield model is similar to a Hamiltonian system with some even-dimensional manifold, a symplectic -form, and a Hamiltonian function [5], one may explore relations between them.
We believe that the elucidation of these remaining questions, together with the present study, will advance the geometric theory of neural network models and of dynamical systems.
Acknowledgments
The author is grateful to Hideitsu Hino for support for this research. Also the author is partially supported by JSPS (KAKENHI) grant number 19K03635 and by JST CREST JPMJCR1761.
[1] T. Geszti, Physical models of Neural networks, World Scientific, (1990).
[2] D.J. Amit, Modeling Brain Function, Cambridge University Press, (1989).
[3] C.M. Bishop, Pattern recognition and machine learning, Springer, (2006).
[4] V.I. Arnold, Mathematical Methods of Classical Mechanics, 2nd Ed., Springer, (1997).
[5] A.C. da Silva, Lectures on Symplectic Geometry, 2nd Ed., Springer, (2008).
[6] S. Amari and H. Nagaoka, Methods of information geometry, AMS, Oxford University Press, (2000).
[7] N. Ay et al, Information geometry, Springer, (2017).
[8] H. Shima, The Geometry of Hessian Structures, Springer, (2007).
