TL;DR
This paper develops a mathematical framework for group equivariance in machine learning using topological and geometric tools, introducing GENEOs to improve data analysis and neural network initialization.
Contribution
It introduces group-equivariant non-expansive operators (GENEOs), analyzing their properties and demonstrating their application in metric learning and CNN initialization.
Findings
GENEOs form a compact and convex space under certain conditions.
Sampled GENEOs can effectively initialize CNN kernels.
The framework applies to datasets like MNIST and Fashion-MNIST.
Towards a topological-geometrical theory of group equivariant non-expansive operators for data analysis and machine learning
Mattia G. Bergomi
Champalimaud Research, Champalimaud Center for the Unknown - Lisbon, Portugal
Patrizio Frosini
Department of Mathematics, University of Bologna
Advanced Research Center on Electronic System “Ercole De Castro”, University of Bologna
Daniela Giorgi
Italian National Research Council, Institute of Information Science and Technologies “Alessandro Faedo”
Nicola Quercioli
Abstract
The aim of this paper is to provide a general mathematical framework for group equivariance in the machine learning context. The framework builds on a synergy between persistent homology and the theory of group actions. We define group equivariant non-expansive operators (GENEOs), which are maps between function spaces associated with groups of transformations. We study the topological and metric properties of the space of GENEOs to evaluate their approximating power and set the basis for general strategies to initialise and compose operators. We begin by defining suitable pseudo-metrics for the function spaces, the equivariance groups, and the set of non-expansive operators. Based on these pseudo-metrics, we prove that the space of GENEOs is compact and convex, under the assumption that the function spaces are compact and convex. These results provide fundamental guarantees from a machine learning perspective. We show examples on the MNIST and fashion-MNIST datasets. By considering isometry-equivariant non-expansive operators, we describe a simple strategy to select and sample operators, and show how the selected and sampled operators can be used to perform both classical metric learning and an effective initialisation of the kernels of a convolutional neural network.
keywords:
Group equivariant non-expansive operator, invariance group, group action, initial topology, persistent homology, persistence diagram, bottleneck distance, natural pseudo-distance, agent, perception pair, slice category, topological data analysis
MSC:
Primary 55N35 Secondary 47H09, 54H15, 57S10, 68U05, 65D18
1 Introduction
Deep learning-based algorithms have reached human or superhuman performance in many real-world tasks. Beyond the extreme effectiveness of deep learning, one of the main reasons for its success is that raw data are sufficient (if not even more suitable than hand-crafted features) for these algorithms to learn a specific task. However, only a few attempts have been made to create formal theories allowing for the creation of a controllable and interpretable framework, in which deep neural networks can be formally defined and studied. Furthermore, while learning directly from raw data allows one to outclass human feature engineering, the architectures of deep networks are growing more and more complex, and are often as task-specific as hand-crafted features used to be.
We aim at providing a general mathematical framework, where any agent capable of acting on a certain dataset (e.g. deep neural networks) can be formally described as a collection of operators acting on the data. To motivate our model, we assume that data cannot be studied directly, but only through the action of agents that measure and transform them. Consequently, our model stems from a functional viewpoint. By interpreting data as points of a function space, it is possible to learn and optimise operators defined on the data. In other words, we are interested in the space of transformations of the data, rather than the data themselves.
Albeit unformalised, this idea is not new in deep learning. For instance, one of the main features of convolutional neural networks [1] is the selection of convolution as the operator of choice to act on the data. The convolutional kernels learned by optimising a loss function are operators that map an image to a new one that, for instance, is more easily classifiable. Moreover, convolutions are operators equivariant with respect to translations (at least in the ideal continuous case). We believe that the restriction to a specific family of operators and the equivariance with respect to interpretable transformations are key aspects of the success of this architecture. In our theory, operators are thought of as instruments allowing an agent to provide a measure of the world, just as the kernels learned by a convolutional neural network allow a classifier to spot the essential features needed to recognise objects belonging to the same category. Equivariance with respect to the action of a group (or a set) of transformations corresponds to the introduction of symmetries in the function space where data are represented. This allows us both to gain control over the nature of the learned operators and to drastically reduce the dimensionality of the space of operators to be explored during learning. Such a goal is in line with the recent interest in invariant representations in machine learning (cf., e.g., [2]).
We make use of topological data analysis to describe spaces of group equivariant non-expansive operators (GENEOs). GENEOs are maps between function spaces associated with groups of transformations. We study the topological and metric properties of the space of GENEOs to evaluate their approximating power and set the basis for general strategies to initialise and compose operators, and eventually to connect them hierarchically to form operator networks. Our first contribution is to define suitable pseudo-metrics for the function spaces, the equivariance groups, and the set of non-expansive operators. Based on these pseudo-metrics, we prove that the space of GENEOs is compact and convex, under the assumption that the function spaces are compact and convex. These results provide fundamental and provable guarantees for the soundness of this operator-based approach from a machine learning perspective: compactness, for instance, guarantees that any operator belonging to a certain space can be approximated, up to any fixed tolerance, by one of finitely many operators sampled in the same space.
Our study of the space of GENEOs takes advantage of recent results in topological data analysis, in particular in the theory of persistent homology [3]. Our approach also generalises standard group equivariance to set equivariance, which seems much more suitable for the representation of intelligent agents.
To conclude, we validate our model with examples on the MNIST, fashion-MNIST and CIFAR10 datasets. These applications are aimed at showing, on discrete examples, the effectiveness of the metrics defined and the theorems proved in the continuous case. By considering isometry equivariant non-expansive operators (IENEOs), we describe two simple algorithms allowing the selection and sampling of IENEOs based on a few labelled samples taken from the dataset. We show how selected and sampled operators can be used to perform both classical metric learning and effective initialisation of the kernels of a convolutional neural network.
Our main contribution is a general framework that encompasses previous work on group equivariance in the deep learning context [4, 5]. We believe that the formal foundation of our model is suitable to start a new theory of deep-learning engineering, and that novel research lines will stem from the synergy of machine learning and topology. This synergy is the object of study of more and more researchers, focusing both on the treatment of data via TDA before applying classical machine learning [6, 7], and on the analysis of the topology of convolutional neural networks [8]. However, our approach differs from the previous ones in that it focuses on a new theoretical setting, based on the introduction of new topologies and metrics.
The paper is structured as follows. In Section 2 the epistemological foundations of our model are discussed. The mathematical background in topological persistence is provided in Section 3. Section 4 details the mathematical model for data, transformations, and GENEOs. Section 5 proves the compactness and convexity of the space of GENEOs, under suitable hypotheses. New results in persistent homology to define computable metrics in the space of GENEOs and in the space of data are presented in Section 6, along with the extension of the theory from group to set equivariance. Finally, in Section 7, we describe two algorithms to select and sample operators in the discrete case, and show examples on the MNIST and fashion-MNIST datasets. A Python package for reproducing the computational experiments described in Section 7 is available at gitlab.com/mattia.bergomi/geneos.
2 Epistemological setting
Our mathematical model is justified by an epistemological background which revolves around the following assumptions:
1. Data are represented as functions defined on topological spaces, since only data that are stable with respect to a certain criterion (e.g., with respect to some kind of measurement) can be considered for applications, and stability requires a topological structure.
2. Data cannot be studied in a direct and absolute way. They are only knowable through acts of transformation made by an agent. From the point of view of data analysis, only the pair (data, agent) matters. In general terms, agents are not endowed with purposes or goals: they are just ways and methods to transform data. Acts of measurement are a particular class of acts of transformation.
3. Agents are described by the way they transform data while respecting some kind of invariance. In other words, any agent can be seen as a group equivariant operator acting on a function space.
4. Data similarity depends on the output of the considered agent.
In other words, in our framework we assume that the analysis of data is replaced by the analysis of the pair (data, agent). Since an agent can be seen as a group equivariant operator, from the mathematical viewpoint our purpose consists in presenting a good topological theory of suitable operators of this kind, representing agents. For more details, we refer the interested reader to [9].
3 Mathematical background
Our mathematical model builds on functional analysis and Topological Data Analysis (TDA). TDA is an emerging field of research which studies topological approaches to explore and make sense of complex, high-dimensional data, such as artificial and biological networks [10, 11]. The basic idea is that topology can help to recognize patterns within data, and therefore to turn data into useful knowledge. One of the main concepts in TDA is Persistent Homology (PH), a mathematical tool that captures topological information at multiple scales. Our mathematical model proposes an integration between the theory of group actions and persistent homology.
In summary, persistent homology allows one to represent the topological and geometrical features of a topological space $X$ (e.g. an image or a mesh) as it is seen by a continuous, real-valued function $\varphi$ defined on the space. The homology functor (see for instance [12]) is used to encode the information of the pair $(X, \varphi)$ in the form of persistence diagrams. In other words, we can associate each continuous function $\varphi$ with a persistence diagram $\mathrm{Dgm}(\varphi)$, which is represented by a discrete collection of points in the real plane. Beyond the technicalities that are needed to define the concept of persistence diagram, two important points are to be stressed. First, persistence diagrams can be quickly computed. Second, an easily computable distance $d_{\mathrm{match}}$ between persistence diagrams is available and gives a lower bound for the max-norm distance between functions: $d_{\mathrm{match}}(\mathrm{Dgm}(\varphi_1), \mathrm{Dgm}(\varphi_2)) \le \|\varphi_1 - \varphi_2\|_\infty$. It follows that the bottleneck distance between persistence diagrams can be used as an efficient proxy for the max-norm distance between real-valued functions. Since our approach is deeply rooted in the comparison of real-valued functions, persistence diagrams are a key tool in our model. The definitions of persistence diagram and of the bottleneck distance are intuitively depicted in fig. 1 and briefly formalised in what follows. We refer the reader to [13, 14, 15] for further details.
3.1 Persistent Homology
In PH, data are modelled as objects in a metric space. The first step is to filter the data so as to obtain a family of nested topological spaces that captures the topological information at multiple scales. A common way to obtain a filtration is by sublevel sets of a continuous function, hence the name sublevel set persistence. Let $\varphi\colon X \to \mathbb{R}$ be a real-valued continuous function on a topological space $X$. Persistent homology represents the changes of the homology groups of the sub-level set $X_t = \varphi^{-1}((-\infty, t])$ as $t$ varies in $\mathbb{R}$. We can see the parameter $t$ as an increasing time, whose changes produce the birth and the death of $k$-dimensional holes in the sub-level set $X_t$. We observe that the number of independent 0-dimensional holes of $X_t$ equals the number of connected components of $X_t$ minus one, while 1-dimensional holes refer to tunnels and 2-dimensional holes to voids.
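Definition 3.1.
If $u, v \in \mathbb{R}$ and $u < v$, we can consider the inclusion $i$ of $X_u$ into $X_v$. If $\check{H}$ denotes the Čech homology functor, such an inclusion induces a homomorphism $i_k\colon \check{H}_k(X_u) \to \check{H}_k(X_v)$ between the homology groups of $X_u$ and $X_v$ in degree $k$. The group $PH_k^{\varphi}(u,v) := i_k(\check{H}_k(X_u))$ is called the $k$th persistent homology group with respect to the function $\varphi\colon X \to \mathbb{R}$, computed at the point $(u,v)$. The rank $r_k(\varphi)(u,v)$ of $PH_k^{\varphi}(u,v)$ is called the $k$th persistent Betti numbers function (PBN) with respect to the function $\varphi$, computed at the point $(u,v)$.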
Persistent Betti numbers functions can be completely described by multisets called persistence diagrams. The $k$th persistence diagram is the multiset of all the pairs $p_j = (b_j, d_j)$, where $b_j$ and $d_j$ are the times of birth and death of the $j$th $k$-dimensional hole, respectively. When a hole never dies, we set its time of death equal to $\infty$. The multiplicity $m(p_j)$ says how many holes share both the time of birth $b_j$ and the time of death $d_j$. For technical reasons, the points $(t,t)$ on the diagonal are added to each persistence diagram, each one with infinite multiplicity.
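Let $\Delta$ denote the diagonal $\{(x,y) : x = y\}$, let $\Delta^*$ denote the set of points $(x,y)$ with $x < y \le \infty$, and let $\bar\Delta^* = \Delta^* \cup \Delta$. Each persistence diagram $D$ can contain an infinite number of points. For every $q \in \Delta^*$, the equality $m(q) = 0$ means that $q$ does not belong to the persistence diagram $D$. We define on $\bar\Delta^*$ a pseudo-metric as follows

$$d^*\bigl((x,y),(x',y')\bigr) := \min\Bigl\{\max\{|x-x'|,\,|y-y'|\},\ \max\Bigl\{\tfrac{y-x}{2},\,\tfrac{y'-x'}{2}\Bigr\}\Bigr\}$$

by agreeing that $\infty - y = \infty$, $y - \infty = -\infty$ for $y \ne \infty$, $\infty - \infty = 0$, $\tfrac{\infty}{2} = \infty$, $|\pm\infty| = \infty$, $\min\{\infty, c\} = c$, $\max\{\infty, c\} = \infty$.
The pseudo-metric $d^*$ between two points $p$ and $p'$ takes the smaller value between the cost of moving $p$ to $p'$ and the cost of moving $p$ and $p'$ onto $\Delta$. Obviously, $d^*(p,p') = 0$ for every $p, p' \in \Delta$. If $p \in \Delta^*$ and $p' \in \Delta$, then $d^*(p,p')$ equals the distance, induced by the max-norm, between $p$ and $\Delta$. Points at infinity have a finite distance only to the other points at infinity, and their distance equals the Euclidean distance between abscissas.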
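We can compare persistence diagrams by means of the bottleneck distance (also called matching distance) $d_{\mathrm{match}}$.
Definition 3.2.
Let $D$, $D'$ be two persistence diagrams. We define the bottleneck distance $d_{\mathrm{match}}$ between $D$ and $D'$ by setting

$$d_{\mathrm{match}}(D, D') := \inf_{\sigma}\ \sup_{p \in D}\ d^*\bigl(p, \sigma(p)\bigr)$$

where $\sigma$ varies in the set of all bijections from the multiset $D$ to the multiset $D'$.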
For further information about persistence diagrams and the bottleneck distance, we refer the reader to [15, 16]. Each persistent Betti numbers function is associated with exactly one persistence diagram, and (if we use Čech homology) every persistence diagram is associated with exactly one persistent Betti numbers function. Then the metric $d_{\mathrm{match}}$ induces a pseudo-metric on the sets of the persistent Betti numbers functions [17].
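As an illustration of Definition 3.2, the following minimal sketch computes the bottleneck distance between two small, finite diagrams by brute force. It assumes that the point-wise cost is the one described verbally above (the smaller of the max-norm displacement and the cost of sending both points to the diagonal), handles only points with finite death time, and models the diagonal by padding each diagram with zero-persistence points. The function names `point_cost` and `bottleneck` are ours, and the enumeration of all bijections is exponential, so this is not meant as a practical algorithm.

```python
import itertools
import math

def point_cost(p, q):
    """Cost of matching two finite points (birth, death).

    Smaller of (a) the max-norm cost of moving p to q and
    (b) the cost of moving both p and q onto the diagonal.
    """
    move = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    to_diagonal = max((p[1] - p[0]) / 2, (q[1] - q[0]) / 2)
    return min(move, to_diagonal)

def bottleneck(diagram_a, diagram_b):
    """Brute-force bottleneck distance between two small finite diagrams."""
    # Pad with diagonal points so that every point can also be matched
    # to the diagonal; a diagonal point has zero persistence.
    a = list(diagram_a) + [(0.0, 0.0)] * len(diagram_b)
    b = list(diagram_b) + [(0.0, 0.0)] * len(diagram_a)
    best = math.inf
    for perm in itertools.permutations(range(len(b))):
        cost = max((point_cost(a[i], b[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, cost)
    return best
```

For instance, `bottleneck([(0, 4)], [(0, 5)])` matches the two points directly, whereas `bottleneck([(0, 1)], [])` sends the only point to the diagonal at half its persistence.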
4 Mathematical model
In our mathematical model, data are represented as function spaces, that is, as sets of real-valued functions on some topological space (Subsection 4.1). Function spaces come with invariance groups representing the transformations on data which are admissible for some agent (Subsection 4.2). The groups of transformations are specific to different agents, and can be either learned or part of prior knowledge. The operators on data are then defined as group equivariant non-expansive operators (GENEOs) (Subsection 4.3).
4.1 Data representation
Let us consider a set $X$ and a topological subspace $\Phi$ of the set of all bounded functions from $X$ to $\mathbb{R}$, denoted by $\mathbb{R}^X_b$ and endowed with the topology induced by the distance

$$D_\Phi(\varphi_1, \varphi_2) := \|\varphi_1 - \varphi_2\|_\infty .$$

If $\Phi$ is compact, then it is also bounded, i.e., there exists a non-negative real value $L$ such that $\|\varphi\|_\infty \le L$ for every $\varphi \in \Phi$. We can think of $X$ as the space where one makes measurements, and of $\Phi$ as the set of admissible measurements, called the set of admissible functions. In other words, $\Phi$ is the set of functions from $X$ to $\mathbb{R}$ that can be produced by measuring instruments. For example, an image can be represented as a function from the real plane to the real numbers.
To quantify the distance between two points $x_1, x_2 \in X$, we compare the values taken at $x_1$ and $x_2$ by the admissible functions in $\Phi$. Therefore, we endow $X$ with the extended pseudo-metric $D_X$ defined by setting

$$D_X(x_1, x_2) := \sup_{\varphi \in \Phi} |\varphi(x_1) - \varphi(x_2)|$$

for every $x_1, x_2 \in X$ (see Appendix A). We recall that a pseudo-metric is just a distance $d$ without the property that $d(a,b) = 0$ implies $a = b$, and that an extended pseudo-metric is a pseudo-metric that may take the value $\infty$. If $\Phi$ is bounded, then $D_X$ is a pseudo-metric.
The assumption behind the definition of $D_X$ is that two points can be distinguished only if they assume different values for some admissible function. As an example, if $\Phi$ contains only constant functions, no discrimination can be made between points in $X$ and hence $D_X(x_1, x_2)$ vanishes for every $x_1, x_2 \in X$.
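The remark on constant functions can be checked on a toy case. The sketch below assumes that the pseudo-distance between two points is the supremum, over the admissible functions, of the absolute difference of the values they take at the two points, and replaces the set of admissible functions by a finite list of Python callables, so that the supremum becomes a maximum. The names `d_x`, `constants` and `with_identity` are illustrative only.

```python
def d_x(x1, x2, admissible):
    """Pseudo-distance between two points induced by a finite family of functions.

    Returns the largest |phi(x1) - phi(x2)| over phi in `admissible`.
    """
    return max(abs(phi(x1) - phi(x2)) for phi in admissible)

# A family of constant functions cannot tell any two points apart.
constants = [lambda x: 1.0, lambda x: -3.0]

# Adding the identity on the real line makes the points distinguishable.
with_identity = constants + [lambda x: x]

print(d_x(0.2, 0.9, constants))
print(d_x(0.2, 0.9, with_identity))
```

With only constant functions the value is zero for every pair of points, while adding a single non-constant admissible function is enough to separate them.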
The pseudo-metric space $(X, D_X)$ can be considered as a topological space by choosing as a base the collection of all the sets

$$B_X(x, \varepsilon) = \{x' \in X : D_X(x, x') < \varepsilon\}$$

where $\varepsilon > 0$ and $x \in X$ (see [18]).
The reason to endow the measurement space with a topology, rather than considering just a set, follows from the need of formalizing the assumption that data are stable. To formalize stability we have to use a topology (or a pseudo-metric inducing a topology).
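It is interesting to stress the link between the topology $\tau_{D_X}$ associated with $D_X$ and the initial topology $\tau_{\mathrm{in}}$ on $X$ with respect to $\Phi$, when we take the Euclidean topology on $\mathbb{R}$. We recall that $\tau_{\mathrm{in}}$ is the coarsest topology on $X$ such that each function $\varphi \in \Phi$ is continuous. Explicitly, the open sets in $\tau_{\mathrm{in}}$ are the sets that can be obtained as unions of finite intersections of sets $\varphi^{-1}(U)$, where $\varphi \in \Phi$ and $U$ is an open subset of $\mathbb{R}$. In other words, a base of $\tau_{\mathrm{in}}$ is given by the collection of all sets that can be represented as $\bigcap_{i \in I} \varphi_i^{-1}(U_i)$, where $I$ is a finite set of indexes and $\varphi_i \in \Phi$, $U_i$ is open in $\mathbb{R}$, for every $i \in I$ [18].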
Theorem 4.1.
The topology $\tau_{D_X}$ on $X$ induced by the pseudo-metric $D_X$ is finer than the initial topology $\tau_{\mathrm{in}}$ on $X$ with respect to $\Phi$. If $\Phi$ is totally bounded, then the topology $\tau_{D_X}$ coincides with $\tau_{\mathrm{in}}$.
(The proof is in Appendix C.)
Since $\tau_{\mathrm{in}}$ is the coarsest topology on $X$ such that every function in $\Phi$ is continuous, Theorem 4.1 guarantees that the assumption that the functions in $\Phi$ are continuous is not restrictive in practice, for example while dealing with images, which often contain discontinuities. Indeed, our functions are not required to be continuous with respect to other topologies (e.g., the Euclidean topology on $X$).
In general $X$ is not compact with respect to the topology $\tau_{D_X}$, even if $\Phi$ is compact. For example, if $X$ is the open interval $(0,1)$ and $\Phi$ contains only the identity from $(0,1)$ to $(0,1)$, the topology induced by $D_X$ is simply the Euclidean topology and hence $X$ is not compact. However, the next result holds.
Theorem 4.2.
If $\Phi$ is compact and $X$ is complete then $X$ is also compact.
(The proof is in Appendix C.)
4.1.1 A remark on the use of pseudo-metrics
The reader could think better to change the pseudo-metric into a metric by quotienting out by the equivalence relation and defining for any . The reason we do not do this is that several different sets of admissible measurements can be considered on the same set . For two different sets , of admissible functions, we obtain two different quotient spaces , . If we forget about the original space , we lose the possibility of linking the equivalence classes in with the ones in . On the contrary, we prefer to preserve the identity of points in , studying how they link to each other when we change the set . This observation leads us to work with pseudo-metrics instead of metrics.
Before proceeding, we observe that the map taking each point to the equivalence class is continuous with respect to and , and surjective. Moreover, takes each ball with respect to to a ball with respect to , while the inverse image under of each ball with respect to is a ball with respect to . It follows that if a subset is compact (sequentially compact) for then is compact (sequentially compact) for , and that if a subset is compact (sequentially compact) for then is compact (sequentially compact) for . Finally, given a sequence in , we observe that converges to in if and only if the sequence converges to in . These facts imply that the development of our theory in terms of pseudo-metrics is not far from the analysis in terms of metrics.
4.2 Transformations on data
In our model, we assume that data are transformed through maps from $X$ to $X$ which are $\Phi$-preserving homeomorphisms with respect to the pseudo-metric $D_X$. Let $\mathrm{Homeo}(X)$ denote the set of homeomorphisms from $X$ to $X$ with respect to $D_X$, and $\mathrm{Homeo}_\Phi(X)$ denote the set of $\Phi$-preserving homeomorphisms, namely the homeomorphisms $g \in \mathrm{Homeo}(X)$ such that $\varphi \circ g \in \Phi$ and $\varphi \circ g^{-1} \in \Phi$ for every $\varphi \in \Phi$.
The following Proposition 4.3 implies that $\mathrm{Homeo}_\Phi(X)$ is exactly the set of all bijections $g\colon X \to X$ such that $\varphi \circ g \in \Phi$ and $\varphi \circ g^{-1} \in \Phi$ for every $\varphi \in \Phi$.
Proposition 4.3.
If $g$ is a bijection from $X$ to $X$ such that $\varphi \circ g \in \Phi$ and $\varphi \circ g^{-1} \in \Phi$ for every $\varphi \in \Phi$, then $g$ is an isometry (and hence a homeomorphism) with respect to $D_X$.
Here the definition of isometry between pseudo-metric spaces mirrors the one for metric spaces. Let $(X_1, d_1)$ and $(X_2, d_2)$ be two pseudo-metric spaces. It is easy to check that if $f\colon X_1 \to X_2$ is a function verifying the equality $d_1(x, y) = d_2(f(x), f(y))$ for every $x, y \in X_1$, then $f$ is continuous with respect to the topologies induced by $d_1$ and $d_2$. If $f$ verifies the previous equality and is bijective, we say that it is an isometry between the considered pseudo-metric spaces. If $f$ is an isometry, we can trivially observe that $f^{-1}$ is also an isometry, and that $f$ is a homeomorphism.
(The proof is in Appendix C.)
Remark 4.4.
In general, . As an example, take and . In this case and , while is the set of all homeomorphisms from the interval to itself with respect to the Euclidean distance.
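Remark 4.5.
For each $g \in \mathrm{Homeo}_\Phi(X)$, we consider the bijective map $R_g\colon \Phi \to \Phi$ defined by setting $R_g(\varphi) := \varphi \circ g$ for every $\varphi \in \Phi$. We claim that $R_g$ preserves the distance $D_\Phi$ defined in Subsection 4.1. Indeed, if $\varphi_1, \varphi_2 \in \Phi$ and $g \in \mathrm{Homeo}_\Phi(X)$ then

$$D_\Phi\bigl(R_g(\varphi_1), R_g(\varphi_2)\bigr) = \sup_{x \in X} |\varphi_1(g(x)) - \varphi_2(g(x))| = \sup_{y \in X} |\varphi_1(y) - \varphi_2(y)| = D_\Phi(\varphi_1, \varphi_2)$$

because $g$ is a bijection. Since $R_g$ is a bijection preserving $D_\Phi$, then $R_g$ is an isometry with respect to $D_\Phi$.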
In the rest of this paper we will assume that $\Phi$ is compact with respect to the topology induced by $D_\Phi$, and that $X$ is complete (and hence compact) with respect to the topology induced by $D_X$.
Let us now consider a subgroup $G$ of the group $\mathrm{Homeo}_\Phi(X)$. $G$ represents the set of transformations on data for which we require equivariance to be respected.
We can define the pseudo-distance $D_G$ on $G$:

$$D_G(g_1, g_2) := \sup_{\varphi \in \Phi} D_\Phi(\varphi \circ g_1, \varphi \circ g_2)$$

from $G \times G$ to $\mathbb{R}$ (see Appendix A).
The rationale in the definition of is that in our model every comparison must be based on the max-norm distance between admissible acts of measurement. As a consequence, we define the distance between two homeomorphisms by the difference of their actions on the set of possible measurements.
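Remark 4.6.
$D_G$ can be expressed as:

$$D_G(g_1, g_2) = \sup_{x \in X}\ \sup_{\varphi \in \Phi} |\varphi(g_1(x)) - \varphi(g_2(x))| = \sup_{x \in X} D_X\bigl(g_1(x), g_2(x)\bigr).$$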
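The two ways of reading the pseudo-distance between transformations can be compared numerically on a finite toy model. The sketch below is an assumption-laden illustration, not the paper's code: the measurement space is the cyclic set of six points, the admissible functions are the cyclic shifts of one hypothetical signal `base` (so the family is closed under the transformations), and each transformation is stored as an index array. It computes the distance once as a maximum over functions of a max-norm distance, and once as a maximum over points of the point pseudo-distance.

```python
import numpy as np

def d_g(g1, g2, admissible):
    """Max over phi of the max-norm distance between phi∘g1 and phi∘g2."""
    return max(np.max(np.abs(phi[g1] - phi[g2])) for phi in admissible)

def d_g_pointwise(g1, g2, admissible):
    """Same quantity, written as max over x of the point distance between g1(x) and g2(x)."""
    def d_x(a, b):
        return max(abs(phi[a] - phi[b]) for phi in admissible)
    return max(d_x(a, b) for a, b in zip(g1, g2))

n = 6
points = np.arange(n)
base = np.array([0.0, 1.0, 0.5, 0.0, 0.2, 0.9])
admissible = [np.roll(base, k) for k in range(n)]  # closed under cyclic shifts

def shift(k):
    """The bijection x -> x + k (mod n), stored as an index array."""
    return (points + k) % n
```

Both functions return the same value for any pair of shifts, since the two expressions differ only in the order in which the maxima are taken.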
We can now state the following theorems:
Theorem 4.7.
$G$ is a topological group with respect to the pseudo-metric topology and the action of $G$ on $\Phi$ through right composition is continuous.
(The proof is in Appendix C.)
Theorem 4.8.
If $G$ is complete then it is also compact with respect to $D_G$.
(The proof is in Appendix C.)
From now on we will suppose that $G$ is complete (and hence compact) with respect to the topology induced by $D_G$.
4.2.1 The natural pseudo-distance
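We define the natural pseudo-distance on the space $\Phi$ [3]. The natural pseudo-distance represents the ground truth in our model. It is based on comparing functions, and vanishes for pairs of functions that are equivalent with respect to the action of our group of homeomorphisms $G$, which expresses the equivalences between data.
Definition 4.9.
The pseudo-distance $d_G\colon \Phi \times \Phi \to \mathbb{R}$ is defined by setting

$$d_G(\varphi_1, \varphi_2) := \inf_{g \in G} D_\Phi(\varphi_1, \varphi_2 \circ g).$$

It is called the natural pseudo-distance associated with the group $G$ acting on $\Phi$.
If $G$ is the trivial group containing only the identity, then $d_G$ equals the sup-norm distance $D_\Phi$ on $\Phi$. If $G_1$ and $G_2$ are subgroups of $\mathrm{Homeo}_\Phi(X)$ and $G_1 \subseteq G_2$, then the definition of $d_G$ implies that

$$d_{\mathrm{Homeo}_\Phi(X)}(\varphi_1, \varphi_2) \le d_{G_2}(\varphi_1, \varphi_2) \le d_{G_1}(\varphi_1, \varphi_2) \le D_\Phi(\varphi_1, \varphi_2)$$

for every $\varphi_1, \varphi_2 \in \Phi$.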
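For a finite group the infimum in the natural pseudo-distance is a minimum and can be computed by enumeration. The following minimal sketch assumes signals sampled on a cyclic set of points and a group made of cyclic shifts, each identified by an integer; `natural_pseudo_distance` is a name chosen here for illustration. It also shows the monotonicity with respect to the group: the larger the group, the smaller the distance.

```python
import numpy as np

def natural_pseudo_distance(phi1, phi2, shifts):
    """Min over the given cyclic shifts k of max_x |phi1(x) - phi2(x + k)|."""
    # np.roll(phi2, -k)[x] equals phi2[(x + k) mod n], i.e. phi2 composed with the shift.
    return min(float(np.max(np.abs(phi1 - np.roll(phi2, -k)))) for k in shifts)

phi1 = np.array([0.0, 1.0, 2.0, 0.0])
phi2 = np.roll(phi1, 1)          # the same signal, shifted by one position

trivial = [0]                    # only the identity
half_turns = [0, 2]              # a subgroup of the full cyclic group
all_shifts = [0, 1, 2, 3]        # the full cyclic group

print(natural_pseudo_distance(phi1, phi2, trivial))
print(natural_pseudo_distance(phi1, phi2, half_turns))
print(natural_pseudo_distance(phi1, phi2, all_shifts))
```

With the trivial group the value is the plain sup-norm distance, and it drops to zero once the group contains the shift that aligns the two signals. For infinite groups no such enumeration is available, which is the difficulty discussed in the text.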
Though $d_G$ represents the ground truth for data similarity in our model, unfortunately it is difficult to compute. This is also a consequence of the fact that we can easily find subgroups $G$ of $\mathrm{Homeo}(X)$ that cannot be approximated with arbitrary precision by smaller finite subgroups of $G$ (e.g., when $G$ is a group of rigid motions of a Euclidean space).
In the following sections, we show how $d_G$ can be approximated with arbitrary precision by means of a dual approach based on group equivariant non-expansive operators (GENEOs) and persistent homology.
4.2.2 A remark on the use of homeomorphisms
The reader could criticize the choice of grounding our approach on the concept of homeomorphism. After all, most of the objects that are considered for purposes of shape comparison “are not homeomorphic”. Therefore, the definition of natural pseudo-distance could seem not to be sufficiently flexible, since it does not allow one to compare non-homeomorphic objects. However, it is important to note that the space $X$ we use in our model does not represent the objects, but the space where one takes measurements about the objects. As such, $X$ is unique. For example, two images are considered as functions from the real plane to the real numbers, independently of the topological properties of the 3D objects represented in the images. If we make two CAT scans, the topological space $X$ is always given by a helix turning many times around a body, and no requirement is made about the topology of such a body. In other words, the topological space $X$ is determined only by the measuring instrument and not by the single object instances.
4.3 Group Equivariant Non-Expansive Operators
Under the assumptions made in the previous sections, the pair $(\Phi, G)$ is called a perception pair.
Let us now assume that two perception pairs $(\Phi, G)$, $(\Psi, H)$ are given together with a fixed homomorphism $T\colon G \to H$. Each function $F\colon \Phi \to \Psi$ such that $F(\varphi \circ g) = F(\varphi) \circ T(g)$ for every $\varphi \in \Phi$, $g \in G$ is said to be a perception map from $(\Phi, G)$ to $(\Psi, H)$ associated with the homomorphism $T$. More briefly, we will also say that $F$ is a group equivariant operator. If $T$ is equal to the identity homomorphism $\mathrm{id}_G\colon G \to G$, we can say that $F$ is a $G$-map. We observe that the functions in $\Phi$ and the functions in $\Psi$ are defined on spaces that are generally different from each other.
Remark 4.10.
Each perception pair can be seen as a category, whose objects are the functions in and the morphisms between two functions are the elements such that . As usual, if and we wish to distinguish as a morphism between and from as a morphism between and , so we make different copies , of the homeomorphism by labelling it. As natural, . A precise formalization of this procedure can be done in terms of slice categories. For more details we refer the reader to Appendix B.
When two perception pairs , are considered as categories and a homomorphism is fixed, each perception map from to is naturally associated with a functor between the two categories, taking each function to and each morphism to the morphism .
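Definition 4.11.
Assume that $(\Phi, G)$, $(\Psi, H)$ are two perception pairs and that a homomorphism $T\colon G \to H$ has been fixed. If $F$ is a perception map from $(\Phi, G)$ to $(\Psi, H)$ with respect to $T$ and $F$ is non-expansive (i.e., $D_\Psi(F(\varphi_1), F(\varphi_2)) \le D_\Phi(\varphi_1, \varphi_2)$ for every $\varphi_1, \varphi_2 \in \Phi$), then $F$ is called a Group Equivariant Non-Expansive Operator (GENEO) associated with $T$.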
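The two requirements in Definition 4.11 can be tested numerically on a discrete toy case. The sketch below is an illustration under our own assumptions: signals live on a cyclic set of eight points, both perception pairs coincide, the group is the group of cyclic shifts, and the homomorphism is the identity. A circular convolution whose kernel has absolute values summing to at most one passes both checks, while multiplication by two is equivariant but expansive. The helper names are hypothetical, and passing on random samples is only evidence, not a proof.

```python
import numpy as np

def circular_conv(kernel):
    """Return the operator phi -> sum_j kernel[j] * phi[(x - j) mod n]."""
    kernel = np.asarray(kernel, dtype=float)
    def op(phi):
        return sum(w * np.roll(phi, j) for j, w in enumerate(kernel))
    return op

def is_equivariant(op, phi, tol=1e-12):
    """Check that op commutes with every cyclic shift of phi."""
    return all(
        np.allclose(op(np.roll(phi, s)), np.roll(op(phi), s), atol=tol)
        for s in range(len(phi))
    )

def is_non_expansive(op, phi1, phi2, tol=1e-12):
    """Check that the max-norm distance does not grow under op."""
    return np.max(np.abs(op(phi1) - op(phi2))) <= np.max(np.abs(phi1 - phi2)) + tol

rng = np.random.default_rng(0)
phi1, phi2 = rng.uniform(-1, 1, size=(2, 8))

blur = circular_conv([0.25, 0.5, 0.25])  # kernel weights sum to 1
double = lambda phi: 2.0 * phi           # equivariant, but doubles distances
```

Here `blur` satisfies both conditions on the sampled signals, whereas `double` fails the non-expansivity check, so it would not qualify as a GENEO in this toy setting.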
Example 4.12.
As a reference for the reader, we give the following basic example of GENEO. Let be the set containing all -Lipschitz functions from to , and be the group of all rotations of around the -axis. Let be the set containing all -Lipschitz functions from to , and be the group of all rotations of . We observe that and are two perception pairs. Now, let us consider the map taking each function to the function defined by setting (with polar coordinates), and the homomorphism taking the rotation of of radians around the -axis positively oriented to the counter-clock rotation of radians of . We can easily check that is a perception map and a GENEO from to , associated with the homomorphism . In this example and are surjective, but an example where and are not surjective can be easily found, e.g. by restricting to the singleton containing only the null function and to the trivial group containing only the identical homomorphism.
We can study how GENEOs act on the natural pseudo-distances:
Proposition 4.13.
If $F$ is a GENEO from $(\Phi, G)$ to $(\Psi, H)$ associated with $T\colon G \to H$, then it is a contraction with respect to the natural pseudo-distances $d_G$, $d_H$.
(The proof is in Appendix C.)
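The idea behind Proposition 4.13 can be sketched in a few lines. The chain below is our own outline, not a substitute for the proof in Appendix C; it assumes that the natural pseudo-distance is the infimum over the group of the sup-norm distance between one function and the transformed other, and it reads "contraction" as the inequality $d_H(F(\varphi_1), F(\varphi_2)) \le d_G(\varphi_1, \varphi_2)$.

```latex
\begin{align*}
d_H\bigl(F(\varphi_1), F(\varphi_2)\bigr)
  &= \inf_{h \in H} \bigl\| F(\varphi_1) - F(\varphi_2) \circ h \bigr\|_\infty \\
  &\le \inf_{g \in G} \bigl\| F(\varphi_1) - F(\varphi_2) \circ T(g) \bigr\|_\infty
     && \text{since } T(G) \subseteq H \\
  &= \inf_{g \in G} \bigl\| F(\varphi_1) - F(\varphi_2 \circ g) \bigr\|_\infty
     && \text{by equivariance} \\
  &\le \inf_{g \in G} \bigl\| \varphi_1 - \varphi_2 \circ g \bigr\|_\infty
     && \text{by non-expansivity} \\
  &= d_G(\varphi_1, \varphi_2).
\end{align*}
```

The first inequality only uses the fact that the infimum over a subset is at least the infimum over the whole group; equivariance and non-expansivity each enter exactly once.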
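4.3.1 Pseudo-metrics on $\mathrm{GENEO}((\Phi,G),(\Psi,H))$
Let us denote by $\mathrm{GENEO}((\Phi,G),(\Psi,H))$ the set of all GENEOs between two perception pairs $(\Phi, G)$, $(\Psi, H)$ associated with $T\colon G \to H$. We can endow this set with the following pseudo-distances $D_{\mathrm{GENEO},H}$, $D_{\mathrm{GENEO}}$.
Definition 4.14.
If $F_1, F_2 \in \mathrm{GENEO}((\Phi,G),(\Psi,H))$, we set

$$D_{\mathrm{GENEO},H}(F_1, F_2) := \sup_{\varphi \in \Phi} d_H\bigl(F_1(\varphi), F_2(\varphi)\bigr), \qquad D_{\mathrm{GENEO}}(F_1, F_2) := \sup_{\varphi \in \Phi} D_\Psi\bigl(F_1(\varphi), F_2(\varphi)\bigr).$$

The next result can be easily proved by applying the inequality $d_H \le D_\Psi$ (see Theorem 6.1) and recalling that the supremum of a family of bounded pseudo-metrics is still a pseudo-metric.
Proposition 4.15.
$D_{\mathrm{GENEO},H}$ and $D_{\mathrm{GENEO}}$ are pseudo-metrics on $\mathrm{GENEO}((\Phi,G),(\Psi,H))$. Moreover, $D_{\mathrm{GENEO},H} \le D_{\mathrm{GENEO}}$.
It would be easy to check that, as a matter of fact, $D_{\mathrm{GENEO}}$ is a metric.
This simple statement holds:
Proposition 4.16.
For every $F \in \mathrm{GENEO}((\Phi,G),(\Psi,H))$ and every $\varphi \in \Phi$: $\|F(\varphi)\|_\infty \le \|\varphi\|_\infty + \|F(\mathbf{0})\|_\infty$, where $\mathbf{0}$ denotes the function taking the value 0 everywhere.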
(The proof is in Appendix C.)
5 On the compactness and convexity of the space of GENEOs
In this section we show that, if the function spaces are compact and convex, then the space of GENEOs is compact and convex too. This property has important consequences from the computational point of view, since it guarantees that the space of GENEOs can be approximated by a finite set and that new GENEOs can be obtained by convex combination of preexisting GENEOs.
Several results in this section and in Section 6 mimic the corresponding results in [3], where the particular case , is considered. Note that considering different function spaces and different groups of equivariance is fundamental, as it allows one to compose operators hierarchically, in the same fashion as computational units are linked in an artificial neural network.
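For the sake of conciseness, in the following we will set $\mathcal{F}_{\mathrm{all}} := \mathrm{GENEO}((\Phi,G),(\Psi,H))$. We recall that we are assuming $\Phi$ and $\Psi$ compact with respect to $D_\Phi$ and $D_\Psi$, respectively.
5.1 The space of GENEOs is compact with respect to $D_{\mathrm{GENEO}}$
Theorem 5.1.
$\mathcal{F}_{\mathrm{all}}$ is compact with respect to $D_{\mathrm{GENEO}}$.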
(The proof is in Appendix C.)
5.2 The set of GENEOs is convex
Let be GENEOs from to associated with the homomorphism . Let with . Consider the function
[TABLE]
from to the set of the continuous functions from to , where is the domain of the functions in .
Proposition 5.2**.**
If , then is a GENEO from to with respect to .
(The proof is in Appendix C.)
Theorem 5.3.
If is convex, then the set of GENEOs from to with respect to is convex.
(The proof is in Appendix C.)
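As an informal illustration of Proposition 5.2 and Theorem 5.3, the following minimal Python sketch builds translation-equivariant, non-expansive operators on discrete periodic signals and forms a convex combination of them. The setting is an assumption made only for illustration: signals are arrays of length n, the group is that of circular shifts, each operator is a circular convolution with an L1-normalised kernel, and the weights are non-negative with unit sum. All function names are hypothetical.

```python
import numpy as np


def make_operator(kernel, n):
    """Circular convolution with an L1-normalised kernel (illustrative operator).

    Normalising the kernel so that its L1 norm is 1 makes the operator
    non-expansive with respect to the sup-norm.
    """
    padded = np.zeros(n)
    padded[: len(kernel)] = np.asarray(kernel, dtype=float)
    padded /= np.abs(padded).sum()
    kernel_hat = np.fft.fft(padded)

    def operator(phi):
        return np.real(np.fft.ifft(np.fft.fft(phi) * kernel_hat))

    return operator


def convex_combination(operators, weights):
    """Return the operator phi -> sum_i weights[i] * operators[i](phi)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    def operator(phi):
        return sum(w * F(phi) for w, F in zip(weights, operators))

    return operator


n = 32
F1 = make_operator([1.0, 2.0, 1.0], n)
F2 = make_operator([1.0, -1.0], n)
F = convex_combination([F1, F2], [0.3, 0.7])
```

In this discrete setting one can check numerically that `F` commutes with `np.roll` and does not increase sup-norm distances, which is the behaviour the convexity result describes.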
5.3 GENEOs as agents in our model
In our model the agents are represented by GENEOs. Indeed, each agent can be seen as a black box that receives and transforms data. If a nonempty subset of is fixed, a simple pseudo-distance to compare two admissible functions can be defined by setting . This definition expresses our assumption that the comparison of data strongly depends on the choice of the agents. However, we note that the computation of for every pair of admissible functions is computationally expensive. In the next section, we will see how persistent homology allows us to replace with a pseudo-metric that is quicker to compute, while still being stable and strongly invariant.
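The comparison of data through a fixed set of agents can be sketched in a few lines. The sketch below assumes that the pseudo-distance is the largest sup-norm difference between the outputs of the agents on the two functions; since the formula is not reproduced in this section, this reading is an assumption. The agents used here (circular moving averages on discrete signals) and all names are hypothetical stand-ins for GENEOs.

```python
import numpy as np


def moving_average(width):
    """Circular moving average: shift-equivariant and non-expansive (example agent)."""

    def operator(phi):
        phi = np.asarray(phi, dtype=float)
        return sum(np.roll(phi, s) for s in range(width)) / width

    return operator


def agent_pseudo_distance(agents, phi1, phi2):
    """Largest sup-norm difference between the agents' outputs on phi1 and phi2."""
    return max(float(np.max(np.abs(F(phi1) - F(phi2)))) for F in agents)


agents = [moving_average(w) for w in (1, 3, 5)]
```

Because every agent in this toy family is non-expansive, the resulting value never exceeds the sup-norm distance between the two inputs. The cost grows with the number of agents, which is the motivation for the cheaper proxy discussed in the next section.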
6 A strongly group-invariant pseudo-metric induced by Persistent Homology
In this section, we show how Persistent Homology supports the definition of a strongly group invariant pseudo-metric on , for which we prove some theoretical results.
We begin by recalling the stability of the classical pseudo-distance between persistent Betti numbers functions (PBNs) (cf. Definition 3.2) with respect to the pseudo-metrics and . We assume the finiteness of PBNs (see Footnote 5). Then, the stability of with respect to easily follows from the stability theorem of the interleaving distance and the isometry theorem (cf. [19]).
Footnote 5. Though in our setting the space is assumed to be compact, PBNs are not necessarily finite. For example, let us consider the set and . Even if is compact, every sublevel set with has infinitely many connected components, and hence the 0th persistent Betti numbers function takes an infinite value at every point with . We add the assumption on the finiteness of PBNs (i.e., the assumption that the persistent Betti numbers function of every takes a finite value at each point ) to get stability and discard pathological cases (for example the case that the set of admissible functions is the set of all maps from to ). Since the PBNs of the pseudo-metric space coincide with the persistent Betti numbers functions of its Kolmogorov quotient , the finiteness of the persistent Betti numbers functions can be obtained when is finitely triangulable (cf. [17]).
Theorem 6.1.
If k is a natural number, and , then
[TABLE]
The proof of the first inequality in Theorem 6.1 is based on the stability of with respect to and can be found in [17]. The other inequalities follow from the definition of the natural pseudo-distance.
6.1 Strongly group invariant comparison of filtering functions via persistent homology
Let us consider a subset of . For every fixed , we can consider the following pseudo-metric on :
[TABLE]
for every , where denotes the th persistent Betti numbers function with respect to the function .
In this work, we will say that a pseudo-metric on is strongly G-invariant if it is invariant under the action of with respect to each variable, that is, if for every and every .
Remark 6.2.
It is easily seen that the natural pseudo-distance is strongly -invariant.
Proposition 6.3.
is a strongly -invariant pseudo-metric on .
(The proof is in Appendix C.)
6.2 Some theoretical results on the pseudo-metric
First, we show that the pseudo-metric is stable with respect to both the natural pseudo-distance associated with the group and the distance .
Remark 6.4.
Let and be two homeomorphic spaces and let be a homeomorphism. Then the persistent homology group with respect to the function and the persistent homology group with respect to the function are isomorphic at each point in the domain. Therefore we can say that the persistent homology groups and the persistent Betti numbers functions are invariant under the action of .
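Remark 6.4 can be checked numerically in the simplest case. The sketch below is an illustration under assumptions that are not part of the text: the space is a path of sample points, the filtering function is a list of values on those points, only degree 0 is considered, and the homeomorphism is the reflection of the path. The helper name is hypothetical, and the pairing follows the usual elder rule for sublevel-set filtrations.

```python
import math


def sublevel_persistence_0d(values):
    """Degree-0 sublevel-set persistence pairs of a function on a path graph."""
    n = len(values)
    parent = [None] * n  # None marks a vertex that has not entered the filtration
    birth = {}           # birth value of each component, stored at its root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda j: values[j]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at the merge value.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    pairs.append((min(values), math.inf))  # the component that never dies
    return sorted(pairs)


signal = [0.0, 2.0, 1.0, 3.0, 0.5]
diagram = sublevel_persistence_0d(signal)
reflected_diagram = sublevel_persistence_0d(signal[::-1])
```

The two diagrams coincide, as expected when the function is precomposed with a homeomorphism of its domain.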
Theorem 6.5.
If is a non-empty subset of , then
[TABLE]
(The proof is in Appendix C.)
The definitions of the natural pseudo-distance and the pseudo-distance come from different theoretical concepts. The former is based on a variation approach involving the set of all homeomorphisms in , while the latter refers only to a comparison of persistent homologies depending on a family of group equivariant non-expansive operators. Given those comments, the next result may appear unexpected.
Theorem 6.6.
Let us assume that , every function in is non-negative, the -th Betti number of does not vanish, and contains each constant function for which a function exists such that . Then .
(The proof is in Appendix C.)
We observe that if is bounded, the assumption that every function in is non-negative is not very restrictive. Indeed, we can satisfy it by adding a suitable constant value to every admissible function.
6.3 Pseudo-metrics induced by persistent homology
Persistent homology can be seen as a topological method to build new and easily computable pseudo-metrics for the sets , and . These new pseudo-metrics , , can be used as proxies for (and hence ), , , respectively:
1. If , we can set . The stability theorem for persistence diagrams (Theorem 6.1) can be reformulated as the inequalities .
2. If , we can set . From Theorem 6.1 the inequality follows.
3. If , we can set . From Theorem 6.1 the inequalities follow.
In particular, and a discretized version of the pseudo-metric will be used in the experiments described in Section 7. We underline that the use of persistent homology is a key tool in our approach: it allows for a fast comparison between functions and between GENEOs. Without persistent homology, this comparison would be much more computationally expensive.
6.4 Approximating
The next result will be of use for the approximation of .
Proposition 6.7.
Let . If the Hausdorff distance
[TABLE]
is not larger than , then
[TABLE]
for every .
(The proof is in Appendix C.)
Since the compactness of the space guarantees we can cover by a finite set of balls in of radius , centered at points of a finite set , the following proposition states that the approximation of can be reduced to the computation of , i.e. the maximum of a finite set of bottleneck distances between persistence diagrams, which are well-known to be computable by means of efficient algorithms.
Proposition 6.8.
Let be a non-empty subset of . For every , a finite subset of exists, such that
[TABLE]
for every .
(The proof is in Appendix C.)
Remark 6.9.
Theorem 5.1 and the inequalities stated in Subsection 6.3 immediately imply that is compact also with respect to the topologies induced by and .
6.5 Beyond group equivariance
We observe that while the definition of the natural pseudo-distance requires that has the structure of a group, the definition of does not need this assumption. In other words, our approach based on GENEOs can be used also when we wish to have equivariance with respect to a set instead of a group of homeomorphisms. This property is promising for extending the application of our theory to the cases in which the agent is equivariant with respect to each element of a finite set of homeomorphisms that is not closed with respect to composition and computation of the inverse.
7 Validation on discrete function spaces
In summary, we introduced above a theoretical framework that allows us to describe an agent acting on data as a collection of suitable operators. We do that by representing data as points of a space of continuous functions with compact support. The density of such a space makes the search for suitable operators for the approximation of a given agent computationally complicated. For this reason, we chose to consider GENEOs: enforcing equivariance with respect to the action of a group causes the dimensionality of the search space to collapse. Furthermore, in Section 4, we showed how GENEO spaces can be equipped with suitable metrics and enjoy properties that are essential in a machine learning context. The results concerning compactness and convexity make it possible to safely explore the space of GENEOs when operating on a labelled dataset. One of the main issues to be addressed when working in the proposed setting is the computability of metrics between operators. In Section 6 we showed how metrics between GENEOs can be approximated from below via persistent homology. These results should be enough to guarantee approximability, efficacy and computability of GENEOs, when utilised to solve supervised tasks.
Our mathematical model and theorems are based on the assumption that data can be treated as points in a space of continuous functions. In this section, we test the validity of such results on the classification of real-world datasets, proceeding as follows. First, we describe an algorithm that selects and samples GENEOs in order to learn the metric induced on a dataset by a labelling function. After that, we define the class of GENEOs we will use to study the MNIST, fashion-MNIST and CIFAR10 datasets. Selection and sampling are then used to approximate an agent able to express the underlying metric of these datasets by observing only or examples per class. Thereafter, we show how the metric learned through selection and sampling is still expressive when used to represent distances among validation samples transformed according to the equivariances of the GENEOs of choice. Finally, we use selected and sampled GENEOs to inject knowledge in an artificial neural network.
7.1 Operators selection and sampling on labelled datasets
We start from the assumption that data labelled with the same symbol share common features with respect to the agent we want to approximate. Thus, we suggest an algorithm for metric learning based on the metrics introduced on the space of GENEOs in Section 6. Briefly, we start by randomly selecting a certain number of GENEOs. Afterwards, we compare them by taking advantage of the fact that their representation as persistence diagrams is invariant with respect to the action of . These selected operators see those features that are common among the samples associated with the same label. Finally, again exploiting the fact that the matching distance is a lower bound for the metric defined on the space of operators, we sample the operators in order to obtain a minimal set of non-redundant operators.
In symbols, let be a dataset equipped with a labelling function . We assume that the dataset can be written as the disjoint union where contains samples labelled by . Let be the space of operators that will act on the samples. We begin by randomly sampling candidate operators in ; let us denote them as the set . We then select those operators that consider as similar the objects belonging to the same class. Let us consider the samples in . For each of the candidate operators , we define the label-dependent value
[TABLE]
A candidate operator is selected if is smaller than a fixed threshold for every . Let us denote by the set of selected operators. In practice, we will show that a few examples per class are enough to select operators that grasp salient topological-geometrical features from the example samples and can consequently be used to compute reasonable distances between new validation samples.
The selection criterion does not guarantee that the operators are maximally diverse, when evaluated within and in-between classes. The important advantage of working on metric spaces is that we can now sample the elements of to avoid storing operators that would focus on the same or similar characteristics. To this end, given a class , we define the distance between two operators and (cf. Subsection 6.3)
[TABLE]
For every label , we sort the pairs in ascending order of , and assign to each pair of operators its index in the sorted list of distances. We then define the interclass contrastive score of the pair as the sum of its indices over all classes. Finally, we remove redundant operators from , i.e. we keep only one operator from each pair whose score is below a fixed threshold .
Finally, two objects and can be compared by computing the strongly -invariant pseudo-metric , defined in Section 6.
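The selection and sampling steps described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the label-dependent values and the per-class distances between operators have already been computed (for instance from matching distances between persistence diagrams) and are supplied as arrays. The function names, the array layout and the rule of discarding the second operator of a redundant pair are assumptions.

```python
from itertools import combinations

import numpy as np


def select_operators(class_values, eps):
    """Keep operators whose label-dependent value is below eps for every class.

    class_values has shape (n_classes, n_operators).
    """
    class_values = np.asarray(class_values, dtype=float)
    return [int(i) for i in np.flatnonzero((class_values < eps).all(axis=0))]


def contrastive_scores(op_dists):
    """Sum, over classes, of each pair's index in the ascending list of distances.

    op_dists has shape (n_classes, n_operators, n_operators).
    """
    op_dists = np.asarray(op_dists, dtype=float)
    pairs = list(combinations(range(op_dists.shape[1]), 2))
    scores = dict.fromkeys(pairs, 0)
    for dist in op_dists:
        ranked = sorted(pairs, key=lambda p: dist[p])
        for rank, pair in enumerate(ranked):
            scores[pair] += rank
    return scores


def sample_operators(op_dists, tau):
    """Drop one operator from every pair whose contrastive score is below tau."""
    scores = contrastive_scores(op_dists)
    kept = set(range(np.asarray(op_dists).shape[1]))
    for (i, j), score in sorted(scores.items(), key=lambda item: item[1]):
        if score < tau and i in kept and j in kept:
            kept.discard(j)  # assumption: keep the first operator of the pair
    return sorted(kept)
```

A low score means that two operators are close to each other in every class, so they carry similar information and one of them can be discarded.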
7.2 Isometry equivariant non-expansive operators
One of the main strengths of convolutional neural networks is the natural equivariance of the convolution operator with respect to the group of planar translations. However, when working with images or volumes, invariance with respect to other transformations such as rotations or reflections can often be important. In what follows we define a parametric family of non-expansive operators which are equivariant with respect to Euclidean plane isometries.
Given and , we consider the -dimensional Gaussian function with width and centre
[TABLE]
where . For a positive integer , we take the set of the -tuples for which . is a submanifold of .
For each , we then consider the function defined as
[TABLE]
We denote by the convolutional operator mapping each continuous function with compact support to the continuous, compactly supported function defined as
[TABLE]
Then, the operator is a group equivariant non-expansive operator with respect to the group of Euclidean plane isometries. We call an IENEO (Isometry Equivariant Non-Expansive Operator).
The IENEO is parametric with respect to the -tuple . Therefore, we define a parametric family of IENEOs .
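A discrete analogue of this construction can be sketched as follows. The sketch rests on several assumptions, since the displayed formulas are not reproduced here: the kernel is taken to be a weighted sum of one-dimensional Gaussians evaluated on the distance from the origin, it is normalised to unit L1 norm, and images live on a periodic square grid so that the convolution can be computed with the FFT. On such a grid, equivariance can only be checked for the lattice symmetries (translations, rotations by multiples of 90 degrees and axis reflections), not for the full group of plane isometries. The function and parameter names are hypothetical.

```python
import numpy as np


def radial_kernel(size, centres, weights, sigma):
    """Radially symmetric kernel on a periodic size x size grid, unit L1 norm."""
    idx = np.arange(size)
    wrapped = np.minimum(idx, size - idx).astype(float)  # distance to index 0
    radius = np.sqrt(wrapped[:, None] ** 2 + wrapped[None, :] ** 2)
    profile = sum(
        a * np.exp(-((radius - tau) ** 2) / (2.0 * sigma**2))
        for a, tau in zip(weights, centres)
    )
    norm = np.abs(profile).sum()
    if norm == 0.0:
        raise ValueError("kernel vanishes identically")
    return profile / norm


def apply_operator(image, kernel):
    """Circular convolution of a square image with a kernel of the same shape."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))


kernel = radial_kernel(16, centres=[2.0, 5.0], weights=[0.6, 0.8], sigma=1.0)
```

Non-expansivity follows from the unit L1 norm of the kernel, since the sup-norm of a convolution is bounded by the sup-norm of the input times the L1 norm of the kernel. Equivariance follows from the radial symmetry of the kernel.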
7.3 Applications
We are now ready to utilise the selection and sampling strategy to find operators able to recognise samples belonging to the same class in a discrete dataset. We propose three different applications of our model. First, we select and sample operators on two-class subsets of the MNIST, fashion-MNIST and CIFAR10 datasets, and evaluate the validity of the learned metric by computing pairwise distances of validation samples according to the selected and sampled operators. Let us denote these operators by . Second, we evaluate on the MNIST dataset the capacity of the operators in to discriminate validation examples that have been transformed with random planar isometries. Finally, we use to initialise the filters of a convolutional layer and a dense architecture to classify the samples belonging to the classes the IENEOs were selected and sampled on.
7.3.1 Image preprocessing
Images are preprocessed according to the pipeline described in the first column of Figure 2. Every image is first reshaped to size , then blurred with a Gaussian kernel and finally standardised as . The same preprocessing is applied in all experiments and to all datasets.
7.3.2 Metric learning through selection and sampling
Metric learning is a natural application in the framework we describe. Indeed, operators that have been selected on labelled examples should be able to grasp geometrical and topological features that are shared among the examples belonging to the same class. Afterwards, selected and sampled operators can be used to measure distances between pairs of validation samples as
[TABLE]
This choice implies that two samples and will have distance 0, and hence are considered the same by the collection of selected operators (agent), only if every operator in sees them as identical. Note also that is invariant with respect to the action of the group of planar isometries. This invariance is naturally inherited from the use of . After computing the pairwise distance between validation examples, we use hierarchical clustering [22] to visualise, as a dendrogram, how samples have been organised by the metric.
For every dataset , we select a subset of samples belonging to two classes. We start by randomly initialising a parametrised family of IENEOs (of cardinality or in the experiments that follow). Afterwards, a small number (typically or ) of samples per class are randomly chosen. These samples are then used to select common within-class geometrical and topological features by selection and sampling. The threshold for the selection algorithm is set to and the threshold for sampling is defined as the percentile of all contrastive scores. These parameters are fixed and used in all the following experiments.
We first studied the efficacy of selection and sampling on a binary classification task on the MNIST dataset. After selecting samples belonging to two randomly selected classes of MNIST, we chose random samples per class to be used as examples in the selection and sampling algorithm. Sampled and selected IENEOs are then used to compute the pairwise distances of validation samples per class and generate the dendrogram in panel B of Figure 4. We repeated the same experiment three times, varying the size and the number of -dimensional Gaussians used to initialise the IENEOs. In particular, we considered sizes . The number of Gaussians was chosen according to the size as and rounded to the nearest integer. The dendrograms resulting from this manipulation are depicted in panels B, C and D of Figure 4.
Subsequently, we applied the same strategy and parameters to the fashion-MNIST and CIFAR10 datasets, obtaining the results in Figure 5.
7.3.3 Validation on augmented samples
This application aims at testing the aforementioned equivariance of the distance defined in Equation 18. To do this, we consider a set of operators selected and sampled on non-transformed samples, while we transform the set of validation samples by applying a random transformation among translations, rotations and reflections parametrised as follows:
1. rotations are selected randomly to be between and degrees;
2. translations can be in both the and -axis directions in a range between and pixels;
3. reflections are computed randomly with respect to one of the two axes.
The transformed samples along with the dendrograms obtained by considering the metric induced by the selected and sampled operators are shown in Figure 6.
7.3.4 Knowledge injection
As a final application, we discuss the possibility of using selected and sampled operators as fixed feature extractors for a simple artificial neural network model. We do that by using the elements of to initialise non-trainable filters of a convolutional layer. On top of this layer, we use two fully-connected layers, the first with ReLU [23] and the second with softmax activations, to classify samples from pairs of classes of the MNIST, fashion-MNIST and CIFAR10 datasets. Then we compare the performance of the classifier operating with the IENEO-initialised filters with that of an identical architecture whose filters were initialised randomly with Glorot initialisation [24]. The architecture of the model and the performance are shown in Figure 7.
8 Discussion and conclusions
The first contribution of this paper consists in giving a novel, formal and sound mathematical framework for machine learning, based on the study of metric and topological properties of operator spaces acting on function spaces. This approach is dual to the classical one: instead of focusing on data, our approach focuses on suitable operators defined on the functions that represent the data. Of all possible types of operators, we study the space of non-expansive, group equivariant operators (GENEOs). When building a machine learning system, choosing to work on a space of operators equivariant with respect to specific transformations allows us to inject pre-existing knowledge into the system. Indeed, the operators will be blind to the action of the group on the data, hence reducing the dimensionality of the space to be explored during optimisation. The choice of working with non-expansive operators is justified both by the possibility of proving the compactness of the spaces of GENEOs (under the assumption of compactness of the spaces of measurements), and by the fact that in practical applications we are usually interested in operators that compress the information we have as an input. The rationale of our approach is based on the assumption that the main interest in machine learning does not consist in the analysis and the approximation of data, but in the analysis and the approximation of the observers looking at the data. A simple example can make this idea clearer: if we consider images representing skin lesions, we are not mainly interested in the images per se but rather in approximating the judgement given by the physicians about such images.
Presenting our mathematical model, we first show how the space of GENEOs is suitable for machine learning. By using pseudo-metrics, we define a topology on the space of GENEOs which is induced by the one we define on the function space of data. We build the necessary machinery to define maps between GENEOs whose groups of equivariance are different. This definition is fundamental, because it allows one to compose operators hierarchically, in the same fashion as computational units are linked in an artificial neural network. Thereafter, by taking advantage of known and novel results in persistent homology, we prove compactness and convexity of the space of GENEOs under suitable hypotheses. Moreover, and importantly, we show how the suggested framework can be used to study operators that are equivariant with respect to sets of transformations, rather than groups. In particular, we observe that the pseudo-metric defined in Subsection 6.1 can be used also in the case that the operators in are equivariant with respect to a set instead of a group of homeomorphisms. This possibility appears to be promising for future research. It is important to stress the use of persistent homology in our model: the metric comparison of GENEOs is a key point in our approach and persistent homology allows for a fast comparison of functions, thereby allowing for a fast comparison of GENEOs.
We give two algorithms for selecting and sampling from a space of operators, given a dataset labelled for a classification task. The first selects a subset of operators, belonging to a certain space of GENEOs, that give a meaningful representation of the data with respect to their labelling, always invariant under the transformations induced by the action of . The sampling algorithm then eliminates redundant operators. These two strategies are used to perform metric learning and kernel initialisation on MNIST and fashion-MNIST. In addition, we show how convolutional filters initialised by selecting and sampling on few samples effectively capture useful knowledge that can be used to classify the remainder of the samples, for instance by a dense classifier.
Our forward-looking goal is to define a novel artificial neural network model based on functional modules. Modules would be more complex computational units than the standard artificial neuron. The core of each module would be a GENEO, thus each module would be defined a priori to be equivariant with respect to a set of transformations. On the one hand, this choice would allow us to dramatically reduce the dimensionality of the manifold to be studied during optimisation. On the other hand, choosing the transformation equivariances to be respected at each layer would allow us to inject knowledge in the networks before training, and would ensure that information is not acquired by relying on unwanted noisy regularities in the training data. Module networks would learn optimal transformations of the data to achieve a task, rather than operating on the data themselves.
Module networks could be built by composing modules hierarchically and knowledge could be injected in the model by engineering the proper set of equivariances. These transformations would be easily interpretable and could offer a rigorous way to compare learning dynamics of different architectures during optimisation. In particular, we are investigating the possibility to generalize capsule networks [25, 26] and modify the dynamic routing algorithm, by using the metrics on the space of GENEOs to update the connectivity strength between modules.
We conclude by observing that several interesting problems and new lines of research naturally arise in our mathematical model. First of all, some sets of GENEOs appear to have the structure of a Lie group and of a Riemannian manifold: these structures seem worth studying. Secondly, new methods for building GENEOs should be developed, in order to get good approximations of the spaces of GENEOs for given equivariance groups and function spaces. We plan to devote further research to these issues.
Appendix A Additional propositions
Proposition A.1.
The function is an extended pseudo-metric on .
Remark A.2.
We recall that a pseudo-metric is just a distance without the property: if , then .
Proof.
1. is obviously symmetrical.
2. The definition of immediately implies that for any .
3. The triangle inequality holds, since
[TABLE]
for any .
∎
Proposition A.3.
If is totally bounded, then for any there exists a finite subset of such that
[TABLE]
for every .
Proof.
Let us fix Since is totally bounded, we can find a finite subset such that for each there exists , for which . It follows that for any . Because of the definition of supremum of a subset of the set of all positive real numbers, for any we can choose a such that
[TABLE]
Now, if we take an index , for which , we have that:
[TABLE]
Hence,
[TABLE]
Finally, as goes to zero, we have that
[TABLE]
On the other hand, since :
[TABLE]
Therefore we proved the statement.
∎
Proposition A.4.
The function is a pseudo-metric on .
Proof.
1. The value is finite for every , because is compact and hence bounded. Indeed, a finite constant exists such that for every . Hence, for any and any , since . This implies that for every .
2. is obviously symmetrical.
3. The definition of immediately implies that for any .
4. The triangle inequality holds, since
[TABLE]
for any .
∎
Appendix B Our approach in terms of slice categories
In this section, we will apply the concept of slice category to our framework in order to formalize the concept of perception pairs, which are considered as subcategories of a larger category denoted by , as we explain further. Moreover we explore the link between GENEOs and functors between categories of this kind.
Let PMet be the category whose objects are pseudo-metric spaces and morphisms are the continuous functions between them. Let us fix the space , that is the real line equipped with the usual Euclidean metric, and consider the slice category over .
Now we recall the definition of slice category:
Definition B.1.
The slice category of a category over an object has
1. objects that are all arrows such that ,
2. morphisms that are all triples where and are two objects of , is a morphism of such that ; .
The slice category is a special case of a comma category.
Remark B.2.
There is a forgetful functor which maps each object to its domain and each morphism between and to the morphism .
We are going to associate a perception pair with a subcategory of defined as follows:
1. the objects of are the elements of ;
2. the arrows of are the triples , where and .
We observe that the action of on ensures that the arrow is well-defined for any and any .
Now we can define a “functorial” version of the concept of GENEO.
Definition B.3.
Let us consider two categories and . A functor from to is a -GENEO if:
1. for any ;
2. for any pair of morphisms such that we have that
GENEOs and -GENEOs share the non-expansivity condition. The proposition below shows that the second conditions respectively required in the definitions of GENEO and -GENEO correspond to each other in a suitable sense. We omit its trivial proof.
Proposition B.4.
Let be a functor from to . The following conditions are equivalent:
1. there exists a group homomorphism such that for any and any ;
2. for any pair of morphisms such that we have that .
Appendix C Proofs
Theorem (4.1).
The topology on induced by the pseudo-metric is finer than the initial topology on with respect to . If is totally bounded, then the topology coincides with .
Proof.
We know that the set is a base for the topology and the set is a base for the topology .
First of all we have to show that the topology is finer than the initial topology . Let us take a set in the base of , i.e. a set , where is a finite set of indexes and for every index . It will be sufficient to show that for every a ball exists, such that . Since , we have that for every . Therefore, for each we can find an open interval such that . Let us set , and observe that . If , then for every , and in particular for every . Hence the definition of immediately implies that for every , so that . It follows that . Therefore, , and our first statement is proved.
If is totally bounded, Proposition A.3 in Appendix A guarantees that for every a finite subset of exists such that
[TABLE]
for every . Let us now set $B_{\delta}(x,r):=\left\{x'\in X \,\Big|\, \max_{\varphi_{i}\in\varPhi_{\delta}}|\varphi_{i}(x)-\varphi_{i}(x')|<r\right\}$ for every and every . We have to prove that the initial topology is finer than the topology . In order to do this, it will be sufficient to show that for every a set exists, such that .
Let us choose a positive such that . Inequality (20) implies that . We now set for . Obviously, . If , then for every . Hence, . It follows that . Therefore, because of the inclusion . This means that is finer than . Since we already know that is finer than , it follows that coincides with . ∎
Remark.
The second statement of Theorem 4.1 becomes false if is not totally bounded. For example, assume equal to the set of all functions from to that are continuous with respect to the Euclidean topologies on and . Indeed, it is easy to check that in this case is the discrete topology, while the initial topology is the Euclidean topology on .
Remark.
The pseudo-metric space may not be a -space. For example, this happens if is a space containing at least two points and is the set of all the constant functions from to .
Theorem (4.2).
If is compact and is complete then is also compact.
Proof.
First of all we want to prove that every sequence in admits a Cauchy subsequence in . After that, the statement follows immediately because every Cauchy sequence in a complete space is convergent, so that is sequentially compact, and hence compact, since is a pseudo-metric space [18].
Let us consider an arbitrary sequence in and an arbitrarily small . Since is compact, we can find a finite subset such that , where . In particular, we can say that for any there exists such that . Now, we consider the real sequence , which is bounded because all the functions in are bounded. From the Bolzano-Weierstrass theorem it follows that we can extract a convergent subsequence . Then we consider the sequence . Since is bounded, we can extract a convergent subsequence . We can repeat the same argument for any . Thus, we obtain a subsequence of , such that is a real convergent sequence for any , and hence a Cauchy sequence in . Moreover, since is a finite set, there exists an index such that for any we have that
[TABLE]
We observe that does not depend on , but only on and .
In order to prove that is a Cauchy sequence in , we observe that for any and any we have:
[TABLE]
It follows that for every and every . Thus, . Hence, the sequence is a Cauchy sequence in . The completeness of implies that the statement of Theorem 4.2 is true. ∎
Example.
Let be the set containing all the -Lipschitz functions from to , and be the group of all rotations of radians with . The topological space is neither complete nor compact.
Proposition (4.3).
If is a bijection from to such that and for every , then is an isometry (and hence a homeomorphism) with respect to .
Proof.
Let us fix two arbitrary points in . Obviously, the map taking each function to is surjective, since . Hence . Therefore, preserves the pseudo-distance :
[TABLE]
Since is bijective, it follows that is an isometry with respect to . ∎
Theorem (4.7).
is a topological group with respect to the pseudo-metric topology and the action of on through right composition is continuous.
Proof.
It will suffice to prove that if and in with respect to the pseudo-metric , then and .
Because of the compactness of and Proposition A.3, for every we can take a finite subset of such that
[TABLE]
for every . We have that
[TABLE]
Since , . Because of Theorem 4.2, is compact and hence is a uniformly continuous function. Since , it follows that for every , and hence . Given that can be taken arbitrarily small, we get .
We also want to prove that . By contradiction, if it were not true that , then there would exist a constant and a subsequence of such that for every index . However, we would still have because is a subsequence of . Since for every index , a should exist such that .
Because of the compactness of , it would not be restrictive to assume (possibly by considering subsequences) the existence of the following limits: and . We would have that
[TABLE]
so that .
On the other hand, we should have
[TABLE]
so that .
It follows that is not injective, against our assumptions.
This contradiction proves that .
Therefore, is a topological group.
Let now be a positive real number. If then
[TABLE]
This proves that the action of on through right composition is continuous. ∎
Theorem (4.8).
If is complete then it is also compact with respect to .
Proof.
We want to show that is sequentially compact, and hence compact. Let be a sequence in and take a real number . Given that is compact, we can find a finite subset such that for every there exists for which . For any fixed , let us consider the sequence in . Applying the same argument as in the proof of Theorem 4.2, we can extract a subsequence of such that converges in with respect to and hence it is a Cauchy sequence for any . Since the set is finite, we can find an index such that
[TABLE]
In order to prove that is a Cauchy sequence, we observe that for any , any , and any we have
[TABLE]
We observe that does not depend on , but only on and . By choosing a such that , we get for every and every . Thus, . Hence, the sequence is a Cauchy sequence. Finally, given that is complete, is convergent. Therefore, is sequentially compact. ∎
Example.
Let be the set containing all the -Lipschitz functions from to , and be the group of all rotations of by radians, with a rational number. The space is neither complete nor compact.
Proposition (4.13).
If is a GENEO from to associated with , then it is a contraction with respect to the natural pseudo-distances , .
Proof.
Since is a GENEO, it follows that
[TABLE]
∎
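A sketch of the standard argument behind this estimate, under assumed notation: $F\colon\Phi\to\Psi$ is the operator, $T\colon G\to H$ the associated homomorphism, $D_\Phi$ and $D_\Psi$ the sup-norm distances, and $d_G(\varphi_1,\varphi_2)=\inf_{g\in G}D_\Phi(\varphi_1,\varphi_2\circ g)$ the natural pseudo-distance (with $d_H$ defined analogously). These symbols are assumptions, since the definitions are not restated in this section.

```latex
\begin{align*}
d_H\bigl(F(\varphi_1),F(\varphi_2)\bigr)
  &= \inf_{h\in H} D_\Psi\bigl(F(\varphi_1),F(\varphi_2)\circ h\bigr) \\
  &\le \inf_{g\in G} D_\Psi\bigl(F(\varphi_1),F(\varphi_2)\circ T(g)\bigr) \\
  &= \inf_{g\in G} D_\Psi\bigl(F(\varphi_1),F(\varphi_2\circ g)\bigr) \\
  &\le \inf_{g\in G} D_\Phi\bigl(\varphi_1,\varphi_2\circ g\bigr)
   = d_G(\varphi_1,\varphi_2).
\end{align*}
```

The first inequality holds because $T(G)\subseteq H$, the middle equality is equivariance, and the last inequality is non-expansivity.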
Proposition (4.16).
For every and every : , where 0 denotes the function taking the value 0 everywhere.
Proof.
Since is non-expansive, we have that
[TABLE]
∎
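One way to spell out the estimate, assuming the statement is the bound $\|F(\varphi)\|_\infty \le \|\varphi\|_\infty + \|F(\mathbf{0})\|_\infty$ and that the zero function $\mathbf{0}$ belongs to the domain of $F$ (both are assumptions about the elided formulas):

```latex
\begin{align*}
\|F(\varphi)\|_\infty
  &= \|F(\varphi) - F(\mathbf{0}) + F(\mathbf{0})\|_\infty \\
  &\le \|F(\varphi) - F(\mathbf{0})\|_\infty + \|F(\mathbf{0})\|_\infty \\
  &\le \|\varphi - \mathbf{0}\|_\infty + \|F(\mathbf{0})\|_\infty
   = \|\varphi\|_\infty + \|F(\mathbf{0})\|_\infty .
\end{align*}
```

The first inequality is the triangle inequality for the sup norm, and the second is the non-expansivity of $F$.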
Theorem (5.1).
is compact with respect to .
Proof.
We know that is a metric space. Therefore it will suffice to prove that is sequentially compact. In order to do this, let us assume that a sequence in is given. Given that is a compact (and hence separable) metric space, we can find a countable and dense subset of . By means of a diagonalization process, we can extract a subsequence from , such that for every fixed index the sequence converges to a function in with respect to . Now, let us consider the function defined by setting for each .
We extend to as follows. For every we choose a sequence in , converging to , and set . We claim that such a limit exists in and does not depend on the sequence that we have chosen, converging to . In order to prove that the previous limit exists, we observe that for every
[TABLE]
because each is non-expansive.
Since the sequence converges to , it follows that is a Cauchy sequence with respect to . The compactness of implies that converges in .
If another sequence is given in , converging to , then for every index
[TABLE]
Since both and converge to it follows that . Therefore the definition of does not depend on the sequence that we have chosen, converging to .
Now we have to prove that , i.e., that verifies the properties defining this set of operators. We have already seen that .
For every we can consider two sequences , in , converging to and , respectively. Due to the fact that the operators are non-expansive, we have that
[TABLE]
Therefore, is non-expansive. As a consequence, it is also continuous.
We can now prove that the sequence converges to with respect to .
Let us consider an arbitrarily small . Since is compact and is dense in , we can find a finite subset of such that for every , there exists an index , for which .
Since the sequence converges pointwise to on the set , an index exists, such that for any and any . Therefore, for every we can find an index such that and the following inequalities hold for every index , because of the non-expansivity of and :
[TABLE]
We observe that does not depend on , but only on and on the set . It follows that for every and every .
Hence, for every . Therefore, the sequence converges to with respect to .
The last thing that we have to show is that is group equivariant. Let us consider a , a sequence in converging to in and a . Obviously, and hence the sequence converges to in with respect to . We recall that the right action of on is continuous, is continuous and each is group equivariant. Hence, given that the sequence converges to with respect to , the following equalities hold:
[TABLE]
This proves that is group equivariant, and hence a perception map. In conclusion, is a GENEO. From the fact that the sequence converges to with respect to , it follows that is sequentially compact. ∎
Proposition (5.2).
If , then is a GENEO from to with respect to .
Proof.
First we prove that is a perception map with respect to . Since every is a perception map we have that:
[TABLE]
Since every is non-expansive, is non-expansive:
[TABLE]
Therefore is a GENEO. ∎
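As a numerical sanity check of this statement, the sketch below combines a few operators on periodic signals with coefficients whose absolute values sum to 1. Everything concrete here is an assumption made for illustration: the operators are circular convolutions with kernels of L1 norm 1, the group is the group of cyclic shifts, the associated homomorphism is the identity, and the codomain is all of R^n (so the combination trivially stays in it). The names `circular_conv`, `combine` and `sup_dist` are hypothetical.

```python
import numpy as np

def circular_conv(kernel):
    """Operator phi -> sum_s kernel[s] * (phi cyclically shifted by s)."""
    return lambda phi: sum(w * np.roll(phi, s) for s, w in enumerate(kernel))

def combine(coeffs, ops):
    """Linear combination sum_i a_i F_i of the given operators."""
    return lambda phi: sum(a * op(phi) for a, op in zip(coeffs, ops))

def sup_dist(u, v):
    """Sup-norm distance between two sampled signals."""
    return np.max(np.abs(u - v))

rng = np.random.default_rng(0)
n = 32
kernels = [rng.normal(size=5) for _ in range(3)]
kernels = [k / np.abs(k).sum() for k in kernels]   # L1 norm 1 => non-expansive
ops = [circular_conv(k) for k in kernels]
coeffs = np.array([0.5, -0.3, 0.2])                # sum of |a_i| equals 1
F_sigma = combine(coeffs, ops)

max_equiv_err = 0.0   # worst violation of F(phi o g) = F(phi) o g
max_ratio = 0.0       # worst ratio ||F(phi) - F(psi)|| / ||phi - psi||
for _ in range(200):
    phi, psi = rng.normal(size=n), rng.normal(size=n)
    shift = int(rng.integers(n))
    lhs = F_sigma(np.roll(phi, shift))
    rhs = np.roll(F_sigma(phi), shift)
    max_equiv_err = max(max_equiv_err, sup_dist(lhs, rhs))
    max_ratio = max(max_ratio, sup_dist(F_sigma(phi), F_sigma(psi)) / sup_dist(phi, psi))
```

On these random trials the combined operator commutes with every shift up to rounding and never expands sup-norm distances. This is consistent with the proposition but is of course not a proof.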
Theorem (5.3).
If is convex, then the set of GENEOs from to with respect to is convex.
Proof.
It is sufficient to apply Proposition 5.2 for , by setting , for , and observing that the convexity of implies . ∎
Proposition.
is a strongly -invariant pseudo-metric on .
Proof.
Theorem 6.1 and the non-expansivity of every imply that
[TABLE]
Therefore is a pseudo-metric, since it is the supremum of a family of pseudo-metrics that are bounded at each pair . Moreover, for every and every
[TABLE]
because of the equality for every and every and the invariance of persistent homology under the action of the homeomorphisms. Since the function is symmetric, this is sufficient to guarantee that is strongly -invariant. ∎
Theorem (6.5).
If is a non-empty subset of , then
[TABLE]
Proof.
For every , every and every , we have that
[TABLE]
The first equality follows from the invariance of persistent homology under the action of (see Remark 6.4), and the second equality follows from the fact that F is a group equivariant operator. The first inequality follows from the stability of persistent homology (Theorem 6.1), while the second inequality follows from the non-expansivity of . It follows that, if , then for every and every
[TABLE]
Hence, the inequality follows, while is stated in Theorem 6.1. ∎
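The chain of two equalities and two inequalities described in the proof can be written out as follows. The notation is assumed rather than taken from this section: $d_{\mathrm{match}}$ denotes the bottleneck distance, $r_k(\cdot)$ the degree-$k$ persistence diagram of a function, $T$ the homomorphism associated with $F$ (possibly the identity), and $D_\Phi$, $D_\Psi$ the sup-norm distances.

```latex
\begin{align*}
d_{\mathrm{match}}\bigl(r_k(F(\varphi_1)),\, r_k(F(\varphi_2))\bigr)
  &= d_{\mathrm{match}}\bigl(r_k(F(\varphi_1)),\, r_k(F(\varphi_2)\circ T(g))\bigr) \\
  &= d_{\mathrm{match}}\bigl(r_k(F(\varphi_1)),\, r_k(F(\varphi_2\circ g))\bigr) \\
  &\le D_\Psi\bigl(F(\varphi_1),\, F(\varphi_2\circ g)\bigr) \\
  &\le D_\Phi\bigl(\varphi_1,\, \varphi_2\circ g\bigr).
\end{align*}
```

Taking the supremum over the operators $F$ on the left and the infimum over $g$ on the right gives the comparison with the natural pseudo-distance.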
Theorem (6.6).
Let us assume that , every function in is non-negative, the -th Betti number of does not vanish, and contains each constant function for which a function exists such that . Then .
Proof.
For every let us consider the operator defined by setting equal to the constant function taking everywhere the value for every (i.e., for any ). Our assumptions guarantee that such a constant function belongs to . We also set .
We observe that
1. is a group equivariant operator on , because the strong invariance of the natural pseudo-distance with respect to the group (Remark 6.2) implies that if and , then , for every .
2. is non-expansive on , because for every
[TABLE]
Therefore, is a GENEO.
For every we have that
[TABLE]
Indeed, apart from the trivial points on the line , the persistence diagram associated with contains only the point , while the persistence diagram associated with contains only the point . Both the points have the same multiplicity, which equals the (non-null) -th Betti number of .
Setting , we have that
[TABLE]
As a consequence, we have that
[TABLE]
By applying Theorem 6.5, we get
[TABLE]
for every . ∎
Proposition (6.7).
Let . If the Hausdorff distance
[TABLE]
is not larger than , then
[TABLE]
for every .
Proof.
Since , for every a and an exist such that . The definition of implies that for every . From Theorem 6.1 it follows that
[TABLE]
and
[TABLE]
for every .
Therefore,
[TABLE]
As a consequence, . We can show analogously that . Since can be chosen arbitrarily small, from the previous two inequalities the proof of our statement follows. ∎
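The mechanism of this proof (pick, for each operator in one family, a nearby operator in the other, then apply the triangle inequality twice) can be checked numerically on a toy model. The sketch below is not the construction of the paper: it replaces the bottleneck distance between persistence diagrams by the plain sup-norm distance between outputs, uses a finite random sample as a stand-in for the function space, and takes circular convolutions as operators. All names (`conv_op`, `op_dist`, `hausdorff`, `induced`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
# Hypothetical finite stand-in for the function space.
sample = [rng.normal(size=n) for _ in range(6)]

def conv_op(kernel):
    """Circular convolution operator on periodic signals."""
    return lambda phi: sum(w * np.roll(phi, s) for s, w in enumerate(kernel))

def normalised(k):
    """Rescale a kernel so that its L1 norm is at most 1."""
    return k / max(1.0, np.abs(k).sum())

base = [normalised(rng.normal(size=3)) for _ in range(4)]
family_a = [conv_op(k) for k in base]
family_b = [conv_op(normalised(k + 0.05 * rng.normal(size=3))) for k in base]

def op_dist(f1, f2):
    """Sup over the sample of the sup-norm distance between outputs."""
    return max(np.max(np.abs(f1(p) - f2(p))) for p in sample)

def hausdorff(fam1, fam2):
    """Hausdorff distance between two finite families of operators."""
    d12 = max(min(op_dist(f, g) for g in fam2) for f in fam1)
    d21 = max(min(op_dist(f, g) for g in fam1) for f in fam2)
    return max(d12, d21)

def induced(fam, p, q):
    """Sup over the family of the distance between the outputs on p and q."""
    return max(np.max(np.abs(f(p) - f(q))) for f in fam)

eps = hausdorff(family_a, family_b)
gap = max(
    abs(induced(family_a, p, q) - induced(family_b, p, q))
    for p in sample
    for q in sample
)
```

In this toy setting `gap` never exceeds `2 * eps`, mirroring the factor 2 that the two applications of the triangle inequality produce in the proof.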
Proposition (6.8).
Let be a non-empty subset of . For every , a finite subset of exists, such that
[TABLE]
for every .
Proof.
Let us consider the closure of in . Let us also consider the covering of obtained by taking all the open balls of radius centered at points of , with respect to . Theorem 5.1 guarantees that is compact, hence also is compact. Therefore we can extract a finite covering of from . We can set equal to the set of centers of the balls . The statement of our proposition immediately follows from Proposition 6.7, by recalling that and hence . ∎
References
- [1]
Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The handbook of brain theory and neural networks 3361 (10) (1995) 1995.
- [2]
F. Anselmi, L. Rosasco, T. Poggio, On invariance and selectivity in representation learning, Information and Inference: A Journal of the IMA 5 (2) (2016) 134–158.
doi:10.1093/imaiai/iaw009.
URL http://dx.doi.org/10.1093/imaiai/iaw009
- [3]
P. Frosini, G. Jabłoński, Combining persistent homology and invariance groups for shape comparison, Discrete Comput. Geom. 55 (2) (2016) 373–409.
doi:10.1007/s00454-016-9761-y.
URL http://dx.doi.org/10.1007/s00454-016-9761-y
- [4]
T. Cohen, M. Welling, Group equivariant convolutional networks, in: International conference on machine learning, 2016, pp. 2990–2999.
- [5]
D. E. Worrall, S. J. Garbin, D. Turmukhambetov, G. J. Brostow, Harmonic networks: Deep translation and rotation equivariance, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Vol. 2, 2017.
- [6]
H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta, L. Ziegelmeier, Persistence images: A stable vector representation of persistent homology, J. Mach. Learn. Res. 18 (1) (2017) 218–252.
URL http://dl.acm.org/citation.cfm?id=3122009.3122017
- [7]
C. S. Pun, K. Xia, S. Xian Lee, Persistent-homology-based machine learning and its applications – A survey, arXiv e-prints (2018) arXiv:1811.00252.
- [8]
R. B. Gabrielsson, G. Carlsson, Exposition and interpretation of the topology of neural networks, CoRR abs/1810.03234.
URL http://arxiv.org/abs/1810.03234
- [9]
P. Frosini, Towards an Observer-oriented Theory of Shape Comparison, in: A. Ferreira, A. Giachetti, D. Giorgi (Eds.), Eurographics Workshop on 3D Object Retrieval, The Eurographics Association, 2016.
- [10]
G. Carlsson, Topology and data, Bull. Amer. Math. Soc. (N.S.) 46 (2) (2009) 255–308.
doi:10.1090/S0273-0979-09-01249-X.
URL https://doi.org/10.1090/S0273-0979-09-01249-X
- [11]
P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. E. Carlsson, Extracting insights from the shape of complex data using topology, in: Scientific reports, Vol. 3, 2013.
- [12]
A. Hatcher, Algebraic topology, Tsinghua University Press, 2005.
- [13]
S. Biasotti, L. De Floriani, B. Falcidieno, P. Frosini, D. Giorgi, C. Landi, L. Papaleo, M. Spagnuolo, Describing Shapes by Geometrical-topological Properties of Real Functions, ACM Comput. Surv. 40 (4) (2008) 12:1–12:87.
URL http://doi.acm.org/10.1145/1391729.1391731
- [14]
G. Carlsson, A. Zomorodian, The theory of multidimensional persistence, Discrete Comput. Geom. 42 (1) (2009) 71–93.
doi:10.1007/s00454-009-9176-0.
URL http://dx.doi.org/10.1007/s00454-009-9176-0
- [15]
H. Edelsbrunner, J. Harer, Persistent homology—a survey, in: Surveys on discrete and computational geometry, Vol. 453 of Contemp. Math., Amer. Math. Soc., Providence, RI, 2008, pp. 257–282.
URL http://dx.doi.org/10.1090/conm/453/08802
- [16]
D. Cohen-Steiner, H. Edelsbrunner, J. Harer, Stability of persistence diagrams, Discrete Comput. Geom. 37 (1) (2007) 103–120.
doi:10.1007/s00454-006-1276-5.
URL http://dx.doi.org/10.1007/s00454-006-1276-5
- [17]
A. Cerri, B. Di Fabio, M. Ferri, P. Frosini, C. Landi, Betti numbers in multidimensional persistent homology are stable functions, Math. Methods Appl. Sci. 36 (12) (2013) 1543–1557.
URL http://dx.doi.org/10.1002/mma.2704
- [18]
S. A. Gaal, Point set topology, Pure and Applied Mathematics, Vol. XVI, Academic Press, New York-London, 1964.
- [19]
S. Y. Oudot, Persistence theory: from quiver representations to data analysis, Vol. 209 of Mathematical Surveys and Monographs, American Mathematical Society, Providence, RI, 2015.
URL https://doi.org/10.1090/surv/209
- [20]
A. Zomorodian, Fast construction of the Vietoris-Rips complex, Computers & Graphics 34 (3) (2010) 263–271.
- [21]
R. Fabbri, L. D. F. Costa, J. C. Torelli, O. M. Bruno, 2D Euclidean distance transform algorithms: A comparative survey, ACM Computing Surveys (CSUR) 40 (1) (2008) 2.
- [22]
P. Langfelder, B. Zhang, S. Horvath, Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R, Bioinformatics 24 (5) (2007) 719–720.
- [23]
V. Nair, G. E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807–814.
- [24]
X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, pp. 249–256.
- [25]
G. E. Hinton, A. Krizhevsky, S. D. Wang, Transforming auto-encoders, in: International Conference on Artificial Neural Networks, Springer, 2011, pp. 44–51.
- [26]
S. Sabour, N. Frosst, G. E. Hinton, Dynamic Routing Between Capsules, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30, Curran Associates, Inc., 2017, pp. 3856–3866.
URL http://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf
