Towards a topological-geometrical theory of group equivariant non-expansive operators for data analysis and machine learning

Mattia G. Bergomi; Patrizio Frosini; Daniela Giorgi; Nicola Quercioli

arXiv:1812.11832 · cs.LG · March 5, 2019


TL;DR

This paper develops a mathematical framework for group equivariance in machine learning using topological and geometric tools, introducing GENEOs to improve data analysis and neural network initialization.

Contribution

It introduces group-equivariant non-expansive operators (GENEOs), analyzing their properties and demonstrating their application in metric learning and CNN initialization.

Findings

1. GENEOs form a compact and convex space under certain conditions.
2. Sampled GENEOs can effectively initialize CNN kernels.
3. The framework applies to datasets like MNIST and Fashion-MNIST.


Equations

$$d^{*}((x,y),(x',y')) := \min\left\{\max\{|x-x'|,\,|y-y'|\},\ \max\left\{\frac{y-x}{2},\,\frac{y'-x'}{2}\right\}\right\}$$

$$\delta_{\mathrm{match}}(D,D') := \inf_{\sigma}\,\sup_{p\in D} d^{*}(p,\sigma(p)),$$

$$D_{\varPhi}(\varphi_{1},\varphi_{2}) := \|\varphi_{1}-\varphi_{2}\|_{\infty}.$$

$$D_{X}(x_{1},x_{2}) = \sup_{\varphi\in\varPhi}|\varphi(x_{1})-\varphi(x_{2})|$$

$$B_{X}(x,\varepsilon) = \{x'\in X : D_{X}(x,x')<\varepsilon\}$$

$$
\begin{aligned}
D_{\varPhi}(\varphi\circ g,\varphi'\circ g) &= \sup_{x\in X}|\varphi(g(x))-\varphi'(g(x))| \\
&= \sup_{y\in X}|\varphi(y)-\varphi'(y)| = D_{\varPhi}(\varphi,\varphi'),
\end{aligned}
$$

$$D_{G}(g_{1},g_{2}) := \sup_{\varphi\in\varPhi} D_{\varPhi}(\varphi\circ g_{1},\varphi\circ g_{2})$$

$$D_{G}(g_{1},g_{2}) = \sup_{x\in X} D_{X}(g_{1}(x),g_{2}(x)) = \sup_{x\in X}\sup_{\varphi\in\varPhi}|\varphi(g_{1}(x))-\varphi(g_{2}(x))|.$$

$$d_{G}(\varphi_{1},\varphi_{2}) = \inf_{g\in G} D_{\varPhi}(\varphi_{1},\varphi_{2}\circ g).$$

$$d_{\mathrm{Homeo}_{\varPhi}(X)}(\varphi_{1},\varphi_{2}) \leq d_{G_{2}}(\varphi_{1},\varphi_{2}) \leq d_{G_{1}}(\varphi_{1},\varphi_{2}) \leq D_{\varPhi}(\varphi_{1},\varphi_{2})$$
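The natural pseudo-distance $d_G$ and the chain of inequalities above can be checked on a toy example. The sketch below is an illustration under assumptions that do not come from the paper. $X$ is the four-point set $\{0,1,2,3\}$, functions are tuples of values, and $G$ is a finite group of permutations given as tuples, so the infimum becomes a minimum. We take $G_1$ to be the trivial group and $G_2$ the cyclic shifts, so that $G_1\subseteq G_2$; with this choice we recover $d_{G_2}\leq d_{G_1}\leq D_{\varPhi}$. All helper names are hypothetical.

```python
def D_Phi(phi1, phi2):
    """Max-norm distance between two functions on a finite set X."""
    return max(abs(a - b) for a, b in zip(phi1, phi2))


def compose(phi, g):
    """Right composition phi o g, where g is a permutation of X = {0, ..., n-1}."""
    return tuple(phi[g[i]] for i in range(len(g)))


def d_G(phi1, phi2, G):
    """Natural pseudo-distance: min over g in G of D_Phi(phi1, phi2 o g)."""
    return min(D_Phi(phi1, compose(phi2, g)) for g in G)


n = 4
G1 = [tuple(range(n))]                                         # trivial group {Id}
G2 = [tuple((i + k) % n for i in range(n)) for k in range(n)]  # cyclic shifts, contains G1
phi1 = (0, 1, 2, 3)
phi2 = (1, 2, 3, 0)  # phi1 shifted by one position

print(d_G(phi1, phi2, G2), d_G(phi1, phi2, G1), D_Phi(phi1, phi2))  # prints: 0 3 3
```

Enlarging the group can only lower the infimum. Here the shift that realigns `phi2` with `phi1` makes $d_{G_2}$ vanish, while $d_{G_1}$ coincides with the max-norm distance.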

$$D_{\mathrm{GENEO}}(F_{1},F_{2})$$

$$D_{\mathrm{GENEO},H}(F_{1},F_{2})$$

$$F_{\Sigma}(\varphi) := \sum_{i=1}^{n} a_{i}F_{i}(\varphi)$$

$$d_{\mathrm{match}}(r_{k}(\varphi_{1}),r_{k}(\varphi_{2})) \leq d_{\mathrm{Homeo}(X)}(\varphi_{1},\varphi_{2}) \leq d_{G_{2}}(\varphi_{1},\varphi_{2}) \leq d_{G_{1}}(\varphi_{1},\varphi_{2}) \leq D_{\varPhi}(\varphi_{1},\varphi_{2}).$$

$$D_{\mathrm{match}}^{\mathcal{F},k}(\varphi_{1},\varphi_{2}) := \sup_{F\in\mathcal{F}} d_{\mathrm{match}}(r_{k}(F(\varphi_{1})),r_{k}(F(\varphi_{2})))$$

$$D_{\mathrm{match}}^{\mathcal{F},k} \leq d_{G} \leq D_{\varPhi}.$$

$$HD(\mathcal{F},\mathcal{F}') := \max\left\{\sup_{F\in\mathcal{F}}\inf_{F'\in\mathcal{F}'} D_{\mathrm{GENEO},H}(F,F'),\ \sup_{F'\in\mathcal{F}'}\inf_{F\in\mathcal{F}} D_{\mathrm{GENEO},H}(F,F')\right\}$$

$$D_{\mathrm{match}}^{\mathcal{F},k}(\varphi_{1},\varphi_{2}) - D_{\mathrm{match}}^{\mathcal{F}',k}(\varphi_{1},\varphi_{2}) \leq 2\varepsilon$$

$$\left|D_{\mathrm{match}}^{\mathcal{F}^{*},k}(\varphi_{1},\varphi_{2}) - D_{\mathrm{match}}^{\mathcal{F},k}(\varphi_{1},\varphi_{2})\right| \leq \varepsilon$$

$$s_{l}(F) = \max_{\varphi_{i}^{l},\varphi_{j}^{l}} d_{\mathrm{match}}(r_{1}(F(\varphi_{i}^{l})),r_{1}(F(\varphi_{j}^{l}))).$$

$$\Delta_{\mathrm{GENEO}}^{l}(F_{p},F_{q}) := \max_{\varphi_{i}^{l}} d_{\mathrm{match}}(r_{k}(F_{p}(\varphi_{i}^{l})),r_{k}(F_{q}(\varphi_{i}^{l}))).$$

$$g_{\tau}(t) := e^{-\frac{(t-\tau)^{2}}{2\sigma^{2}}},$$

$$G_{p}(x,y) := \sum_{i=1}^{k} a_{i}\,g_{\tau_{i}}(x^{2}+y^{2}).$$

$$\psi(x,y) := \int_{\mathbb{R}^{2}} \varphi(\alpha,\beta)\cdot\frac{G_{p}(x-\alpha,y-\beta)}{\|G_{p}\|_{L^{1}}}\,d\alpha\,d\beta.$$
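The kernel $G_p$ and the operator $\varphi\mapsto\psi$ above can be sketched in discrete form. This is not the code of the geneos package. It places the kernel on an integer grid centred at the origin, replaces $\|G_p\|_{L^1}$ with the sum of absolute kernel values, and uses zero padding at the image border. The parameters `a`, `taus`, `sigma` and `radius` are illustrative. Because $G_p$ depends only on $x^2+y^2$, the discrete kernel is invariant under 90-degree rotations of the grid. Because the normalised kernel has absolute sum 1, the operator is non-expansive with respect to the max-norm.

```python
import numpy as np


def g_tau(t, tau, sigma):
    """g_tau(t) = exp(-(t - tau)^2 / (2 sigma^2))."""
    return np.exp(-((t - tau) ** 2) / (2 * sigma ** 2))


def ring_kernel(a, taus, sigma, radius):
    """Discretised G_p(x, y) = sum_i a_i g_{tau_i}(x^2 + y^2), normalised in l^1."""
    coords = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(coords, coords, indexing="ij")
    G = sum(a_i * g_tau(x ** 2 + y ** 2, tau_i, sigma) for a_i, tau_i in zip(a, taus))
    return G / np.abs(G).sum()


def apply_operator(phi, kernel):
    """psi = phi convolved with kernel, zero padding, same shape as phi."""
    r = kernel.shape[0] // 2
    padded = np.pad(phi, r)
    rows, cols = phi.shape
    psi = np.zeros((rows, cols))
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            # The kernel is symmetric under (di, dj) -> (-di, -dj),
            # so this correlation coincides with a convolution.
            psi += kernel[di + r, dj + r] * padded[r + di : r + di + rows, r + dj : r + dj + cols]
    return psi


rng = np.random.default_rng(0)
K = ring_kernel(a=[1.0, -0.5], taus=[0.0, 4.0], sigma=1.0, radius=3)
phi1, phi2 = rng.random((8, 8)), rng.random((8, 8))
psi1, psi2 = apply_operator(phi1, K), apply_operator(phi2, K)
print(np.abs(psi1 - psi2).max() <= np.abs(phi1 - phi2).max())          # non-expansive
print(np.allclose(apply_operator(np.rot90(phi1), K), np.rot90(psi1)))  # rotation-equivariant
```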

$$d_{S}(\varphi,\varphi') = \max_{F\in S} d_{\mathrm{match}}(r_{1}(F(\varphi)),r_{1}(F(\varphi'))).$$

$$
\begin{aligned}
D_{X}(x_{1},x_{2}) &\leq \sup_{\varphi\in\varPhi}\left(|\varphi(x_{1})-\varphi(x_{3})|+|\varphi(x_{3})-\varphi(x_{2})|\right) \\
&\leq \sup_{\varphi\in\varPhi}|\varphi(x_{1})-\varphi(x_{3})| + \sup_{\varphi\in\varPhi}|\varphi(x_{3})-\varphi(x_{2})| \\
&= D_{X}(x_{1},x_{3})+D_{X}(x_{3},x_{2})
\end{aligned}
$$

$$\sup_{\varphi\in\varPhi}|\varphi(x_{1})-\varphi(x_{2})| - \max_{\varphi\in\varPhi_{\delta}}|\varphi(x_{1})-\varphi(x_{2})| < 2\delta$$

$$\sup_{\varphi\in\varPhi}|\varphi(x_{1})-\varphi(x_{2})| - |\bar{\varphi}(x_{1})-\bar{\varphi}(x_{2})| \leq \varepsilon.$$


Code & Models

Repositories

https://gitlab.com/mattia.bergomi/geneos


Full text

Towards a topological-geometrical theory of group equivariant non-expansive operators for data analysis and machine learning

Mattia G. Bergomi

Champalimaud Research, Champalimaud Center for the Unknown - Lisbon, Portugal

Patrizio Frosini

Department of Mathematics, University of Bologna

Advanced Research Center on Electronic System “Ercole De Castro”, University of Bologna

Daniela Giorgi

Italian National Research Council, Institute of Information Science and Technologies “Alessandro Faedo”

Nicola Quercioli

Abstract

The aim of this paper is to provide a general mathematical framework for group equivariance in the machine learning context. The framework builds on a synergy between persistent homology and the theory of group actions. We define group equivariant non-expansive operators (GENEOs), which are maps between function spaces associated with groups of transformations. We study the topological and metric properties of the space of GENEOs to evaluate their approximating power and set the basis for general strategies to initialise and compose operators. We begin by defining suitable pseudo-metrics for the function spaces, the equivariance groups, and the set of non-expansive operators. Based on these pseudo-metrics, we prove that the space of GENEOs is compact and convex, under the assumption that the function spaces are compact and convex. These results provide fundamental guarantees from a machine learning perspective. We show examples on the MNIST and Fashion-MNIST datasets. By considering isometry equivariant non-expansive operators, we describe a simple strategy to select and sample operators, and show how the selected and sampled operators can be used to perform both classical metric learning and an effective initialisation of the kernels of a convolutional neural network.

keywords:

Group equivariant non-expansive operator, invariance group, group action, initial topology, persistent homology, persistence diagram, bottleneck distance, natural pseudo-distance, agent, perception pair, slice category, topological data analysis

MSC:

Primary 55N35 Secondary 47H09, 54H15, 57S10, 68U05, 65D18

1 Introduction

Deep learning-based algorithms have reached human or superhuman performance in many real-world tasks. Beyond the sheer effectiveness of deep learning, one of the main reasons for its success is that raw data are sufficient (if not even more suitable than hand-crafted features) for these algorithms to learn a specific task. However, only a few attempts have been made to create formal theories that provide a controllable and interpretable framework in which deep neural networks can be formally defined and studied. Furthermore, while learning directly from raw data allows one to outclass human feature engineering, the architectures of deep networks are growing more and more complex, and are often as task-specific as hand-crafted features used to be.

We aim to provide a general mathematical framework in which any agent capable of acting on a certain dataset (e.g., a deep neural network) can be formally described as a collection of operators acting on the data. To motivate our model, we assume that data cannot be studied directly, but only through the action of agents that measure and transform them. Consequently, our model stems from a functional viewpoint. By interpreting data as points of a function space, it is possible to learn and optimise operators defined on the data. In other words, we are interested in the space of transformations of the data, rather than in the data themselves.

Albeit unformalised, this idea is not new in deep learning. For instance, one of the main features of convolutional neural networks [1] is the choice of convolution as the operator acting on the data. The convolutional kernels learned by optimising a loss function are operators that map an image to a new one that is, for instance, more easily classifiable. Moreover, convolutions are operators equivariant with respect to translations (at least in the ideal continuous case). We believe that the restriction to a specific family of operators and the equivariance with respect to interpretable transformations are key aspects of the success of this architecture. In our theory, operators are thought of as instruments allowing an agent to provide a measure of the world, just as the kernels learned by a convolutional neural network allow a classifier to spot the essential features needed to recognise objects belonging to the same category. Equivariance with respect to the action of a group (or a set) of transformations corresponds to the introduction of symmetries in the function space where data are represented. This allows us both to gain control over the nature of the learned operators and to drastically reduce the dimensionality of the space of operators to be explored during learning. Such a goal is in line with the recent interest in invariant representations in machine learning (cf., e.g., [2]).

We make use of topological data analysis to describe spaces of group equivariant non-expansive operators (GENEOs). GENEOs are maps between function spaces associated with groups of transformations. We study the topological and metric properties of the space of GENEOs to evaluate their approximating power and set the basis for general strategies to initialise and compose operators, and eventually to connect them hierarchically to form operator networks. Our first contribution is to define suitable pseudo-metrics for the function spaces, the equivariance groups, and the set of non-expansive operators. Based on these pseudo-metrics, we prove that the space of GENEOs is compact and convex, under the assumption that the function spaces are compact and convex. These results provide fundamental and provable guarantees for the soundness of this operator-based approach from a machine learning perspective. Compactness, for instance, guarantees that, for any tolerance $\varepsilon>0$, finitely many operators sampled from a given space suffice to approximate every operator of that space within $\varepsilon$.

Our study of the space of GENEOs takes advantage of recent results in topological data analysis, in particular in the theory of persistent homology [3]. Our approach also generalises standard group equivariance to set equivariance, which seems much more suitable for the representation of intelligent agents.

To conclude, we validate our model with examples on the MNIST, Fashion-MNIST and CIFAR10 datasets. These applications aim to show the effectiveness, on discrete examples, of the metrics defined and the theorems proved in the continuous case. By considering isometry equivariant non-expansive operators (IENEOs), we describe two simple algorithms for selecting and sampling IENEOs based on a few labelled samples taken from the dataset. We show how the selected and sampled operators can be used to perform both classical metric learning and an effective initialisation of the kernels of a convolutional neural network.

Our main contribution is a general framework encompassing previous works on group equivariance in the deep learning context [4, 5]. We believe that the formal foundation of our model is suitable to start a new theory of deep-learning engineering, and that novel research lines will stem from the synergy of machine learning and topology. This synergy is the object of study of a growing number of researchers, who focus both on the treatment of data via TDA before applying classical machine learning [6, 7] and on the analysis of the topology of convolutional neural networks [8]. However, our approach differs from the previous ones in that it focuses on a new theoretical setting, based on the introduction of new topologies and metrics.

The paper is structured as follows. In Section 2 the epistemological foundations of our model are discussed. The mathematical background in topological persistence is provided in Section 3. Section 4 details the mathematical model for data, transformations, and GENEOs. Section 5 proves the compactness and convexity of the space of GENEOs, under suitable hypotheses. New results in persistent homology that yield computable metrics in the space of GENEOs and in the space of data are presented in Section 6, along with the extension of the theory from group to set equivariance. Finally, in Section 7, we describe two algorithms to select and sample operators in the discrete case, and show examples on the MNIST and Fashion-MNIST datasets. A Python package that reproduces the computational experiments described in Section 7 is available at gitlab.com/mattia.bergomi/geneos.

2 Epistemological setting

Our mathematical model is justified by an epistemological background which revolves around the following assumptions:

1. Data are represented as functions defined on topological spaces, since only data that are stable with respect to a certain criterion (e.g., with respect to some kind of measurement) can be considered for applications, and stability requires a topological structure.

2. Data cannot be studied in a direct and absolute way. They are only knowable through acts of transformation made by an agent. From the point of view of data analysis, only the pair (data, agent) matters. In general terms, agents are not endowed with purposes or goals: they are just ways and methods to transform data. Acts of measurement are a particular class of acts of transformation.

3. Agents are described by the way they transform data while respecting some kind of invariance. In other words, any agent can be seen as a group equivariant operator acting on a function space.

4. Data similarity depends on the output of the considered agent.

In other words, in our framework we assume that the analysis of data is replaced by the analysis of the pair (data, agent). Since an agent can be seen as a group equivariant operator, from the mathematical viewpoint our purpose is to present a good topological theory of suitable operators of this kind, representing agents. For more details, we refer the interested reader to [9].

3 Mathematical background

Our mathematical model builds on functional analysis and Topological Data Analysis (TDA). TDA is an emerging field of research which studies topological approaches to explore and make sense of complex, high-dimensional data, such as artificial and biological networks [10, 11]. The basic idea is that topology can help to recognize patterns within data, and therefore to turn data into useful knowledge. One of the main concepts in TDA is Persistent Homology (PH), a mathematical tool that captures topological information at multiple scales. Our mathematical model proposes an integration between the theory of group actions and persistent homology.

In summary, persistent homology allows one to represent the topological and geometrical features of a topological space $X$ (e.g., an image or a $3$-dimensional mesh) as seen by a continuous, real-valued function $\varphi$ defined on the space. The homology functor (see for instance [12]) is used to encode the information of the pair $(X,\varphi)$ in the form of persistence diagrams. In other words, we can associate each continuous function $\varphi:X\to\mathbb{R}$ with a persistence diagram $D_{\varphi}$, that is, a discrete collection of points in the real plane. Beyond the technicalities needed to define the concept of persistence diagram, two important points should be stressed. First, persistence diagrams can be computed quickly. Second, an easily computable distance $\delta_{\mathrm{match}}$ between persistence diagrams is available and gives a lower bound for the max-norm distance between functions: $\delta_{\mathrm{match}}(D_{\varphi_{1}},D_{\varphi_{2}})\leq\|\varphi_{1}-\varphi_{2}\|_{\infty}$. It follows that the bottleneck distance $\delta_{\mathrm{match}}$ between persistence diagrams can be used as an efficient proxy for the max-norm distance between real-valued functions. Since our approach is deeply rooted in the comparison of real-valued functions, persistence diagrams are a key tool in our model. The definitions of persistence diagram and of the bottleneck distance $\delta_{\mathrm{match}}$ are intuitively depicted in fig. 1 and briefly formalised in what follows. We refer the reader to [13, 14, 15] for further details.

3.1 Persistent Homology

In PH, data are modelled as objects in a metric space. The first step is to filter the data so as to obtain a family of nested topological spaces that captures the topological information at multiple scales. A common way to obtain a filtration is by sublevel sets of a continuous function, hence the name sublevel set persistence. Let $\varphi$ be a real-valued continuous function on a topological space $X$. Persistent homology represents the changes in the homology groups of the sublevel set $X_{t}=\varphi^{-1}((-\infty,t])$ as $t$ varies in $\mathbb{R}$. We can see the parameter $t$ as an increasing time, whose changes produce the birth and the death of $k$-dimensional holes in the sublevel set $X_{t}$. We observe that the number of independent 0-dimensional holes of $X_{t}$ equals the number of connected components of $X_{t}$ minus one, while 1-dimensional holes refer to tunnels and 2-dimensional holes to voids.

Definition 3.1.

If $u,v\in\mathbb{R}$ and $u<v$, we can consider the inclusion $i$ of $X_{u}$ into $X_{v}$. If $\check{H}$ denotes the Čech homology functor, such an inclusion induces a homomorphism $i_{k}:\check{H}_{k}(X_{u})\rightarrow\check{H}_{k}(X_{v})$ between the homology groups of $X_{u}$ and $X_{v}$ in degree $k$. The group $PH_{k}^{\varphi}(u,v):=i_{k}(\check{H}_{k}(X_{u}))$ is called the $k$th persistent homology group with respect to the function $\varphi:X\rightarrow\mathbb{R}$, computed at the point $(u,v)$. The rank $r_{k}(\varphi)(u,v)$ of $PH_{k}^{\varphi}(u,v)$ is called the $k$th persistent Betti numbers function (PBN) with respect to the function $\varphi:X\rightarrow\mathbb{R}$, computed at the point $(u,v)$.
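As a minimal sketch, consider the simplest discrete setting: a signal $\varphi$ sampled at the vertices of a path graph, where simplicial homology on the graph stands in for Čech homology. This discretisation is our assumption and is not part of the paper. Sublevel sets are then unions of runs of consecutive vertices, so $r_0(\varphi)(u,v)$ counts the connected components of $X_v$ that contain at least one vertex of $X_u$. The function name `pbn0` is hypothetical.

```python
def pbn0(phi, u, v):
    """r_0(phi)(u, v) for a signal sampled on the path graph 0 - 1 - ... - (n-1).

    Sublevel sets X_t = {i : phi[i] <= t} are unions of runs of consecutive
    vertices, so the rank of H_0(X_u) -> H_0(X_v) is the number of runs of
    X_v that contain at least one vertex of X_u.
    """
    if not u < v:
        raise ValueError("persistent Betti numbers are defined for u < v")
    count = 0
    in_run = False   # are we scanning a run of vertices of X_v?
    meets_u = False  # does the current run contain a vertex of X_u?
    for value in phi:
        if value <= v:
            if not in_run:
                in_run, meets_u = True, False
            meets_u = meets_u or value <= u
        else:
            if in_run and meets_u:
                count += 1
            in_run = False
    if in_run and meets_u:
        count += 1
    return count


signal = [0, 3, 1, 4, 2]
print(pbn0(signal, 1.5, 3.5), pbn0(signal, 2.5, 3.5))  # prints: 1 2
```

In the example, the component $\{4\}$ of $X_{3.5}$ (value 2) is counted only once $u\geq 2$, i.e. once it was already born at time $u$.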

Persistent Betti numbers functions can be completely described by multisets called persistence diagrams. The $k$th persistence diagram is the multiset of all the pairs $p_{j}=(b_{j},d_{j})$, where $b_{j}$ and $d_{j}$ are the times of birth and death of the $j$th $k$-dimensional hole, respectively. When a hole never dies, we set its time of death equal to $\infty$. The multiplicity $m(p_{j})$ says how many holes share both the time of birth $b_{j}$ and the time of death $d_{j}$. For technical reasons, the points $(t,t)$ on the diagonal are added to each persistence diagram, each one with infinite multiplicity.
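Under the same simplifying assumption (a signal sampled on a path graph), the 0th persistence diagram can be sketched with a union-find pass over the vertices in increasing order of $\varphi$. This is the standard "elder rule": when two components merge, the younger one dies. Pairs with zero persistence, which lie on the diagonal, are omitted, and the oldest component gets death time $\infty$. The helper `sublevel_diagram0` is hypothetical and expects a non-empty signal.

```python
import math


def sublevel_diagram0(phi):
    """0th sublevel-set persistence diagram of a signal on a path graph.

    Vertices are added in increasing order of phi; when two components
    merge, the younger one (larger birth value) dies ("elder rule").
    Pairs with zero persistence (points on the diagonal) are omitted.
    """
    n = len(phi)
    parent = list(range(n))
    birth = list(phi)  # birth[root] = birth value of the component
    added = [False] * n
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda k: phi[k]):
        added[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < phi[i]:
                    pairs.append((birth[young], phi[i]))
                parent[young] = old
    # The oldest component never dies: its death time is set to infinity.
    pairs.append((min(phi), math.inf))
    return pairs


print(sorted(sublevel_diagram0([0, 3, 1, 4, 2])))  # prints: [(0, inf), (1, 3), (2, 4)]
```

This is consistent with the way persistence diagrams describe PBNs. For instance, exactly the pairs $(2,4)$ and $(0,\infty)$ satisfy $b\leq 2.5$ and $d>3.5$, and indeed $r_0(\varphi)(2.5,3.5)=2$ for this signal.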

Each persistence diagram $D$ can contain an infinite number of points. For every $q\in\Delta^{*}:=\{(x,y)\in\mathbb{R}^{2}\ :\ x<y\}\cup\{(x,\infty):x\in\mathbb{R}\}$, the equality $m(q)=0$ means that $q$ does not belong to the persistence diagram $D$. We define on $\bar{\Delta}^{*}:=\{(x,y)\in\mathbb{R}^{2}\ :\ x\leq y\}\cup\{(x,\infty):x\in\mathbb{R}\}$ a pseudo-metric as follows:

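$$d^{*}((x,y),(x^{\prime},y^{\prime})) := \min\left\{\max\{|x-x^{\prime}|,|y-y^{\prime}|\},\ \max\left\{\frac{y-x}{2},\frac{y^{\prime}-x^{\prime}}{2}\right\}\right\},$$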

by agreeing that $\infty-y=\infty,\ y-\infty=-\infty$ for $y\neq\infty,\ \infty-\infty=0,\ \frac{\infty}{2}=\infty,\ |\pm\infty|=\infty,\ \min\{\infty,c\}=c,\ \max\{\infty,c\}=\infty$.

The pseudo-metric $d^{*}$ between two points $p$ and $p^{\prime}$ takes the smaller value between the cost of moving $p$ to $p^{\prime}$ and the cost of moving both $p$ and $p^{\prime}$ onto $\Delta:=\{(x,y)\in\mathbb{R}^{2}\ :\ x=y\}$. Obviously, $d^{*}(p,p^{\prime})=0$ for every $p,p^{\prime}\in\Delta$. If $p\in\Delta^{+}:=\{(x,y)\in\mathbb{R}^{2}\ :\ x<y\}$ and $p^{\prime}\in\Delta$, then $d^{*}(p,p^{\prime})$ equals the distance, induced by the max-norm, between $p$ and $\Delta$. Points at infinity have a finite distance only to the other points at infinity, and their distance equals the Euclidean distance between abscissas.

We can compare persistence diagrams by means of the bottleneck distance (also called matching distance) $\delta_{\mathrm{match}}$.

Definition 3.2.

Let $D,\ D^{\prime}$ be two persistence diagrams. We define the bottleneck distance $\delta_{\mathrm{match}}$ between $D$ and $D^{\prime}$ by setting

$$\delta_{\mathrm{match}}(D,D^{\prime}) := \inf_{\sigma}\sup_{p\in D} d^{*}(p,\sigma(p)),$$

where $\sigma$ varies in the set of all bijections from the multiset $D$ to the multiset $D^{\prime}$.

For further information about persistence diagrams and the bottleneck distance, we refer the reader to [15, 16]. Each persistent Betti numbers function is associated with exactly one persistence diagram, and (if we use Čech homology) every persistence diagram is associated with exactly one persistent Betti numbers function. Then the metric $\delta_{\mathrm{match}}$ induces a pseudo-metric $d_{\mathrm{match}}$ on the set of persistent Betti numbers functions [17].
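For small diagrams, $d^{*}$ and the bottleneck distance can be evaluated by brute force. The sketch below makes two assumptions. First, a diagram is a finite list of off-diagonal points $(b,d)$, where $d$ may be `math.inf`. Second, the diagonal, with its infinite multiplicity, is modelled by padding each list with "diagonal slots", so a point can be matched either to a point of the other diagram or to $\Delta$. The conventions for $\infty$ follow the text. The helper names are hypothetical, and the factorial cost restricts the sketch to toy inputs.

```python
import math
from itertools import permutations

INF = math.inf


def _sub(a, b):
    # Conventions from the text: inf - y = inf, y - inf = -inf, inf - inf = 0.
    return 0.0 if a == INF and b == INF else a - b


def d_star(p, q):
    """Pseudo-metric d* on the closed half-plane, including points at infinity."""
    (x, y), (x2, y2) = p, q
    move = max(abs(_sub(x, x2)), abs(_sub(y, y2)))   # move p onto q
    to_diag = max(_sub(y, x) / 2, _sub(y2, x2) / 2)  # send both onto the diagonal
    return min(move, to_diag)


def bottleneck(D1, D2):
    """Brute-force delta_match between two finite lists of off-diagonal points.

    Each diagram is padded with diagonal slots (None), so that every point can
    be matched either to a point of the other diagram or to the diagonal.
    """
    A = list(D1) + [None] * len(D2)
    B = list(D2) + [None] * len(D1)

    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None:
            return _sub(q[1], q[0]) / 2
        if q is None:
            return _sub(p[1], p[0]) / 2
        return d_star(p, q)

    best = INF
    for perm in permutations(range(len(B))):
        worst = max((cost(A[i], B[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best


print(bottleneck([(0, 4), (0, 1)], [(0, 4)]))  # prints: 0.5
```

For diagrams of realistic size, an optimised implementation such as `gudhi.bottleneck_distance` from the GUDHI library is preferable to this exhaustive search.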

4 Mathematical model

In our mathematical model, data are represented as function spaces, that is, as sets of real-valued functions on some topological space (Subsection 4.1). Function spaces come with invariance groups representing the transformations on data which are admissible for some agent (Subsection 4.2). The groups of transformations are specific to different agents, and can be either learned or part of prior knowledge. The operators on data are then defined as group equivariant non-expansive operators (GENEOs) (Subsection 4.3).

4.1 Data representation

Let us consider a set $X\neq\emptyset$ and a topological subspace $\varPhi$ of the set of all bounded functions $\varphi$ from $X$ to $\mathbb{R}$, denoted by $\mathbb{R}^{X}_{b}$ and endowed with the topology induced by the distance

$$D_{\varPhi}(\varphi_{1},\varphi_{2}) := \|\varphi_{1}-\varphi_{2}\|_{\infty}. \tag{3}$$

If $\varPhi$ is compact, then it is also bounded, i.e., there exists a non-negative real value $L$ such that $\|\varphi\|_{\infty}\leq L$ for every $\varphi\in\varPhi$. We can think of $X$ as the space where one makes measurements, and of $\varPhi$ as the set of admissible measurements, called the set of admissible functions. In other words, $\varPhi$ is the set of functions from $X$ to $\mathbb{R}$ that can be produced by measuring instruments. For example, an image can be represented as a function $\varphi$ from the real plane $X$ to the real numbers.

To quantify the distance between two points $x_{1},x_{2}\in X$, we compare the values taken at $x_{1}$ and $x_{2}$ by the admissible functions in $\varPhi$. Therefore, we endow $X$ with the extended pseudo-metric $D_{X}$ defined by setting

$$D_{X}(x_{1},x_{2}) = \sup_{\varphi\in\varPhi}|\varphi(x_{1})-\varphi(x_{2})|$$

for every $x_{1},x_{2}\in X$ (see Appendix A). We recall that a pseudo-metric is just a distance $d$ without the property that $d(a,b)=0$ implies $a=b$. An extended pseudo-metric is a pseudo-metric that may take the value $\infty$. If $\varPhi$ is bounded, then $D_{X}$ is a pseudo-metric.

The assumption behind the definition of $D_{X}$ is that two points can be distinguished only if they assume different values for some admissible function. As an example, if $\varPhi$ contains only constant functions, no discrimination can be made between points in $X$, and hence $D_{X}(x_{1},x_{2})$ vanishes for every $x_{1},x_{2}\in X$.
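For a finite $X$ and a finite $\varPhi$, $D_X$ reduces to a maximum over the rows of a matrix of sampled values. This finiteness is an assumption made only for illustration; in the paper both sets may be infinite, and the supremum need not be attained. The sketch below uses the hypothetical helper `D_X`. It reproduces the constant-function example and exhibits two distinct points at distance zero, which is why $D_X$ is only a pseudo-metric.

```python
import numpy as np


def D_X(Phi):
    """Pseudo-metric D_X on a finite set X = {x_0, ..., x_{m-1}}.

    Phi is a finite set of admissible functions, stored as an array of shape
    (num_functions, m) with Phi[k, i] = phi_k(x_i). The result is the m x m
    matrix D[i, j] = max_k |phi_k(x_i) - phi_k(x_j)|.
    """
    Phi = np.asarray(Phi, dtype=float)
    return np.max(np.abs(Phi[:, :, None] - Phi[:, None, :]), axis=0)


constants = [[1, 1, 1, 1], [5, 5, 5, 5]]
print(D_X(constants).max())  # prints: 0.0 (no two points can be told apart)

Phi = [[0, 1, 1, 2], [0, 2, 2, 0]]
print(D_X(Phi)[1, 2])  # prints: 0.0 (x_1 != x_2, yet D_X vanishes)
```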

The pseudo-metric space $(X,D_{X})$ can be considered as a topological space by choosing as a base $\mathcal{B}_{D_{X}}$ the collection of all the sets

$$B_{X}(x,\varepsilon) = \{x^{\prime}\in X : D_{X}(x,x^{\prime})<\varepsilon\},$$

where $\varepsilon>0$ and $x\in X$ (see [18]).

The reason to endow the measurement space $X$ with a topology, rather than considering just a set, is the need to formalise the assumption that data are stable. To formalise stability, we have to use a topology (or a pseudo-metric inducing a topology).

It is interesting to stress the link between the topology $\tau_{D_{X}}$ associated with $D_{X}$ and the initial topology $\tau_{\mathrm{in}}$ on $X$ with respect to $\varPhi$, when we take the Euclidean topology $\mathcal{T}_{E}$ on $\mathbb{R}$. We recall that $\tau_{\mathrm{in}}$ is the coarsest topology on $X$ such that each function $\varphi\in\varPhi$ is continuous. Explicitly, the open sets in $\tau_{\mathrm{in}}$ are the sets that can be obtained as unions of finite intersections of sets $\varphi^{-1}(U)$, where $\varphi\in\varPhi$ and $U\in\mathcal{T}_{E}$. In other words, a base $\mathcal{B}_{\mathrm{in}}$ of $\tau_{\mathrm{in}}$ is given by the collection of all sets that can be represented as $\bigcap_{i\in I}\varphi^{-1}_{i}(U_{i})$, where $I$ is a finite set of indexes and $\varphi_{i}\in\varPhi$, $U_{i}\in\mathcal{T}_{E}$ for every $i\in I$ [18].

Theorem 4.1.

The topology $\tau_{D_{X}}$ on $X$ induced by the pseudo-metric $D_{X}$ is finer than the initial topology $\tau_{\mathrm{in}}$ on $X$ with respect to $\varPhi$. If $\varPhi$ is totally bounded, then the topology $\tau_{D_{X}}$ coincides with $\tau_{\mathrm{in}}$.

(The proof is in Appendix C.)

Since $\tau_{\mathrm{in}}$ is the coarsest topology on $X$ such that every $\varphi\in\varPhi$ is continuous, and $\tau_{D_{X}}$ is finer than $\tau_{\mathrm{in}}$, Theorem 4.1 implies that every admissible function is continuous with respect to $\tau_{D_{X}}$. Hence, assuming that the functions are continuous is not restrictive in practice, for example when dealing with images, which often contain discontinuities: our functions are not required to be continuous with respect to other topologies (e.g., the Euclidean topology $\tau_{E}$ on $X=\mathbb{R}^{2}$).

In general, $X$ is not compact with respect to the topology $\tau_{D_{X}}$, even if $\varPhi$ is compact. For example, if $X$ is the open interval $]0,1[$ and $\varPhi$ contains only the identity from $]0,1[$ to $]0,1[$, the topology induced by $D_{X}$ is simply the Euclidean topology, and hence $X$ is not compact. However, the following result holds.

Theorem 4.2.

If $\varPhi$ is compact and $X$ is complete, then $X$ is also compact.

(The proof is in Appendix C.)

4.1.1 A remark on the use of pseudo-metrics

The reader might think it better to turn the pseudo-metric $D_{X}$ into a metric $D^{\prime}$ by quotienting $X$ by the equivalence relation $x_{1}Rx_{2}\iff D_{X}(x_{1},x_{2})=0$ and defining $D^{\prime}([x_{1}],[x_{2}])=D_{X}(x_{1},x_{2})$ for any $[x_{1}],[x_{2}]\in X/R$. We do not do this because several different sets of admissible measurements can be considered on the same set $X$. For two different sets $\varPhi_{1}$, $\varPhi_{2}$ of admissible functions, we obtain two different quotient spaces $X/R_{1}$, $X/R_{2}$. If we forget about the original space $X$, we lose the possibility of linking the equivalence classes in $X/R_{1}$ with the ones in $X/R_{2}$. Instead, we prefer to preserve the identity of points in $X$, studying how they link to each other when we change the set $\varPhi$. This observation leads us to work with pseudo-metrics instead of metrics.

Before proceeding, we observe that the map $\pi$ taking each point $x\in X$ to the equivalence class $[x]\in X/R$ is continuous with respect to $D_{X}$ and $D^{\prime}$, and surjective. Moreover, $\pi$ takes each ball with respect to $D_{X}$ to a ball with respect to $D^{\prime}$, while the inverse image under $\pi$ of each ball with respect to $D^{\prime}$ is a ball with respect to $D_{X}$. It follows that if a subset $S\subseteq X$ is compact (sequentially compact) for $D_{X}$, then $\pi(S)$ is compact (sequentially compact) for $D^{\prime}$, and that if a subset $\mathcal{S}\subseteq X/R$ is compact (sequentially compact) for $D^{\prime}$, then $\pi^{-1}(\mathcal{S})$ is compact (sequentially compact) for $D_{X}$. Finally, given a sequence $(x_{i})$ in $X$, we observe that $(x_{i})$ converges to $\bar{x}$ in $X$ if and only if the sequence $([x_{i}])$ converges to $[\bar{x}]$ in $X/R$. These facts imply that developing our theory in terms of pseudo-metrics is not far from an analysis in terms of metrics.
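The argument against working with quotients can be made concrete on a toy example. We again assume finite $X$ and $\varPhi$, and the helper `quotient_classes` is hypothetical. The two sets of admissible functions below induce different partitions of the same four points. The classes $\{0,1\}\in X/R_1$ and $\{0,3\}\in X/R_2$ can be related only through the point $0$ of $X$, a link that would be lost if we kept only the quotients.

```python
def quotient_classes(Phi):
    """Classes of X/R, where x_i R x_j iff D_X(x_i, x_j) = 0, for finite X and Phi.

    D_X(x_i, x_j) = 0 exactly when every admissible function takes the same
    value at x_i and x_j, so points are grouped by their vector of values.
    """
    classes = {}
    for i in range(len(Phi[0])):
        signature = tuple(phi[i] for phi in Phi)
        classes.setdefault(signature, []).append(i)
    return list(classes.values())


# Same X = {0, 1, 2, 3}, two different sets of admissible functions.
Phi_1 = [[0, 0, 1, 1]]
Phi_2 = [[0, 1, 1, 0]]
print(quotient_classes(Phi_1))  # prints: [[0, 1], [2, 3]]
print(quotient_classes(Phi_2))  # prints: [[0, 3], [1, 2]]
```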

4.2 Transformations on data

In our model, we assume that data are transformed through maps from $X$ to $X$ which are $\varPhi$-preserving homeomorphisms with respect to the pseudo-metric $D_{X}$. Let $\mathrm{Homeo}(X)$ denote the set of homeomorphisms from $X$ to $X$ with respect to $D_{X}$, and $\mathrm{Homeo}_{\varPhi}(X)$ denote the set of $\varPhi$-preserving homeomorphisms, namely the homeomorphisms $g\in\mathrm{Homeo}(X)$ such that $\varphi\circ g\in\varPhi$ and $\varphi\circ g^{-1}\in\varPhi$ for every $\varphi\in\varPhi$.

The following Proposition 4.3 implies that $\mathrm{Homeo}_{\varPhi}(X)$ is exactly the set of all bijections $g:X\to X$ such that $\varphi\circ g\in\varPhi$ and $\varphi\circ g^{-1}\in\varPhi$ for every $\varphi\in\varPhi$.

Proposition 4.3.

If $g$ is a bijection from $X$ to $X$ such that $\varphi\circ g\in\varPhi$ and $\varphi\circ g^{-1}\in\varPhi$ for every $\varphi\in\varPhi$, then $g$ is an isometry (and hence a homeomorphism) with respect to $D_{X}$.

Isometries between pseudo-metric spaces are defined as in the metric case. Let $(X_{1},d_{1})$ and $(X_{2},d_{2})$ be two pseudo-metric spaces. It is easy to check that if $f:X_{1}\longrightarrow X_{2}$ is a function verifying the equality $d_{1}(x,y)=d_{2}(f(x),f(y))$ for every $x,y\in X_{1}$, then $f$ is continuous with respect to the topologies induced by $d_{1}$ and $d_{2}$. If $f$ verifies the previous equality and is bijective, we say that it is an isometry between the considered pseudo-metric spaces. If $f$ is an isometry, we can trivially observe that $f^{-1}$ is also an isometry, and that $f$ is a homeomorphism.

(The proof is in Appendix C.)

Remark 4.4.

In general, $\mathrm{Homeo}(X)\neq\mathrm{Homeo}_{\varPhi}(X)$. As an example, take $X=[0,1]$ and $\varPhi=\{\mathrm{Id}\}$. In this case $D_{X}(x_{1},x_{2})=|x_{1}-x_{2}|$ and $\mathrm{Homeo}_{\varPhi}(X)=\{\mathrm{Id}\}$, while $\mathrm{Homeo}(X)$ is the set of all homeomorphisms from the interval $[0,1]$ to itself with respect to the Euclidean distance.

Remark 4.5.

For each $g\in\mathrm{Homeo}_{\varPhi}(X)$, we consider the bijective map $R_{g}:\varPhi\longrightarrow\varPhi$ defined by setting $R_{g}(\varphi)=\varphi\circ g$ for every $\varphi\in\varPhi$. We claim that $R_{g}$ preserves the pseudo-distance $D_{\varPhi}$ defined by Equality (3). Indeed, if $\varphi,\varphi^{\prime}\in\varPhi$ and $g\in\mathrm{Homeo}_{\varPhi}(X)$, then

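$$D_{\varPhi}\left(R_{g}(\varphi),R_{g}(\varphi^{\prime})\right)=\sup_{x\in X}\left|\varphi(g(x))-\varphi^{\prime}(g(x))\right|=\sup_{y\in X}\left|\varphi(y)-\varphi^{\prime}(y)\right|=D_{\varPhi}(\varphi,\varphi^{\prime})$$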

because $g$ is a bijection. Since $R_{g}$ is a bijection preserving $D_{\varPhi}$, it is an isometry with respect to $D_{\varPhi}$.

In the rest of this paper we will assume that $\varPhi$ is compact with respect to the topology induced by $D_{\varPhi}$ , and that $X$ is complete (and hence compact) with respect to the topology induced by $D_{X}$ .

Let us now consider a subgroup $G$ of the group $\mathrm{Homeo}_{\varPhi}(X)$ . $G$ represents the set of transformations on data for which we require equivariance to be respected.

We can define the pseudo-distance $D_{G}$ on $G$ :

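$$D_{G}(g_{1},g_{2}):=\sup_{\varphi\in\varPhi}D_{\varPhi}\left(\varphi\circ g_{1},\varphi\circ g_{2}\right)$$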

from $G\times G$ to $\mathbb{R}$ (see Appendix A).

The rationale behind the definition of $D_{G}$ is that in our model every comparison must be based on the max-norm distance between admissible acts of measurement. As a consequence, we define the distance between two homeomorphisms through the difference of their actions on the set $\varPhi$ of possible measurements.

Remark 4.6.

$D_{G}$ can be expressed as:

$$D_{G}(g_{1},g_{2})=\sup_{x\in X}D_{X}\left(g_{1}(x),g_{2}(x)\right)$$
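On a finite toy model, the two expressions of $D_{G}$ (a supremum over $\varPhi$ and a supremum over $X$) can be compared numerically. The following Python sketch assumes a finite $X=\{0,\dots,n-1\}$, a finite family $\varPhi$ stored as the rows of an array, the pseudo-metric $D_{X}(x,y)=\max_{\varphi\in\varPhi}|\varphi(x)-\varphi(y)|$, and transformations encoded as index arrays; $\varPhi$ is chosen closed under cyclic shifts so that the shift is $\varPhi$-preserving. All function names are illustrative.

```python
import numpy as np

def d_x_matrix(phis):
    """D_X(x, y) = max over phi in Phi of |phi(x) - phi(y)|, for all pairs (x, y)."""
    return np.max(np.abs(phis[:, :, None] - phis[:, None, :]), axis=0)

def d_g_via_phi(phis, g1, g2):
    """D_G(g1, g2) = max over phi of max over x of |phi(g1(x)) - phi(g2(x))|."""
    return float(np.max(np.abs(phis[:, g1] - phis[:, g2])))

def d_g_via_x(phis, g1, g2):
    """Equivalent expression: max over x of D_X(g1(x), g2(x))."""
    return float(np.max(d_x_matrix(phis)[g1, g2]))

n = 6
rng = np.random.default_rng(0)
base = rng.random(n)
phis = np.stack([np.roll(base, s) for s in range(n)])  # Phi is closed under cyclic shifts
ident = np.arange(n)
shift = np.roll(ident, -1)                             # g(x) = x + 1 mod n
print(d_g_via_phi(phis, ident, shift), d_g_via_x(phis, ident, shift))  # the two values agree
```

Swapping the order of the two maxima is exactly the step that turns one expression into the other.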

We can now state the following theorems:

Theorem 4.7.

$G$ is a topological group with respect to the pseudo-metric topology, and the action of $G$ on $\varPhi$ through right composition is continuous.

(The proof is in Appendix C.)

Theorem 4.8.

If $G$ is complete then it is also compact with respect to $D_{G}$ .

(The proof is in Appendix C.)

From now on we will suppose that $G$ is complete (and hence compact) with respect to the topology induced by $D_{G}$ .

4.2.1 The natural pseudo-distance $d_{G}$

We define the natural pseudo-distance $d_{G}$ on the space $\varPhi$ [3]. The natural pseudo-distance $d_{G}$ represents the ground truth in our model. It is based on comparing functions, and vanishes for pairs of functions that are equivalent with respect to the action of our group of homeomorphisms $G$ , which expresses the equivalences between data.

Definition 4.9.

The pseudo-distance $d_{G}:\varPhi\times\varPhi\rightarrow\mathbb{R}$ is defined by setting

$$d_{G}(\varphi_{1},\varphi_{2}):=\inf_{g\in G}D_{\varPhi}\left(\varphi_{1},\varphi_{2}\circ g\right)$$

It is called the natural pseudo-distance associated with the group $G$ acting on $\varPhi$ .

If $G=\{\mathrm{Id}:x\mapsto x\}$ , then $d_{G}$ equals the sup-norm distance $D_{\varPhi}$ on $\varPhi$ . If $G_{1}$ and $G_{2}$ are subgroups of $\mathrm{Homeo}_{\varPhi}(X)$ and $G_{1}\subseteq G_{2}$ , then the definition of $d_{G}$ implies that

$$d_{G_{2}}(\varphi_{1},\varphi_{2})\leq d_{G_{1}}(\varphi_{1},\varphi_{2})$$

for every $\varphi_{1},\ \varphi_{2}\in\varPhi$ .
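For a finite group the infimum in the definition of $d_{G}$ becomes a minimum, so the monotonicity above can be checked directly. The sketch below assumes functions sampled at $n$ equispaced points of a circle and groups of cyclic shifts $g(x)=x+s$; the subgroups $G_{1}\subseteq G_{2}$ of $\mathbb{Z}_{8}$ and all names are illustrative.

```python
import numpy as np

def sup_dist(f1, f2):
    """D_Phi: sup-norm distance between two sampled functions."""
    return float(np.max(np.abs(f1 - f2)))

def natural_pseudo_distance(phi1, phi2, shifts):
    """d_G(phi1, phi2) = min over g in G of ||phi1 - phi2 o g||_inf, where G is a finite group
    of cyclic shifts g(x) = x + s of the n sample points (phi2 o g is np.roll(phi2, -s))."""
    return min(sup_dist(phi1, np.roll(phi2, -s)) for s in shifts)

n = 8
G1 = [0, 4]            # subgroup {0, 4} of Z_8
G2 = [0, 2, 4, 6]      # a larger subgroup containing G1
G_all = range(n)       # the whole group Z_8
rng = np.random.default_rng(1)
phi1 = rng.random(n)
phi2 = np.roll(phi1, 3)   # phi2 = phi1 o g^{-1} with g(x) = x + 3; the shift 3 is not in G2

print(natural_pseudo_distance(phi1, phi2, G_all))   # 0.0: equivalent under Z_8
print(natural_pseudo_distance(phi1, phi2, G2), "<=", natural_pseudo_distance(phi1, phi2, G1))
```

Enlarging the group can only lower $d_{G}$, and the trivial group recovers $D_{\varPhi}$.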

Though $d_{G}$ represents the ground truth for data similarity in our model, it is unfortunately difficult to compute. This is also a consequence of the fact that we can easily find subgroups $G$ of $\mathrm{Homeo}(X)$ that cannot be approximated with arbitrary precision by finite subgroups of $G$ (e.g., when $G$ is the group of rigid motions of $X=\mathbb{R}^{3}$).

In the following sections, we show how $d_{G}$ can be approximated with arbitrary precision by means of a dual approach based on group equivariant non-expansive operators (GENEOs) and persistent homology.

4.2.2 A remark on the use of homeomorphisms

The reader could object to grounding our approach on the concept of homeomorphism. After all, most of the objects that are considered for purposes of shape comparison “are not homeomorphic”. Therefore, the definition of natural pseudo-distance could seem insufficiently flexible, since it does not allow one to compare non-homeomorphic objects. However, it is important to note that the space $X$ in our model does not represent the objects, but the space where one takes measurements of the objects. As such, $X$ is unique. For example, two images are considered as functions from the real plane $X$ to the real numbers, independently of the topological properties of the 3D objects represented in the images. If we make two CAT scans, the topological space $X$ is always given by a helix turning many times around a body, and no requirement is made about the topology of such a body. In other words, the topological space $X$ is determined only by the measuring instrument and not by the individual objects.

4.3 Group Equivariant Non-Expansive Operators

Under the assumptions made in the previous sections, the pair $(\varPhi,G)$ is called a perception pair.

Let us now assume that two perception pairs $(\varPhi,G)$ , $(\varPsi,H)$ are given together with a fixed homomorphism $T:G\to H$ . Each function $F:\varPhi\to\varPsi$ such that $F(\varphi\circ g)=F(\varphi)\circ T(g)$ for every $\varphi\in\varPhi,g\in G$ is said to be a perception map from $(\varPhi,G)$ to $(\varPsi,H)$ associated with the homomorphism $T$ . More briefly, we will also say that $F$ is a group equivariant operator. If $T$ is equal to the identity homomorphism $I:G\longrightarrow G$ , we can say that $F$ is a $G$ -map. We observe that the functions in $\varPhi$ and the functions in $\varPsi$ are defined on spaces that are generally different from each other.

Remark 4.10.

Each perception pair $(\varPhi,G)$ can be seen as a category, whose objects are the functions in $\varPhi$ and whose morphisms between two functions $\varphi_{1},\varphi_{2}\in\varPhi$ are the elements $g\in G$ such that $\varphi_{2}=\varphi_{1}\circ g$. As usual, if $\varphi_{2}=\varphi_{1}\circ g$ and $\varphi^{\prime}_{2}=\varphi^{\prime}_{1}\circ g$ we wish to distinguish $g$ as a morphism between $\varphi_{1}$ and $\varphi_{2}$ from $g$ as a morphism between $\varphi^{\prime}_{1}$ and $\varphi^{\prime}_{2}$, so we make different copies $g_{(\varphi_{1},\varphi_{2})}$, $g_{(\varphi^{\prime}_{1},\varphi^{\prime}_{2})}$ of the homeomorphism $g$ by labelling it. Naturally, $g^{\prime}_{(\varphi_{2},\varphi_{3})}\circ g_{(\varphi_{1},\varphi_{2})}=(g\circ g^{\prime})_{(\varphi_{1},\varphi_{3})}$. A precise formalization of this procedure can be given in terms of slice categories. For more details we refer the reader to Appendix B.

When two perception pairs $(\varPhi,G)$ , $(\varPsi,H)$ are considered as categories and a homomorphism $T:G\to H$ is fixed, each perception map $F$ from $(\varPhi,G)$ to $(\varPsi,H)$ is naturally associated with a functor between the two categories, taking each function $\varphi\in\varPhi$ to $F(\varphi)\in\varPsi$ and each morphism $g_{(\varphi_{1},\varphi_{2})}\in G$ to the morphism $T(g)_{\left(F(\varphi_{1}),F(\varphi_{2})\right)}\in H$ .

Definition 4.11.

Assume that $(\varPhi,G)$ , $(\varPsi,H)$ are two perception pairs and that a homomorphism $T:G\to H$ has been fixed. If $F$ is a perception map from $(\varPhi,G)$ to $(\varPsi,H)$ with respect to $T$ and $F$ is non-expansive (i.e., $D_{\varPsi}\left(F(\varphi_{1}),F(\varphi_{2})\right)\leq D_{\varPhi}\left(\varphi_{1},\varphi_{2}\right)$ for every $\varphi_{1},\varphi_{2}\in\varPhi$ ), then $F$ is called a Group Equivariant Non-Expansive Operator (GENEO) associated with $T:G\to H$ .

Example 4.12.

As a reference for the reader, we give the following basic example of GENEO. Let $\varPhi$ be the set containing all $1$-Lipschitz functions from $X=S^{2}=\{(x,y,z)\in\mathbb{R}^{3}:x^{2}+y^{2}+z^{2}=1\}$ to $[0,1]$, and $G$ be the group of all rotations of $S^{2}$ around the $z$-axis. Let $\varPsi$ be the set containing all $1$-Lipschitz functions from $Y=S^{1}=\{(x,y)\in\mathbb{R}^{2}:x^{2}+y^{2}=1\}$ to $[0,1]$, and $H$ be the group of all rotations of $S^{1}$. We observe that $(\varPhi,G)$ and $(\varPsi,H)$ are two perception pairs. Now, let us consider the map $F:\varPhi\to\varPsi$ taking each function $\varphi\in\varPhi$ to the function $\psi\in\varPsi$ defined by setting $\psi(\theta):=\frac{1}{\pi}\int_{-\pi/2}^{\pi/2}\varphi(\theta,\alpha)\ d\alpha$ (with $\theta$ and $\alpha$ the longitude and the latitude in spherical coordinates), and the homomorphism $T$ taking the rotation of $S^{2}$ by $\beta$ radians around the positively oriented $z$-axis to the counterclockwise rotation of $S^{1}$ by $\beta$ radians. It is easy to check that $F$ is a perception map and a GENEO from $(\varPhi,G)$ to $(\varPsi,H)$, associated with the homomorphism $T$. In this example $F$ and $T$ are surjective, but an example where $F$ and $T$ are not surjective can easily be found, e.g., by restricting $\varPhi$ to the singleton $\bar{\varPhi}$ containing only the null function and $G$ to the trivial group $\bar{G}$ containing only the identity homeomorphism.
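A discretised version of Example 4.12 can be checked numerically. The sketch below assumes a regular longitude-latitude grid, approximates $\frac{1}{\pi}\int_{-\pi/2}^{\pi/2}\varphi(\theta,\alpha)\,d\alpha$ by the midpoint rule (which reduces to the mean over the latitude samples), and uses rotations by multiples of $2\pi/m$ so that they act as index shifts. The random test functions are not checked to be $1$-Lipschitz, since neither equivariance nor non-expansiveness depends on that; grid sizes and names are illustrative.

```python
import numpy as np

M, m = 64, 90   # number of latitude and longitude samples (illustrative values)

def F(phi):
    """psi(theta) = (1/pi) * integral over alpha in (-pi/2, pi/2) of phi(theta, alpha).
    With M midpoint samples of width pi/M this is the mean over the latitude axis.
    phi has shape (M, m): rows are latitudes, columns are longitudes."""
    return phi.mean(axis=0)

def rotate_sphere(phi, s):
    """phi o g, where g rotates S^2 about the z-axis by 2*pi*s/m (a shift of longitudes)."""
    return np.roll(phi, -s, axis=1)

def rotate_circle(psi, s):
    """psi o T(g): the rotation of S^1 by the same angle 2*pi*s/m."""
    return np.roll(psi, -s)

rng = np.random.default_rng(2)
phi1, phi2 = rng.random((M, m)), rng.random((M, m))   # sampled functions with values in [0, 1]
print(np.allclose(F(rotate_sphere(phi1, 7)), rotate_circle(F(phi1), 7)))   # equivariance
print(np.max(np.abs(F(phi1) - F(phi2))) <= np.max(np.abs(phi1 - phi2)))   # non-expansiveness
```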

We can study how GENEOs act on the natural pseudo-distances:

Proposition 4.13.

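If $F$ is a GENEO from $(\varPhi,G)$ to $(\varPsi,H)$ associated with $T:G\to H$, then it is a contraction with respect to the natural pseudo-distances $d_{G}$, $d_{H}$, i.e., $d_{H}\left(F(\varphi_{1}),F(\varphi_{2})\right)\leq d_{G}(\varphi_{1},\varphi_{2})$ for every $\varphi_{1},\varphi_{2}\in\varPhi$.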

(The proof is in Appendix C.)
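Proposition 4.13 can be illustrated with a simple GENEO on a discretised circle. In this sketch (an illustrative choice, not an operator taken from the paper) $\varPhi=\varPsi$ consists of functions sampled at $n$ equispaced points, $G=H$ is the group of cyclic shifts, $T$ is the identity, and $F$ is a circular moving average with non-negative weights summing to $1$, which is shift-equivariant and non-expansive.

```python
import numpy as np

n = 12
weights = [0.25, 0.5, 0.25]   # non-negative weights summing to 1 (illustrative)

def F(phi):
    """Circular moving average: equivariant under cyclic shifts and non-expansive."""
    return sum(w * np.roll(phi, k - 1) for k, w in enumerate(weights))

def d_nat(f1, f2):
    """Natural pseudo-distance for the group of all cyclic shifts of the sample points."""
    return min(float(np.max(np.abs(f1 - np.roll(f2, -s)))) for s in range(len(f1)))

rng = np.random.default_rng(3)
phi1, phi2 = rng.random(n), rng.random(n)
print(d_nat(F(phi1), F(phi2)), "<=", d_nat(phi1, phi2))
```

The inequality follows because every shift applied to $\varphi_{2}$ before $F$ can be matched by the same shift applied after $F$, and $F$ does not increase sup-norm distances.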

4.3.1 Pseudo-metrics on $\mathrm{GENEO}\left((\varPhi,G),(\varPsi,H)\right)$

Let us denote by $\mathrm{GENEO}\left((\varPhi,G),(\varPsi,H)\right)$ the set of all GENEOs between two perception pairs $(\varPhi,G)$ , $(\varPsi,H)$ associated with $T:G\to H$ . We can endow this set with the following pseudo-distances $D_{\mathrm{GENEO}}$ , $D_{\mathrm{GENEO,H}}$ .

Definition 4.14.

If $F_{1},F_{2}\in\mathrm{GENEO}\left((\varPhi,G),(\varPsi,H)\right)$ , we set

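$$D_{\mathrm{GENEO}}(F_{1},F_{2}):=\sup_{\varphi\in\varPhi}D_{\varPsi}\left(F_{1}(\varphi),F_{2}(\varphi)\right),\qquad D_{\mathrm{GENEO,H}}(F_{1},F_{2}):=\sup_{\varphi\in\varPhi}d_{H}\left(F_{1}(\varphi),F_{2}(\varphi)\right).$$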

The next result can be easily proved by applying the inequality $d_{H}\leq D_{\Psi}$ (see Theorem 6.1) and recalling that the supremum of a family of bounded pseudo-metrics is still a pseudo-metric.

Proposition 4.15.

$D_{\mathrm{GENEO}}$ and $D_{\mathrm{GENEO,H}}$ are pseudo-metrics on $\mathrm{GENEO}\left((\varPhi,G),(\varPsi,H)\right)$. Moreover, $D_{\mathrm{GENEO,H}}\leq D_{\mathrm{GENEO}}$.

It is easy to check that $D_{\mathrm{GENEO}}$ is in fact a metric.
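In practice the suprema over $\varPhi$ can only be estimated on a finite sample, which yields lower estimates of $D_{\mathrm{GENEO}}$ and $D_{\mathrm{GENEO,H}}$. The sketch below assumes circular smoothing operators on sampled functions and $H$ equal to the group of all cyclic shifts; the sample size, weights and names are illustrative.

```python
import numpy as np

def smoothing(weights):
    """Circular smoothing with the given non-negative weights (centred window)."""
    half = len(weights) // 2
    return lambda phi: sum(w * np.roll(phi, k - half) for k, w in enumerate(weights))

def d_nat(f1, f2):
    """Natural pseudo-distance d_H for the group H of all cyclic shifts."""
    return min(float(np.max(np.abs(f1 - np.roll(f2, -s)))) for s in range(len(f1)))

def d_geneo(F1, F2, sample):
    """Empirical D_GENEO: the sup over Phi is replaced by a max over a finite sample."""
    return max(float(np.max(np.abs(F1(p) - F2(p)))) for p in sample)

def d_geneo_h(F1, F2, sample):
    """Empirical D_GENEO,H: the sup-norm D_Psi is replaced by d_H."""
    return max(d_nat(F1(p), F2(p)) for p in sample)

rng = np.random.default_rng(4)
sample = [rng.random(16) for _ in range(20)]
F1 = smoothing([0.25, 0.5, 0.25])
F2 = smoothing([1.0])   # the identity operator
print(d_geneo_h(F1, F2, sample), "<=", d_geneo(F1, F2, sample))
```

The inequality holds sample by sample because the zero shift is one of the candidates in $d_{H}$.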

This simple statement holds:

Proposition 4.16.

For every $F\in\mathrm{GENEO}\left((\varPhi,G),(\varPsi,H)\right)$ and every $\varphi\in\varPhi$: $\|F(\varphi)\|_{\infty}\leq\|\varphi\|_{\infty}+\|F(\mathbf{0})\|_{\infty}$, where $\mathbf{0}$ denotes the function taking the value $0$ everywhere.

(The proof is in Appendix C.)
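Assuming, as the statement implicitly does, that the null function $\mathbf{0}$ belongs to $\varPhi$, the bound of Proposition 4.16 can be obtained from the triangle inequality and the non-expansiveness of $F$; a one-line derivation reads:

```latex
\|F(\varphi)\|_{\infty}
  \le \|F(\varphi)-F(\mathbf{0})\|_{\infty}+\|F(\mathbf{0})\|_{\infty}
  \le \|\varphi-\mathbf{0}\|_{\infty}+\|F(\mathbf{0})\|_{\infty}
  = \|\varphi\|_{\infty}+\|F(\mathbf{0})\|_{\infty}.
```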

5 On the compactness and convexity of the space of GENEOs

In this section we show that, if the function spaces are compact and convex, then the space of GENEOs is compact and convex too. This property has important consequences from the computational point of view, since it guarantees that the space of GENEOs can be approximated by a finite set and that new GENEOs can be obtained by convex combination of preexisting GENEOs.

Several results in this section and in Section 6 mimic the corresponding results in [3], where the particular case $(\varPhi,G)=(\varPsi,H)$ , $T=\mathrm{Id}:G\to H$ is considered. Note that considering different function spaces and different groups of equivariance is fundamental, as it allows one to compose operators hierarchically, in the same fashion as computational units are linked in an artificial neural network.

For the sake of conciseness, in the following we will set $\mathcal{F}^{\mathrm{all}}:=\mathrm{GENEO}\left((\varPhi,G),(\varPsi,H)\right)$ . We recall that we are assuming $\varPhi$ and $\Psi$ compact with respect to $D_{\varPhi}$ and $D_{\Psi}$ , respectively.

5.1 The space of GENEOs is compact with respect to $D_{\mathrm{GENEO}}$

Theorem 5.1.

$\mathcal{F}^{\mathrm{all}}$ is compact with respect to $D_{\mathrm{GENEO}}$.

(The proof is in Appendix C.)

5.2 The set of GENEOs is convex

Let $F_{1},F_{2},\dots,F_{n}$ be GENEOs from $(\varPhi,G)$ to $(\varPsi,H)$ associated with the homomorphism $T$ . Let $(a_{1},a_{2},\dots,a_{n})\in\mathbb{R}^{n}$ with $\sum_{i=1}^{n}|a_{i}|\leq 1$ . Consider the function

$$F_{\Sigma}(\varphi):=\sum_{i=1}^{n}a_{i}F_{i}(\varphi)$$

from $\varPhi$ to the set $C^{0}(Y,\mathbb{R})$ of the continuous functions from $Y$ to $\mathbb{R}$ , where $Y$ is the domain of the functions in $\Psi$ .

Proposition 5.2.

If $F_{\Sigma}(\varPhi)\subseteq\varPsi$ , then $F_{\Sigma}$ is a GENEO from $(\varPhi,G)$ to $(\varPsi,H)$ with respect to $T$ .

(The proof is in Appendix C.)
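The combination $F_{\Sigma}$ can be tried out on circular smoothing operators. This sketch assumes sampled functions on a circle, $G=H$ the cyclic shifts and $T$ the identity; the coefficients (one of them negative, with $\sum|a_{i}|=1$) and names are illustrative. It checks equivariance and non-expansiveness only; the hypothesis $F_{\Sigma}(\varPhi)\subseteq\varPsi$ of Proposition 5.2 is not checked, and a negative coefficient can in general push values outside a prescribed range.

```python
import numpy as np

def smoothing(weights):
    """Circular smoothing with non-negative weights summing to 1 (a shift-equivariant GENEO)."""
    half = len(weights) // 2
    return lambda phi: sum(w * np.roll(phi, k - half) for k, w in enumerate(weights))

geneos = [smoothing([1.0]), smoothing([0.25, 0.5, 0.25]), smoothing([0.5, 0.0, 0.5])]
a = np.array([0.5, -0.25, 0.25])   # sum of |a_i| equals 1

def F_sigma(phi):
    """F_Sigma(phi) = sum_i a_i F_i(phi)."""
    return sum(ai * Fi(phi) for ai, Fi in zip(a, geneos))

rng = np.random.default_rng(9)
phi1, phi2 = rng.random(12), rng.random(12)
print(np.allclose(F_sigma(np.roll(phi1, 5)), np.roll(F_sigma(phi1), 5)))      # equivariance
print(np.max(np.abs(F_sigma(phi1) - F_sigma(phi2))) <= np.max(np.abs(phi1 - phi2)))
```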

Theorem 5.3.

If $\Psi$ is convex, then the set of GENEOs from $(\varPhi,G)$ to $(\varPsi,H)$ with respect to $T$ is convex.

(The proof is in Appendix C.)

5.3 GENEOs as agents in our model

In our model the agents are represented by GENEOs. Indeed, each agent can be seen as a black box that receives and transforms data. If a nonempty subset $\mathcal{F}$ of $\mathrm{GENEO}\left((\varPhi,G),(\varPsi,H)\right)$ is fixed, a simple pseudo-distance $D_{\mathcal{F},\varPhi}(\varphi_{1},\varphi_{2})$ to compare two admissible functions $\varphi_{1},\varphi_{2}\in\varPhi$ can be defined by setting $D_{\mathcal{F},\varPhi}(\varphi_{1},\varphi_{2}):=\sup_{F\in\mathcal{F}}\|F(\varphi_{1})-F(\varphi_{2})\|_{\infty}$ . This definition expresses our assumption that the comparison of data strongly depends on the choice of the agents. However, we note that the computation of $D_{\mathcal{F},\varPhi}(\varphi_{1},\varphi_{2})$ for every pair $(\varphi_{1},\varphi_{2})$ of admissible functions is computationally expensive. In the next section, we will see how persistent homology allows us to replace $D_{\mathcal{F},\varPhi}$ with a pseudo-metric $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ that is quicker to compute, while still being stable and strongly invariant.
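For a finite family $\mathcal{F}$, the supremum in $D_{\mathcal{F},\varPhi}$ is a maximum. The sketch below computes it for circular smoothing operators (an illustrative choice) and checks its invariance under the simultaneous action of a shift on both arguments; for comparison, it also evaluates a function against a shifted copy of itself, where the sup-norm comparison generally gives a positive value.

```python
import numpy as np

def smoothing(weights):
    """Circular smoothing with non-negative weights summing to 1."""
    half = len(weights) // 2
    return lambda phi: sum(w * np.roll(phi, k - half) for k, w in enumerate(weights))

def d_F_phi(operators, phi1, phi2):
    """D_{F,Phi}(phi1, phi2) = max over a finite family F of ||F(phi1) - F(phi2)||_inf."""
    return max(float(np.max(np.abs(F(phi1) - F(phi2)))) for F in operators)

ops = [smoothing([1.0]), smoothing([0.25, 0.5, 0.25])]
rng = np.random.default_rng(10)
phi1, phi2 = rng.random(12), rng.random(12)

# Same value when one shift acts on both arguments at once.
print(d_F_phi(ops, np.roll(phi1, 2), np.roll(phi2, 2)) == d_F_phi(ops, phi1, phi2))
# A function and its shifted copy are generally at positive distance.
print(d_F_phi(ops, phi1, np.roll(phi1, 2)) > 0)
```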

6 A strongly group-invariant pseudo-metric induced by Persistent Homology

In this section, we show how Persistent Homology supports the definition of a strongly group invariant pseudo-metric on $\varPhi$ , for which we prove some theoretical results.

We begin by recalling the stability of the classical pseudo-distance $d_{\mathrm{match}}$ between persistent Betti numbers functions (PBNs) (cf. Definition 3.2) with respect to the pseudo-metrics $D_{\varPhi}$ and $d_{\mathrm{Homeo}(X)}$. We assume that PBNs are finite.$^{5}$ Then, the stability of $d_{\mathrm{match}}$ with respect to $D_{\varPhi}$ easily follows from the stability theorem of the interleaving distance and the isometry theorem (cf. [19]).

$^{5}$ Although in our setting the space $X$ is assumed to be compact, PBNs are not necessarily finite. For example, consider the set $X=\{0\}\cup\{\frac{1}{n}:n\in\mathbb{N}\}$ and $\varPhi=\{\mathrm{Id}:X\longrightarrow X\}$. Even though $X$ is compact, every sublevel set $X_{u}=\{x\in X:x\leq u\}$ with $u>0$ has infinitely many connected components, and hence the $0$th persistent Betti numbers function takes an infinite value at every point $(u,v)$ with $0<u<v$. We add the assumption that PBNs are finite (i.e., that the persistent Betti numbers function of every $\varphi\in\varPhi$ takes a finite value at each point $(u,v)\in\Delta^{+}$) to get stability and discard pathological cases (for example, the case where the set $\varPhi$ of admissible functions is the set of all maps from $X$ to $\mathbb{R}$). Since the PBNs of the pseudo-metric space $(X,D_{X})$ coincide with the persistent Betti numbers functions of its Kolmogorov quotient $\bar{X}$, their finiteness holds when $\bar{X}$ is finitely triangulable (cf. [17]).

Theorem 6.1.

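If $k$ is a natural number, $G_{1}\subseteq G_{2}\subseteq\mathrm{Homeo}_{\varPhi}(X)$ and $\varphi_{1},\varphi_{2}\in\varPhi$, then

$$d_{\mathrm{match}}\left(r_{k}(\varphi_{1}),r_{k}(\varphi_{2})\right)\leq d_{\mathrm{Homeo}(X)}(\varphi_{1},\varphi_{2})\leq d_{G_{2}}(\varphi_{1},\varphi_{2})\leq d_{G_{1}}(\varphi_{1},\varphi_{2})\leq D_{\varPhi}(\varphi_{1},\varphi_{2}).$$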

The proof of the first inequality $d_{\mathrm{match}}(r_{k}(\varphi_{1}),r_{k}(\varphi_{2}))\leq d_{\mathrm{Homeo}(X)}(\varphi_{1},\varphi_{2})$ in Theorem 6.1 is based on the stability of $d_{\mathrm{match}}$ with respect to $D_{\varPhi}$ and can be found in [17]. The other inequalities follow from the definition of the natural pseudo-distance.

6.1 Strongly group invariant comparison of filtering functions via persistent homology

Let us consider a subset $\mathcal{F}\neq\emptyset$ of $\mathcal{F}^{\mathrm{all}}$ . For every fixed $k$ , we can consider the following pseudo-metric $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ on $\varPhi$ :

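$$\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2}):=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})),r_{k}(F(\varphi_{2}))\right)$$

for every $\varphi_{1},\varphi_{2}\in\varPhi$, where $r_{k}(\varphi)$ denotes the $k$th persistent Betti numbers function with respect to the function $\varphi:X\rightarrow\mathbb{R}$.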

In this work, we will say that a pseudo-metric $\hat{d}$ on $\varPhi$ is strongly G-invariant if it is invariant under the action of $G$ with respect to each variable, that is, if $\hat{d}(\varphi_{1},\varphi_{2})=\hat{d}(\varphi_{1}\circ g,\varphi_{2})=\hat{d}(\varphi_{1},\varphi_{2}\circ g)=\hat{d}(\varphi_{1}\circ g,\varphi_{2}\circ g)$ for every $\varphi_{1},\varphi_{2}\in\varPhi$ and every $g\in G$ .

Remark 6.2.

It is easily seen that the natural pseudo-distance $d_{G}$ is strongly $G$ -invariant.

Proposition 6.3.

$\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ is a strongly $G$-invariant pseudo-metric on $\varPhi$.

(The proof is in Appendix C.)
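To make $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ concrete, the sketch below works in a deliberately simple setting: $k=0$, $X$ discretised as a cycle graph with $n$ vertices, $G$ the cyclic shifts, and a finite family $\mathcal{F}$ of circular smoothing operators. It uses the fact that $d_{\mathrm{match}}$ between persistent Betti numbers functions equals the bottleneck distance between the corresponding persistence diagrams; diagrams are computed with the standard union-find (elder rule) algorithm, and the bottleneck distance by testing candidate thresholds with bipartite matching. Since the cycle is connected, each diagram has one essential class, which can only be matched to the other essential class. All names are hypothetical, and a dedicated library would be preferable in practice.

```python
import numpy as np

def smoothing(weights):
    """Circular smoothing with non-negative weights summing to 1: a shift-equivariant GENEO."""
    half = len(weights) // 2
    return lambda phi: sum(w * np.roll(phi, k - half) for k, w in enumerate(weights))

def h0_diagram_cycle(values):
    """0-dimensional sublevel-set persistence of a function on the vertices of a cycle graph.
    Returns the finite (birth, death) pairs and the birth of the essential class."""
    n = len(values)
    parent, birth, active, pairs = list(range(n)), list(values), [False] * n, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda v: values[v]):
        active[i] = True
        for j in ((i - 1) % n, (i + 1) % n):
            if not active[j]:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)  # elder rule
            if birth[young] < values[i]:  # drop zero-persistence pairs
                pairs.append((birth[young], values[i]))
            parent[young] = old
    return pairs, min(values)

def _linf(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def _half(p):
    return (p[1] - p[0]) / 2  # cost of matching p to the diagonal

def _perfect_matching_exists(A, B, delta):
    """Kuhn's algorithm on the usual augmented bipartite graph (points plus diagonal copies)."""
    m, n = len(A), len(B)

    def neighbours(u):
        if u < m:  # a point of A: a close point of B, or its own diagonal copy
            yield from (j for j, b in enumerate(B) if _linf(A[u], b) <= delta)
            if _half(A[u]) <= delta:
                yield n + u
        else:      # the diagonal copy of B[u - m]
            if _half(B[u - m]) <= delta:
                yield u - m
            yield from range(n, n + m)  # diagonal-to-diagonal matches are free

    match = [-1] * (m + n)

    def augment(u, seen):
        for v in neighbours(u):
            if v not in seen:
                seen.add(v)
                if match[v] == -1 or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    return all(augment(u, set()) for u in range(m + n))

def bottleneck(A, B):
    candidates = {0.0} | {_linf(a, b) for a in A for b in B} | {_half(p) for p in A + B}
    return next(d for d in sorted(candidates) if _perfect_matching_exists(A, B, d))

def d_match_h0(f1, f2):
    (p1, e1), (p2, e2) = h0_diagram_cycle(f1), h0_diagram_cycle(f2)
    return max(abs(e1 - e2), bottleneck(p1, p2))  # essential classes are matched together

def D_match(operators, phi1, phi2):
    """D^{F,0}_match for a finite family F: max over F of d_match(r_0(F(phi1)), r_0(F(phi2)))."""
    return max(d_match_h0(F(phi1), F(phi2)) for F in operators)

rng = np.random.default_rng(5)
phi1, phi2 = rng.random(10), rng.random(10)
ops = [smoothing([1.0]), smoothing([0.25, 0.5, 0.25])]
print(D_match(ops, phi1, phi2), D_match(ops, np.roll(phi1, 3), phi2))  # equal values
```

Shifting only one argument leaves the value unchanged (strong invariance, Proposition 6.3), and the value never exceeds $D_{\varPhi}$, in line with the stability discussed next.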

6.2 Some theoretical results on the pseudo-metric $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$

First, we show that the pseudo-metric $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ is stable with respect to both the natural pseudo-distance $d_{G}$ associated with the group $G$ and the distance $D_{\varPhi}$.

Remark 6.4.

Let $X$ and $Y$ be two homeomorphic spaces and let $h:Y\rightarrow X$ be a homeomorphism. Then the persistent homology group with respect to the function $\varphi:X\rightarrow\mathbb{R}$ and the persistent homology group with respect to the function $\varphi\circ h:Y\rightarrow\mathbb{R}$ are isomorphic at each point $(u,v)$ in the domain. Therefore we can say that the persistent homology groups and the persistent Betti numbers functions are invariant under the action of $\mathrm{Homeo}(X)$ .

Theorem 6.5.

If $\mathcal{F}$ is a non-empty subset of $\mathcal{F}^{\mathrm{all}}$ , then

$$\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}\leq d_{G}\leq D_{\varPhi}.$$

(The proof is in Appendix C.)

The definitions of the natural pseudo-distance $d_{G}$ and of the pseudo-distance $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ come from different theoretical concepts. The former is based on a variational approach involving the set of all homeomorphisms in $G$, while the latter only compares persistent homologies computed after applying a family of group equivariant non-expansive operators. In view of this, the next result may appear unexpected.

Theorem 6.6.

Let us assume that $\varPhi=\Psi$ , every function in $\varPhi$ is non-negative, the $k$ -th Betti number of $X$ does not vanish, and $\varPhi$ contains each constant function $c$ for which a function $\varphi\in\varPhi$ exists such that $0\leq c\leq\|\varphi\|_{\infty}$ . Then $\mathcal{D}^{\mathcal{F}^{\mathrm{all}},k}_{\mathrm{match}}=d_{G}$ .

(The proof is in Appendix C.)

We observe that if $\varPhi$ is bounded, the assumption that every function in $\varPhi$ is non-negative is not very restrictive: it can be obtained by adding a suitable constant value to every admissible function.

6.3 Pseudo-metrics induced by persistent homology

Persistent homology can be seen as a topological method to build new and easily computable pseudo-metrics for the sets $\varPhi$ , $G$ and $\mathcal{F}^{\mathrm{all}}$ . These new pseudo-metrics $\Delta_{\varPhi}$ , $\Delta_{G}$ , $\Delta_{\mathrm{GENEO}}$ can be used as proxies for $d_{G}$ (and hence $D_{\varPhi}$ ), $D_{G}$ , $D_{\mathrm{GENEO}}$ , respectively:

1. If $\varphi_{1},\varphi_{2}\in\varPhi$, we can set $\Delta_{\varPhi}(\varphi_{1},\varphi_{2}):=d_{\mathrm{match}}(r_{k}(\varphi_{1}),r_{k}(\varphi_{2}))$. The stability theorem for persistence diagrams (Theorem 6.1) can be reformulated as the inequalities $\Delta_{\varPhi}\leq d_{G}\leq D_{\varPhi}$.

2. If $g_{1},g_{2}\in G$, we can set $\Delta_{G}(g_{1},g_{2}):=\sup_{\varphi\in\varPhi}d_{\mathrm{match}}(r_{k}(\varphi\circ g_{1}),r_{k}(\varphi\circ g_{2}))$. The inequality $\Delta_{G}\leq D_{G}$ follows from Theorem 6.1.

3. If $F_{1},F_{2}\in\mathcal{F}^{\mathrm{all}}$, we can set $\Delta_{\mathrm{GENEO}}\left(F_{1},F_{2}\right):=\sup_{\varphi\in\varPhi}d_{\mathrm{match}}(r_{k}(F_{1}(\varphi)),r_{k}(F_{2}(\varphi)))$. The inequalities $\Delta_{\mathrm{GENEO}}\leq D_{\mathrm{GENEO,H}}\leq D_{\mathrm{GENEO}}$ follow from Theorem 6.1.

In particular, $\Delta_{\varPhi}$ and a discretized version of the pseudo-metric $\Delta_{\mathrm{GENEO}}$ will be used in the experiments described in Section 7. We underline that the use of persistent homology is a key tool in our approach: it allows for a fast comparison between functions and between GENEOs. Without persistent homology, this comparison would be much more computationally expensive.

6.4 Approximating ${\mathcal{D}}^{\mathcal{F},k}_{\mathrm{match}}$

The next result will be of use for the approximation of $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ .

Proposition 6.7.

Let $\mathcal{F},\mathcal{F}^{\prime}\subseteq\mathcal{F}^{\mathrm{all}}$ . If the Hausdorff distance

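$$HD(\mathcal{F},\mathcal{F}^{\prime}):=\max\left\{\sup_{F\in\mathcal{F}}\inf_{F^{\prime}\in\mathcal{F}^{\prime}}D_{\mathrm{GENEO}}(F,F^{\prime}),\ \sup_{F^{\prime}\in\mathcal{F}^{\prime}}\inf_{F\in\mathcal{F}}D_{\mathrm{GENEO}}(F,F^{\prime})\right\}$$

is not larger than $\varepsilon$, then

$$\left|\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})-\mathcal{D}^{\mathcal{F}^{\prime},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})\right|\leq 2\varepsilon$$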

for every $\varphi_{1},\varphi_{2}\in\varPhi$ .

(The proof is in Appendix C.)

Since $\mathcal{F}^{\mathrm{all}}$ is compact, $\mathcal{F}$ can be covered by a finite set of balls in $\mathcal{F}^{\mathrm{all}}$ of radius $\varepsilon$, centered at the points of a finite set $\mathcal{F}^{\prime}\subseteq\mathcal{F}$. Combined with Proposition 6.7, this leads to the following proposition: the approximation of $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})$ can be reduced to the computation of $\mathcal{D}^{\mathcal{F}^{\prime},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})$, i.e., the maximum of a finite set of bottleneck distances between persistence diagrams, which are well known to be computable by efficient algorithms.

Proposition 6.8.

Let $\mathcal{F}$ be a non-empty subset of $\mathcal{F}^{\mathrm{all}}$ . For every $\varepsilon>0$ , a finite subset $\mathcal{F}^{*}$ of $\mathcal{F}$ exists, such that

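$$\left|\mathcal{D}^{\mathcal{F}^{*},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})-\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})\right|\leq\varepsilon$$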

for every $\varphi_{1},\varphi_{2}\in\varPhi$ .

(The proof is in Appendix C.)
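Proposition 6.8 rests on covering $\mathcal{F}$ by finitely many small balls. The sketch below builds such a cover greedily (an $\varepsilon$-net) for a one-parameter family of circular smoothing operators, with $D_{\mathrm{GENEO}}$ estimated on a finite sample of $\varPhi$. To keep the block short, the final check uses the sup-norm pseudo-distance $D_{\mathcal{F},\varPhi}$ of Subsection 5.3 as a stand-in for $d_{\mathrm{match}}$; a triangle-inequality argument bounds the gap by $2\varepsilon$ in both cases, so an $(\varepsilon/2)$-net would give the tolerance $\varepsilon$. The family, sample and names are illustrative.

```python
import numpy as np

def smoothing(weights):
    """Circular smoothing with non-negative weights summing to 1."""
    half = len(weights) // 2
    return lambda phi: sum(w * np.roll(phi, k - half) for k, w in enumerate(weights))

def d_geneo(F1, F2, sample):
    """Empirical D_GENEO over a finite sample of Phi."""
    return max(float(np.max(np.abs(F1(p) - F2(p)))) for p in sample)

def greedy_net(operators, sample, eps):
    """Indices of a subfamily such that every operator is within eps of some member."""
    net = []
    for i, F in enumerate(operators):
        if all(d_geneo(F, operators[j], sample) > eps for j in net):
            net.append(i)
    return net

def d_F_phi(operators, phi1, phi2):
    """Sup-norm stand-in for D^{F,k}_match: max over F of ||F(phi1) - F(phi2)||_inf."""
    return max(float(np.max(np.abs(F(phi1) - F(phi2)))) for F in operators)

rng = np.random.default_rng(6)
sample = [rng.random(16) for _ in range(10)]
family = [smoothing([t / 2, 1 - t, t / 2]) for t in np.linspace(0, 1, 41)]
eps = 0.1
net = greedy_net(family, sample, eps)
F_star = [family[i] for i in net]
full = d_F_phi(family, sample[0], sample[1])
approx = d_F_phi(F_star, sample[0], sample[1])
print(len(net), full - approx)   # the gap is at most 2 * eps
```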

Remark 6.9.

Theorem 5.1 and the inequalities $\Delta_{\mathrm{GENEO}}\leq D_{\mathrm{GENEO,H}}\leq D_{\mathrm{GENEO}}$ stated in Subsection 6.3 immediately imply that $\mathcal{F}^{\mathrm{all}}$ is compact also with respect to the topologies induced by $\Delta_{\mathrm{GENEO}}$ and $D_{\mathrm{GENEO,H}}$.

6.5 Beyond group equivariance

We observe that while the definition of the natural pseudo-distance $d_{G}$ requires that $G$ has the structure of a group, the definition of $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ does not need this assumption. In other words, our approach based on GENEOs can be used also when we wish to have equivariance with respect to a set instead of a group of homeomorphisms. This property is promising for extending the application of our theory to the cases in which the agent is equivariant with respect to each element of a finite set of homeomorphisms that is not closed with respect to composition and computation of the inverse.

7 Validation on discrete function spaces

In summary, we have introduced a theoretical framework that describes an agent acting on data as a collection of suitable operators. We do so by representing data as points of a space of continuous functions with compact support. The size of such a space makes the search for suitable operators approximating a given agent computationally complicated. For this reason, we chose to consider GENEOs: enforcing equivariance with respect to the action of a group drastically reduces the dimensionality of the search space. Furthermore, in Sections 4 and 5, we showed how GENEO spaces can be equipped with suitable metrics and satisfy properties that are essential in a machine learning context. The results concerning compactness and convexity make it possible to safely explore the space of GENEOs when operating on a labelled dataset. One of the main issues to be addressed in the proposed setting is the computability of metrics between operators. In Section 6 we showed how metrics between GENEOs can be bounded from below via persistent homology. These results should be enough to guarantee the approximability, efficacy and computability of GENEOs when they are used to solve supervised tasks.

Our mathematical model and theorems are based on the assumption that data can be treated as points in a space of continuous functions. In this section, we test the validity of these results on the classification of real-world datasets, proceeding as follows. First, we describe an algorithm that selects and samples GENEOs in order to learn the metric induced on a dataset by a labelling function. After that, we define the class of GENEOs we will use to study the MNIST, fashion-MNIST and CIFAR10 datasets. Selection and sampling are then used to approximate an agent able to express the underlying metric of these datasets by observing only $20$ or $40$ examples per class. Thereafter, we show that the metric learned through selection and sampling is still expressive when used to represent distances among validation samples transformed according to the equivariances of the chosen GENEOs. Finally, we use selected and sampled GENEOs to inject knowledge into an artificial neural network.

7.1 Operators selection and sampling on labelled datasets

We start from the assumption that data labelled with the same symbol share common features with respect to the agent we want to approximate. Thus, we propose an algorithm for metric learning based on the metrics introduced on the space of GENEOs in Section 6. Briefly, we start by randomly selecting a certain number of GENEOs. Afterwards, we compare them, taking advantage of the fact that their representation as persistence diagrams is invariant with respect to the action of $G$. The operators selected in this way see the features that are common among the samples associated with the same label. Finally, again exploiting the fact that the matching distance is a lower bound for the metric defined on the space of operators, we sample the operators in order to obtain a minimal set of non-redundant operators.

In symbols, let $\Phi=\left\{\varphi_{1},\dots,\varphi_{n}\right\}$ be a dataset equipped with a labelling function $l:\Phi\rightarrow I\subseteq\mathbb{N}$. We assume that the dataset can be written as the disjoint union $\Phi=\sqcup_{i\in I}\Phi_{i}$, where $\Phi_{i}$ contains the samples labelled by $i$. Let $\mathcal{F}$ be the space of operators that will act on the samples. We begin by randomly sampling $N$ candidate operators in $\mathcal{F}$, which we collect in the set $\mathcal{C}=\left\{F_{k}\right\}_{k\in\{1,\dots,N\}}$. We then select the operators that regard objects belonging to the same class as similar. Considering the samples in $\Phi_{l}$, for each candidate operator $F\in\mathcal{C}$ we define the label-dependent value

[TABLE]

A candidate operator $F$ is selected if $s_{l}\left(F\right)$ is smaller than a fixed threshold $\epsilon$ for every $l$. Let us denote by $\mathcal{S}$ the set of selected operators. In practice, we will show that a few examples per class are enough to select operators able to grasp salient topological-geometrical features from the example samples; these operators can then be used to compute reasonable distances between new validation samples.
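The selection step can be sketched as follows. Since the formula for $s_{l}(F)$ is not reproduced here, the sketch assumes (hypothetically) that $s_{l}(F)$ is the mean pairwise distance between the outputs of $F$ on the samples of class $l$. The distance is passed in as a parameter, so $d_{\mathrm{match}}$ between persistence diagrams could be plugged in; the toy usage relies on the sup norm and on scaling operators $\varphi\mapsto c\,\varphi$ with $0\le c\le 1$ only to keep the example short.

```python
import itertools
import numpy as np

def selection_score(F, class_samples, dist):
    """Hypothetical stand-in for s_l(F): mean pairwise distance of F's outputs within one class."""
    outs = [F(phi) for phi in class_samples]
    return float(np.mean([dist(a, b) for a, b in itertools.combinations(outs, 2)]))

def select_operators(candidates, samples_by_class, dist, threshold):
    """Keep the candidates whose score is below the threshold for every class l."""
    return [F for F in candidates
            if all(selection_score(F, samples, dist) < threshold
                   for samples in samples_by_class.values())]

rng = np.random.default_rng(7)
samples_by_class = {0: [rng.random(8) for _ in range(5)],
                    1: [rng.random(8) + 1 for _ in range(5)]}
sup = lambda a, b: float(np.max(np.abs(a - b)))
candidates = [lambda phi, c=c: c * phi for c in (0.0, 0.5, 1.0)]
selected = select_operators(candidates, samples_by_class, sup, threshold=0.3)
print(len(selected))
```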

The selection criterion does not guarantee that the operators are maximally diverse, when evaluated within and between classes. An important advantage of working on metric spaces is that we can now sample the elements of $\mathcal{S}$ to avoid storing operators that focus on the same or similar characteristics. To this end, given a class $l$, we define the distance between two operators $F_{p}$ and $F_{q}$ (cf. Subsection 6.3)

[TABLE]

For every label $l$, we sort the pairs $\left(F_{p},F_{q}\right)$ in ascending order of $\Delta^{l}_{\mathrm{GENEO}}$ and assign to each pair of operators its index in the sorted list of distances. We then define the interclass contrastive score of the pair $\left(F_{p},F_{q}\right)$ as the sum of its indices over all classes. Finally, we remove redundant operators from $\mathcal{S}$, i.e., we keep only one operator of each pair whose score is below a fixed threshold $t$.
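The ranking-based sampling step can be sketched as below. The input holds, for each class $l$, a matrix of distances $\Delta^{l}_{\mathrm{GENEO}}(F_{p},F_{q})$. Ranks start at $0$, and the rule deciding which operator of a redundant pair is discarded (processing pairs by increasing score and dropping the one with the larger index) is an assumption, as is the hand-picked threshold in the toy usage (the experiments below use the 75th percentile of all scores).

```python
import itertools
import numpy as np

def contrastive_scores(dist_by_class, n_ops):
    """dist_by_class[l][p, q] holds Delta^l_GENEO(F_p, F_q). For each class, pairs are ranked
    by ascending distance; the score of a pair is the sum of its ranks over all classes."""
    pairs = list(itertools.combinations(range(n_ops), 2))
    scores = dict.fromkeys(pairs, 0)
    for D in dist_by_class.values():
        for rank, pq in enumerate(sorted(pairs, key=lambda pq: D[pq])):
            scores[pq] += rank
    return scores

def remove_redundant(scores, n_ops, t):
    """Hypothetical tie-breaking rule: for every pair scoring below t whose operators are
    both still kept, drop the operator with the larger index."""
    kept = set(range(n_ops))
    for (p, q), s in sorted(scores.items(), key=lambda item: item[1]):
        if s < t and p in kept and q in kept:
            kept.discard(q)
    return sorted(kept)

D = np.array([[0.0, 0.01, 0.5, 0.6],
              [0.01, 0.0, 0.55, 0.65],
              [0.5, 0.55, 0.0, 0.4],
              [0.6, 0.65, 0.4, 0.0]])   # operators 0 and 1 are almost identical
scores = contrastive_scores({0: D, 1: 1.1 * D}, 4)
print(remove_redundant(scores, 4, t=1))   # operator 1 is discarded
```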

Finally, two objects $\varphi_{1}$ and $\varphi_{2}$ can be compared by computing the strongly $G$ -invariant pseudo-metric $\mathcal{D}^{\mathcal{S}}_{\mathrm{match}}(\varphi_{1},\varphi_{2})$ , defined in Section 6.

7.2 Isometry equivariant non-expansive operators

One of the main strengths of convolutional neural networks is the natural equivariance of the convolution operator with respect to the group of planar translations. However, when working with images or volumes, invariance with respect to other transformations, such as rotations or reflections, is often important. In what follows we define a parametric family of non-expansive operators that are equivariant with respect to Euclidean plane isometries.

Given $\sigma>0$ and $\tau\in\mathbb{R}$ , we consider the $1$ -dimensional Gaussian function with width $\sigma$ and centre $\tau$

[TABLE]

where $g_{\tau}:\mathbb{R}\to\mathbb{R}$ . For a positive integer $k$ , we take the set $S$ of the $2k$ -tuples $(a_{1},\tau_{1},\ldots,a_{k},\tau_{k})\in\mathbb{R}^{2k}$ for which $\sum_{i=1}^{k}a_{i}^{2}=\sum_{i=1}^{k}\tau_{i}^{2}=1$ . $S$ is a submanifold of $\mathbb{R}^{2k}$ .

For each $p=(a_{1},\tau_{1},\ldots,a_{k},\tau_{k})\in S$ , we then consider the function $G_{p}:\mathbb{R}^{2}\to\mathbb{R}$ defined as

[TABLE]

Let us denote by $F_{p}$ the convolutional operator mapping each continuous function with compact support $\varphi:\mathbb{R}^{2}\to\mathbb{R}$ to the continuous, compactly supported function $\psi:\mathbb{R}^{2}\to\mathbb{R}$ defined as

[TABLE]

Then the operator $F_{p}$ is a group equivariant non-expansive operator with respect to the group $I$ of Euclidean plane isometries. We call $F_{p}$ an IENEO (Isometry Equivariant Non-Expansive Operator).

The IENEO $F_{p}$ is parametric with respect to the $2k$ -tuple $p=(a_{1},\tau_{1},\ldots,a_{k},\tau_{k})\in S$ . Therefore, we define a parametric family of IENEOs $\mathcal{F}=\{F_{p}\}_{p\in S}$ .
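The following sketch builds a discrete IENEO kernel from a parameter $p\in S$. Because the displayed formulas for $g_{\tau}$, $G_{p}$ and $\psi$ are not reproduced above, it assumes $g_{\tau}(x)=e^{-(x-\tau)^{2}/(2\sigma^{2})}$, $G_{p}(x)=\sum_{i=1}^{k}a_{i}g_{\tau_{i}}(\|x\|)$ sampled on an odd-sized grid over $[-1,1]^{2}$, and a division by $\sum|G_{p}|$ that makes the discrete operator non-expansive in the sup norm; the value of $\sigma$ and all names are hypothetical. On a pixel grid only the isometries that preserve the grid act exactly, so the check uses a rotation by $90^{\circ}$.

```python
import numpy as np

def sample_parameters(k, rng):
    """Draw p = (a_1, tau_1, ..., a_k, tau_k) with sum a_i^2 = sum tau_i^2 = 1 (a point of S)."""
    a, tau = rng.normal(size=k), rng.normal(size=k)
    return a / np.linalg.norm(a), tau / np.linalg.norm(tau)

def ieneo_kernel(a, tau, size, sigma=0.2):
    """Radial kernel G_p(x) = sum_i a_i g_{tau_i}(|x|) on a size x size grid (size odd, >= 3).
    Normalised by the sum of absolute values, so the discrete operator is non-expansive."""
    h = size // 2
    t = np.arange(-h, h + 1) / h            # symmetric grid coordinates in [-1, 1]
    r = np.sqrt(t[:, None] ** 2 + t[None, :] ** 2)
    K = sum(ai * np.exp(-((r - ti) ** 2) / (2 * sigma ** 2)) for ai, ti in zip(a, tau))
    return K / np.sum(np.abs(K))

def apply_ieneo(phi, K):
    """Zero-padded 'same' filtering of the image phi with K (equal to a convolution here,
    because K is centrally symmetric)."""
    h = K.shape[0] // 2
    padded = np.pad(phi, h)
    out = np.zeros(phi.shape)
    for u in range(K.shape[0]):
        for v in range(K.shape[1]):
            out += K[u, v] * padded[u:u + phi.shape[0], v:v + phi.shape[1]]
    return out

rng = np.random.default_rng(8)
a, tau = sample_parameters(4, rng)
K = ieneo_kernel(a, tau, size=7)
phi1, phi2 = rng.random((32, 32)), rng.random((32, 32))
print(np.allclose(apply_ieneo(np.rot90(phi1), K), np.rot90(apply_ieneo(phi1, K))))
```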

7.3 Applications

We are now ready to use the selection and sampling strategy to find operators able to recognise samples belonging to the same class in a discrete dataset. We propose three different applications of our model. First, we select and sample operators on two-class subsets of the MNIST, fashion-MNIST and CIFAR10 datasets, and we evaluate the validity of the learned metric by computing pairwise distances of validation samples according to the selected and sampled operators, which we denote by $\mathcal{S}$. Second, we evaluate on the MNIST dataset the capacity of the operators in $\mathcal{S}$ to discriminate validation examples that have been transformed with random planar isometries. Finally, we use $\mathcal{S}$ to initialise the filters of a convolutional layer and a dense architecture to classify the samples belonging to the classes on which the IENEOs were selected and sampled.

7.3.1 Image preprocessing

Images are preprocessed according to the pipeline described in the first column of Figure 2. Every image $I$ is first resized to size $(128,128)$, then blurred with a $3\times 3$ Gaussian kernel and finally standardised as $I_{s}=\frac{I-\textrm{mean}(I)}{\textrm{std}(I)}$. The same preprocessing is applied in all experiments and to all datasets.
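The preprocessing pipeline can be sketched with NumPy alone. The interpolation used for resizing, the exact $3\times 3$ Gaussian weights and the boundary handling are not specified above, so the sketch assumes nearest-neighbour resizing, the binomial kernel $\frac{1}{16}[1,2,1]^{\top}[1,2,1]$ and edge replication; function names are illustrative. A constant image would have zero standard deviation and is not handled.

```python
import numpy as np

def resize_nearest(img, shape=(128, 128)):
    """Nearest-neighbour resize (an assumed interpolation scheme)."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[rows[:, None], cols[None, :]]

def gaussian_blur_3x3(img):
    """3x3 binomial approximation of a Gaussian kernel, with edge replication."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    kernel = np.outer(k, k)
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros(img.shape)
    for u in range(3):
        for v in range(3):
            out += kernel[u, v] * padded[u:u + img.shape[0], v:v + img.shape[1]]
    return out

def preprocess(img):
    """Resize to 128x128, blur, then standardise to zero mean and unit standard deviation."""
    img = gaussian_blur_3x3(resize_nearest(np.asarray(img, dtype=float)))
    return (img - img.mean()) / img.std()

rng = np.random.default_rng(11)
out = preprocess(rng.random((28, 28)))   # e.g. an MNIST-sized input
print(out.shape, round(float(out.mean()), 6), round(float(out.std()), 6))
```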

7.3.2 Metric learning through selection and sampling

Metric learning is a natural application in the framework we describe. Indeed, operators that have been selected on labelled examples should be able to grasp geometrical and topological features that are shared among the examples belonging to the same class. Afterwards, selected and sampled operators $F_{i}\in\mathcal{S}$ can be used to measure distances between pairs of validation samples as

[TABLE]

This choice implies that two samples $\varphi$ and $\varphi^{\prime}$ have distance $0$, and hence are considered the same by the collection of selected operators (agent), only if every operator in $\mathcal{S}$ sees them as identical. Note also that $d_{\mathcal{S}}$ is invariant with respect to the action of the group of planar isometries, a property inherited from the use of $d_{\textrm{match}}$. After computing the pairwise distances between validation examples, we use hierarchical clustering [22] to visualise, as a dendrogram, how the samples are organised by the metric.
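To make $d_{\textrm{match}}$ concrete, the sketch below computes the matching (bottleneck) distance between two small finite persistence diagrams by brute force over all bijections, using the cost $d^{*}((x,y),(x',y'))=\min\{\max\{|x-x'|,|y-y'|\},\max\{\frac{y-x}{2},\frac{y'-x'}{2}\}\}$ and padding each diagram with diagonal points. The function `d_S` is hypothetical: it takes the maximum of $d_{\textrm{match}}$ over the operators in $\mathcal{S}$, an assumption consistent with the property stated above (distance $0$ only when every operator agrees), not a transcription of Equation 18. Diagrams are taken as given, and points at infinity are not handled.

```python
from itertools import permutations


def d_star(p, q):
    """Cost d* between two points (birth, death) of persistence diagrams."""
    (x, y), (x2, y2) = p, q
    return min(max(abs(x - x2), abs(y - y2)), max((y - x) / 2, (y2 - x2) / 2))


def d_match(dgm1, dgm2):
    """Brute-force matching distance between two finite persistence diagrams.

    Each diagram is padded with diagonal points (0, 0) so that any point may be
    matched to the diagonal; d*((x, y), (0, 0)) = (y - x) / 2 whenever x <= y.
    The cost is factorial in the number of points: tiny diagrams only.
    """
    a = list(dgm1) + [(0.0, 0.0)] * len(dgm2)
    b = list(dgm2) + [(0.0, 0.0)] * len(dgm1)
    if not a:
        return 0.0
    return min(max(d_star(p, b[j]) for p, j in zip(a, perm))
               for perm in permutations(range(len(b))))


def d_S(dgms_phi, dgms_psi):
    """Hypothetical aggregate: max over operators F_i of d_match(Dgm(F_i(phi)), Dgm(F_i(psi)))."""
    return max(d_match(d1, d2) for d1, d2 in zip(dgms_phi, dgms_psi))
```

For example, `d_match([(0, 4)], [(1, 4)])` evaluates to `1`, while a single point $(0,1)$ compared with an empty diagram costs its distance to the diagonal, $0.5$.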

For every dataset $\Phi\in\{\textrm{MNIST},\textrm{fashion-MNIST},\textrm{CIFAR10}\}$, we select a subset $\Phi_{l_{i},l_{j}}$ of samples belonging to two classes. We start by randomly initialising a parametrised family of IENEOs (of cardinality $500$ or $750$ in the experiments that follow). Then a small number of samples per class (typically $20$ or $40$) are randomly chosen. These samples are used to select common within-class geometrical and topological features by selection and sampling. The threshold for the selection algorithm is set to $\tau=1.5$, and the threshold $t$ for sampling is defined as the 75th percentile of all contrastive scores. These parameters are fixed and used in all the following experiments.

We first studied the efficacy of selection and sampling on a binary classification task on the MNIST dataset. After selecting the samples belonging to two randomly chosen classes of MNIST, we chose $20$ random samples per class as examples for the selection and sampling algorithm. The selected and sampled IENEOs were then used to compute the pairwise distances of $10$ validation samples per class and to generate the dendrogram in panel B of Figure 4. We repeated the same experiment three times, varying the size and the number of $1$-dimensional Gaussians used to initialise the IENEOs. In particular, we considered sizes $s\in\{7,11,21\}$, and chose the number of Gaussians according to the size as $\frac{s}{2}+1$, rounded to the nearest integer. The resulting dendrograms are depicted in panels B, C, D of Figure 4.

We then applied the same strategy and parameters to the fashion-MNIST and CIFAR10 datasets, obtaining the results shown in Figure 5.

7.3.3 Validation on augmented samples

This application tests the invariance of the distance $d_{\mathcal{S}}$ defined in Equation 18 under planar isometries. To do this, we consider a set of operators selected and sampled on non-transformed samples, and we transform the validation samples by applying a random transformation chosen among translations, rotations and reflections, parametrised as follows:

1. rotations are selected randomly between $1$ and $30$ degrees;

2. translations can be along both the $x$ and $y$ axes, in a range between $1$ and $2$ pixels;

3. reflections are computed randomly with respect to one of the two axes.
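The sketch below samples one transformation according to the three cases above and returns it as a map on the plane. Uniform sampling of the transformation type and angle, integer pixel shifts, and rotation about the origin are assumptions (an image-level implementation would rotate about the image centre and resample pixels); `sample_isometry` is a hypothetical name.

```python
import math
import random


def sample_isometry(rng):
    """Draw one random planar isometry following the ranges listed above.

    Assumptions: uniform choice of the transformation type, uniform angle,
    integer pixel shifts, rotation about the origin.
    """
    kind = rng.choice(["rotation", "translation", "reflection"])
    if kind == "rotation":
        theta = math.radians(rng.uniform(1, 30))
        c, s = math.cos(theta), math.sin(theta)
        return kind, lambda x, y: (c * x - s * y, s * x + c * y)
    if kind == "translation":
        tx, ty = rng.randint(1, 2), rng.randint(1, 2)
        return kind, lambda x, y: (x + tx, y + ty)
    if rng.random() < 0.5:
        return kind, lambda x, y: (x, -y)  # reflection across the x-axis
    return kind, lambda x, y: (-x, y)      # reflection across the y-axis


rng = random.Random(0)
kind, g = sample_isometry(rng)
moved = g(3.0, 4.0)  # image of the point (3, 4) under the sampled isometry
```

Every sampled map preserves Euclidean distances, which is what makes the invariance of $d_{\mathcal{S}}$ testable on the transformed samples.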

The transformed samples along with the dendrograms obtained by considering the metric induced by the selected and sampled operators are shown in Figure 6.

7.3.4 Knowledge injection

As a final application, we discuss the possibility of using the selected and sampled operators $\mathcal{S}$ as a fixed feature extractor for a simple artificial neural network model. We do so by using the elements of $\mathcal{S}$ to initialise the non-trainable filters of a convolutional layer. On top of this layer, we use two fully connected layers, the first with ReLU [23] and the second with softmax activation, to classify samples from pairs of classes of the MNIST, fashion-MNIST and CIFAR10 datasets. We then compare the performance of this classifier, operating with IENEO-initialised filters, against an identical architecture whose filters were initialised randomly with Glorot initialisation [24]. The architecture of the model and its performance are shown in Figure 7.
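A forward-pass sketch of this architecture in plain Python follows, under stated assumptions: single-channel input, "valid" cross-correlation, no pooling and no activation after the frozen convolution, and Keras-style fan-in/fan-out for the Glorot baseline. Filter counts and layer sizes of the actual model are those of Figure 7 and are not reproduced here; all names are hypothetical, and training is not shown.

```python
import math
import random


def glorot_uniform_kernels(n_filters, k, rng):
    """Baseline: Glorot-uniform k x k single-channel filters, limit = sqrt(6 / (fan_in + fan_out))."""
    fan_in, fan_out = k * k, k * k * n_filters
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[[rng.uniform(-limit, limit) for _ in range(k)] for _ in range(k)]
            for _ in range(n_filters)]


def conv2d_valid(img, kernel):
    """'Valid' 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)] for r in range(h - kh + 1)]


def dense(v, weights, bias):
    """Fully connected layer: weights is a list of rows, one per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def forward(img, frozen_kernels, w1, b1, w2, b2):
    """Frozen conv features -> dense ReLU layer -> dense softmax layer.

    Only w1, b1, w2, b2 would be trained; frozen_kernels (IENEO-derived or
    Glorot-initialised) are never updated.
    """
    feats = [x for k in frozen_kernels for row in conv2d_valid(img, k) for x in row]
    hidden = [max(0.0, h) for h in dense(feats, w1, b1)]
    return softmax(dense(hidden, w2, b2))
```

Swapping `frozen_kernels` between sampled IENEO filters and `glorot_uniform_kernels(...)` while keeping everything else identical mirrors the comparison described above.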

8 Discussion and conclusions

The first contribution of this paper is a novel, formal and sound mathematical framework for machine learning, based on the study of metric and topological properties of spaces of operators acting on function spaces. This approach is dual to the classical one: instead of focusing on data, it focuses on suitable operators defined on the functions that represent the data. Among all possible types of operators, we study the space of group equivariant non-expansive operators (GENEOs). When building a machine learning system, choosing to work on a space of operators equivariant with respect to specific transformations allows us to inject pre-existing knowledge into the system. Indeed, the operators are blind to the action of the group on the data, which reduces the dimensionality of the space to be explored during optimisation. The choice of working with non-expansive operators is justified both by the possibility of proving the compactness of the spaces of GENEOs (under the assumption that the spaces of measurements are compact), and by the fact that in practical applications we are usually interested in operators that compress the information they receive as input. The rationale of our approach is the assumption that the main interest in machine learning lies not in the analysis and approximation of data, but in the analysis and approximation of the observers looking at the data. A simple example makes this idea clearer: when considering images of skin lesions, we are not mainly interested in the images per se, but rather in approximating the judgement that physicians give about such images.

In presenting our mathematical model, we first show that the space of GENEOs is suitable for machine learning. By using pseudo-metrics, we define a topology on the space of GENEOs that is induced by the one we define on the function space of data. We build the machinery needed to define maps between GENEOs whose equivariance groups are different. This definition is fundamental, because it allows one to compose operators hierarchically, in the same fashion as computational units are linked in an artificial neural network. Then, by taking advantage of known and novel results in persistent homology, we prove the compactness and convexity of the space of GENEOs under suitable hypotheses. Importantly, we also show how the suggested framework can be used to study operators that are equivariant with respect to sets of transformations, rather than groups. In particular, we observe that the pseudo-metric $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ defined in Subsection 6.1 can also be used when the operators in $\mathcal{F}$ are equivariant with respect to a set, instead of a group, of homeomorphisms. This possibility appears promising for future research. We stress the role of persistent homology in our model: the metric comparison of GENEOs is a key point of our approach, and persistent homology allows for a fast comparison of functions, and hence of GENEOs.

We give two algorithms to select and sample from a space of operators, given a dataset labelled for a classification task. The selection procedure extracts, from a given space of GENEOs, a subset of operators that give meaningful representations of the data with respect to their labelling, while remaining invariant under the transformations induced by the action of $G$. The sampling algorithm then eliminates redundant operators. These two strategies are used to perform metric learning and kernel initialisation on MNIST and fashion-MNIST. In addition, we show how convolutional filters initialised by selecting and sampling on a few samples effectively grasp useful knowledge, which can be used to classify the remaining samples, for instance by a dense classifier.

Our forward-looking goal is to define a novel artificial neural network model based on functional modules. Modules would be more complex computational units than the standard artificial neuron. The core of each module would be a GENEO, so each module would be defined a priori to be equivariant with respect to a set of transformations. On the one hand, this choice would allow us to dramatically reduce the dimensionality of the manifold to be studied during optimisation. On the other hand, choosing the equivariances to be respected at each layer would allow us to inject knowledge into the networks before training, and would ensure that information is not acquired by relying on unwanted noisy regularities in the training data. Module networks would learn optimal transformations of the data to achieve a task, rather than operating on the data themselves.

Module networks could be built by composing modules hierarchically, and knowledge could be injected into the model by engineering the proper set of equivariances. These transformations would be easily interpretable and could offer a rigorous way to compare the learning dynamics of different architectures during optimisation. In particular, we are investigating the possibility of generalising capsule networks [25, 26] and modifying the dynamic routing algorithm, by using the metrics on the space of GENEOs to update the connectivity strength between modules.

We conclude by observing that several interesting problems and new lines of research naturally arise from our mathematical model. First of all, some sets of GENEOs appear to have the structure of a Lie group and of a Riemannian manifold: these structures seem worth studying. Secondly, new methods for building GENEOs should be developed, in order to obtain good approximations of the spaces of GENEOs for given equivariance groups and function spaces. We plan to devote further research to these issues.

Appendix A Additional propositions

Proposition A.1.

The function $D_{X}$ is an extended pseudo-metric on $X$ .

Remark A.2.

We recall that a pseudo-metric is just a distance $d$ without the property: if $d(a,b)=0$, then $a=b$.

Proof.

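1. $D_{X}$ is obviously symmetrical.

2. The definition of $D_{X}$ immediately implies that $D_{X}(x,x)=0$ for any $x\in X$.

3. The triangle inequality holds, since

$$D_{X}(x_{1},x_{3})=\sup_{\varphi\in\varPhi}|\varphi(x_{1})-\varphi(x_{3})|\leq\sup_{\varphi\in\varPhi}\left(|\varphi(x_{1})-\varphi(x_{2})|+|\varphi(x_{2})-\varphi(x_{3})|\right)\leq D_{X}(x_{1},x_{2})+D_{X}(x_{2},x_{3})$$

for any $x_{1},x_{2},x_{3}\in X$.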

∎
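A small numerical check of Proposition A.1 follows, for a hypothetical finite set $\varPhi$ of functions on a four-point space, where the supremum defining $D_{X}$ becomes a maximum. The last lines show why $D_{X}$ is only a pseudo-metric: with constant functions, distinct points are at distance $0$.

```python
from itertools import product


def D_X(x1, x2, Phi):
    """D_X(x1, x2) = sup_{phi in Phi} |phi(x1) - phi(x2)|; a max because Phi is finite here."""
    return max(abs(phi(x1) - phi(x2)) for phi in Phi)


X = [0, 1, 2, 3]
Phi = [lambda x: x / 3, lambda x: (x % 2) * 0.5, lambda x: 1 - x / 6]  # hypothetical, values in [0, 1]

for a, b, c in product(X, repeat=3):
    assert D_X(a, a, Phi) == 0                                          # D_X(x, x) = 0
    assert D_X(a, b, Phi) == D_X(b, a, Phi)                             # symmetry
    assert D_X(a, c, Phi) <= D_X(a, b, Phi) + D_X(b, c, Phi) + 1e-12    # triangle inequality

# With only constant functions, distinct points are at distance 0:
# D_X is a pseudo-metric, not a metric, and (X, D_X) is not T_0.
constants = [lambda x: 0.2, lambda x: 0.7]
print(D_X(0, 3, constants))  # 0.0
```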

Proposition A.3.

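If $\varPhi$ is totally bounded, then for any $\delta>0$ there exists a finite subset $\varPhi_{\delta}$ of $\varPhi$ such that

$$\max_{\varphi_{i}\in\varPhi_{\delta}}|\varphi_{i}(x_{1})-\varphi_{i}(x_{2})|\leq D_{X}(x_{1},x_{2})\leq\max_{\varphi_{i}\in\varPhi_{\delta}}|\varphi_{i}(x_{1})-\varphi_{i}(x_{2})|+2\delta$$

for every $x_{1},x_{2}\in X$.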

Proof.

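Since $\varPhi$ is totally bounded, we can find a finite subset $\varPhi_{\delta}=\{\varphi_{1},\dots,\varphi_{n}\}$ such that for each $\varphi\in\varPhi$ there exists $\varphi_{i}\in\varPhi_{\delta}$ for which $\|\varphi-\varphi_{i}\|_{\infty}<\delta$. It follows that $|\varphi(x)-\varphi_{i}(x)|<\delta$ for any $x\in X$. Let us fix $x_{1},x_{2}\in X$. Since a totally bounded set is bounded, $D_{X}(x_{1},x_{2})$ is finite, and by the definition of supremum, for any $\varepsilon>0$ we can choose a $\bar{\varphi}\in\varPhi$ such that

$$|\bar{\varphi}(x_{1})-\bar{\varphi}(x_{2})|\geq D_{X}(x_{1},x_{2})-\varepsilon.$$

Now, if we take an index $i$ for which $\|\bar{\varphi}-\varphi_{i}\|_{\infty}<\delta$, we have that:

$$|\bar{\varphi}(x_{1})-\bar{\varphi}(x_{2})|\leq|\bar{\varphi}(x_{1})-\varphi_{i}(x_{1})|+|\varphi_{i}(x_{1})-\varphi_{i}(x_{2})|+|\varphi_{i}(x_{2})-\bar{\varphi}(x_{2})|<|\varphi_{i}(x_{1})-\varphi_{i}(x_{2})|+2\delta.$$

Hence,

$$D_{X}(x_{1},x_{2})-\varepsilon<\max_{\varphi_{j}\in\varPhi_{\delta}}|\varphi_{j}(x_{1})-\varphi_{j}(x_{2})|+2\delta.$$

Finally, letting $\varepsilon$ go to zero, we have that

$$D_{X}(x_{1},x_{2})\leq\max_{\varphi_{j}\in\varPhi_{\delta}}|\varphi_{j}(x_{1})-\varphi_{j}(x_{2})|+2\delta.$$

On the other hand, since $\varPhi_{\delta}\subseteq\varPhi$:

$$\max_{\varphi_{j}\in\varPhi_{\delta}}|\varphi_{j}(x_{1})-\varphi_{j}(x_{2})|\leq\sup_{\varphi\in\varPhi}|\varphi(x_{1})-\varphi(x_{2})|=D_{X}(x_{1},x_{2}).$$

Therefore we proved the statement.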

∎
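The following sketch illustrates Proposition A.3 on a hypothetical family of tent functions on $[0,1]$, with the supremum over $X$ replaced by a maximum over a finite sample of points. A greedy procedure builds the finite subset $\varPhi_{\delta}$ (a function is added to the net unless it is already within sup-distance $\delta$ of it), and the two-sided bound is checked for every pair of sample points.

```python
def sup_dist(f, g, X):
    """Sup distance between two functions, evaluated on the finite sample X."""
    return max(abs(f(x) - g(x)) for x in X)


def greedy_net(Phi, delta, X):
    """Greedy finite delta-net: every phi in Phi ends up at sup-distance < delta from the net."""
    net = []
    for phi in Phi:
        if all(sup_dist(phi, psi, X) >= delta for psi in net):
            net.append(phi)
    return net


def bounds(x1, x2, Phi, net):
    """Return (max over the net, D_X) for the pair (x1, x2)."""
    m = max(abs(p(x1) - p(x2)) for p in net)
    D = max(abs(p(x1) - p(x2)) for p in Phi)
    return m, D


# Hypothetical example: X sampled on [0, 1], Phi = translated tents with values in [0, 1].
X = [i / 50 for i in range(51)]
Phi = [lambda x, a=a: max(0.0, 1 - 5 * abs(x - a)) for a in (j / 100 for j in range(101))]
delta = 0.25
net = greedy_net(Phi, delta, X)

for x1 in X:
    for x2 in X:
        m, D = bounds(x1, x2, Phi, net)
        assert m <= D <= m + 2 * delta + 1e-12  # the inequality of Proposition A.3
```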

Proposition A.4.

The function $D_{G}$ is a pseudo-metric on $G$ .

Proof.

1. The value $D_{G}(g_{1},g_{2})$ is finite for every $g_{1},g_{2}\in G$, because $\varPhi$ is compact and hence bounded. Indeed, a finite constant $L$ exists such that $\left\|\varphi\right\|_{\infty}\leq L$ for every $\varphi\in\varPhi$. Hence, $\left\|\varphi\circ g_{1}-\varphi\circ g_{2}\right\|_{\infty}\leq\left\|\varphi\circ g_{1}\right\|_{\infty}+\left\|\varphi\circ g_{2}\right\|_{\infty}\leq 2L$ for any $\varphi\in\varPhi$ and any $g_{1},g_{2}\in G$, since $\varphi\circ g_{1},\varphi\circ g_{2}\in\varPhi$. This implies that $D_{G}(g_{1},g_{2})\leq 2L$ for every $g_{1},g_{2}\in G$.

2. $D_{G}$ is obviously symmetrical.

3. The definition of $D_{G}$ immediately implies that $D_{G}(g,g)=0$ for any $g\in G$.

4. The triangle inequality holds, since

$$D_{G}(g_{1},g_{3})=\sup_{\varphi\in\varPhi}\|\varphi\circ g_{1}-\varphi\circ g_{3}\|_{\infty}\leq\sup_{\varphi\in\varPhi}\left(\|\varphi\circ g_{1}-\varphi\circ g_{2}\|_{\infty}+\|\varphi\circ g_{2}-\varphi\circ g_{3}\|_{\infty}\right)\leq D_{G}(g_{1},g_{2})+D_{G}(g_{2},g_{3})$$

for any $g_{1},g_{2},g_{3}\in G$.

∎
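For a concrete instance of Proposition A.4, take (hypothetically) $X=\mathbb{Z}_{8}$, $G$ the cyclic shifts $g_{k}(x)=x+k \bmod 8$, and $\varPhi$ the orbit of one function under $G$, so that $\varphi\circ g\in\varPhi$ for every $\varphi\in\varPhi$ and $g\in G$. The sketch computes $D_{G}$ and checks the properties proved above, including the bound $D_{G}\leq 2L$.

```python
from itertools import product

n = 8
base = (0.0, 0.2, 0.9, 0.4, 1.0, 0.3, 0.0, 0.6)  # hypothetical measurement on X = Z_8


def compose(phi, k):
    """phi ∘ g_k, where g_k(x) = (x + k) mod n is a cyclic shift."""
    return tuple(phi[(x + k) % n] for x in range(n))


# Phi is closed under the action of G = {g_0, ..., g_{n-1}}.
Phi = {compose(base, k) for k in range(n)}


def D_G(k1, k2):
    """D_G(g_k1, g_k2) = sup_{phi in Phi} ||phi ∘ g_k1 - phi ∘ g_k2||_inf."""
    return max(max(abs(a - b) for a, b in zip(compose(phi, k1), compose(phi, k2)))
               for phi in Phi)


L = max(max(abs(v) for v in phi) for phi in Phi)
for k1, k2, k3 in product(range(n), repeat=3):
    assert D_G(k1, k1) == 0 and D_G(k1, k2) == D_G(k2, k1)
    assert D_G(k1, k3) <= D_G(k1, k2) + D_G(k2, k3) + 1e-12
    assert D_G(k1, k2) <= 2 * L
```

Because $\varPhi$ is a full orbit, $D_{G}(g_{k_1},g_{k_2})$ depends only on $k_2-k_1$ in this example.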

Appendix B Our approach in terms of slice categories

In this section, we will apply the concept of slice category to our framework in order to formalize the concept of perception pairs, which are considered as subcategories of a larger category denoted by $\faktor{\text{PMet}}{(\mathbb{R},d_{e})}$ , as we explain further. Moreover we explore the link between GENEOs and functors between categories of this kind.

Let PMet be the category whose objects are pseudo-metric spaces and morphisms are the continuous functions between them. Let us fix the space $(\mathbb{R},d_{e})$ , that is the real line equipped with the usual Euclidean metric, and consider the slice category over $(\mathbb{R},d_{e})$ .

Now we recall the definition of slice category:

Definition B.1.

The slice category $\faktor{C}{c}$ of a category $C$ over an object $c\in C$ has

1. objects that are all arrows $f\in C$ such that $\text{cod}(f)=c$;

2. morphisms that are all triples $g_{f,f^{\prime}}:=(f,g,f^{\prime})$, where $f:X\longrightarrow c$ and $f^{\prime}:X^{\prime}\longrightarrow c$ are two objects of $\faktor{C}{c}$ and $g:X\longrightarrow X^{\prime}$ is a morphism of $C$ such that $f=f^{\prime}\circ g$; $\text{dom}(g_{f,f^{\prime}})=f\text{\ and }\text{cod}(g_{f,f^{\prime}})=f^{\prime}$.

The slice category is a special case of a comma category.

Remark B.2.

There is a forgetful functor $U_{c}:\faktor{C}{c}\longrightarrow C$ which maps each object $f:X\longrightarrow c$ to its domain $X$ and each morphism $g_{f,f^{\prime}}$ between $f:X\longrightarrow c$ and $f^{\prime}:X^{\prime}\longrightarrow c$ to the morphism $g:X\longrightarrow X^{\prime}$.

We are going to associate a perception pair $(\varPhi,G)$ with a subcategory $C(\varPhi,G)$ of $\faktor{\text{PMet}}{(\mathbb{R},d_{e})}$ defined as follows:

1. the objects of $C(\varPhi,G)$ are the elements of $\varPhi$;

2. the arrows of $C(\varPhi,G)$ are the triples $(f,g,f\circ g)$, where $f\in\varPhi$ and $g\in G$.

We observe that the action of $G$ on $\varPhi$ ensures that the arrow $(f,g,f\circ g)$ is well-defined for any $f\in\varPhi$ and any $g\in G$.

Now we can define a “functorial” version of the concept of GENEO.

Definition B.3.

Let us consider two categories $C(\varPhi,G)$ and $C(\Psi,H)$ . A functor $F$ from $C(\varPhi,G)$ to $C(\Psi,H)$ is a $C$ -GENEO if:

1. $D_{\Psi}(F(\varphi),F(\varphi^{\prime}))\leq D_{\varPhi}(\varphi,\varphi^{\prime})$ for any $\varphi,\ \varphi^{\prime}\in\varPhi$;

2. for any pair of morphisms $m,\ m^{\prime}\in\mathrm{hom}(C(\varPhi,G))$ such that $U_{\mathbb{R}}(m)=U_{\mathbb{R}}(m^{\prime})$, we have that $U_{\mathbb{R}}(F(m))=U_{\mathbb{R}}(F(m^{\prime}))$.

GENEOs and $C$ -GENEOs share the non-expansivity condition. The proposition below shows that the second conditions respectively required in the definitions of GENEO and $C$ -GENEO correspond to each other in a suitable sense. We omit its trivial proof.

Proposition B.4.

Let $F$ be a functor from $C(\varPhi,G)$ to $C(\Psi,H)$ . The following conditions are equivalent:

1. there exists a group homomorphism $T:G\longrightarrow H$ such that $F(\varphi\circ g)=F(\varphi)\circ T(g)$ for any $\varphi\in\varPhi$ and any $g\in G$;

2. for any pair of morphisms $m,\ m^{\prime}\in\mathrm{hom}(C(\varPhi,G))$ such that $U_{\mathbb{R}}(m)=U_{\mathbb{R}}(m^{\prime})$, we have that $U_{\mathbb{R}}(F(m))=U_{\mathbb{R}}(F(m^{\prime}))$.

Appendix C Proofs

Theorem (4.1).

The topology $\tau_{D_{X}}$ on $X$ induced by the pseudo-metric $D_{X}$ is finer than the initial topology $\tau_{\mathrm{in}}$ on $X$ with respect to $\varPhi$ . If $\varPhi$ is totally bounded, then the topology $\tau_{D_{X}}$ coincides with $\tau_{\mathrm{in}}$ .

Proof.

We know that the set $\mathcal{B}_{D_{X}}=\left\{B_{X}(x,\varepsilon):x\in X,\varepsilon>0\right\}$ is a base for the topology $\tau_{D_{X}}$ and the set $\mathcal{B}_{\mathrm{in}}=\left\{\bigcap_{i\in I}\varphi^{-1}_{i}(U_{i}):|I|<\infty,U_{i}\in\mathcal{T}_{E}\ \forall i\in I\right\}$ is a base for the topology $\tau_{\mathrm{in}}$ .

First of all we have to show that the topology $\tau_{D_{X}}$ is finer than the initial topology $\tau_{\mathrm{in}}$ . Let us take a set in the base $\mathcal{B}_{\mathrm{in}}$ of $\tau_{\mathrm{in}}$ , i.e. a set $\bigcap_{i\in I}\varphi_{i}^{-1}(U_{i})$ , where $I$ is a finite set of indexes and $U_{i}\in\mathcal{T}_{E}$ for every index $i\in I$ . It will be sufficient to show that for every $y\in\bigcap_{i\in I}\varphi_{i}^{-1}(U_{i})$ a ball $B_{X}(y,\varepsilon)\in\mathcal{B}_{D_{X}}$ exists, such that $B_{X}(y,\varepsilon)\subseteq\bigcap_{i\in I}\varphi_{i}^{-1}(U_{i})$ . Since $y\in\bigcap_{i\in I}\varphi_{i}^{-1}(U_{i})$ , we have that $\varphi_{i}(y)\in U_{i}$ for every $i\in I$ . Therefore, for each $i\in I$ we can find an open interval $]a_{i},b_{i}[$ such that $\varphi_{i}(y)\in]a_{i},b_{i}[\subseteq U_{i}$ . Let us set $\varepsilon:=\min_{i\in I}\min\{\varphi_{i}(y)-a_{i},b_{i}-\varphi_{i}(y)\}$ , and observe that $\varepsilon>0$ . If $z\in B_{X}(y,\varepsilon)$ , then $|\varphi(y)-\varphi(z)|<\varepsilon$ for every $\varphi\in\varPhi$ , and in particular $|\varphi_{i}(y)-\varphi_{i}(z)|<\varepsilon$ for every $i\in I$ . Hence the definition of $\varepsilon$ immediately implies that $\varphi_{i}(z)\in]a_{i},b_{i}[$ for every $i\in I$ , so that $z\in\bigcap_{i\in I}\varphi_{i}^{-1}(]a_{i},b_{i}[)$ . It follows that $B_{X}(y,\varepsilon)\subseteq\bigcap_{i\in I}\varphi_{i}^{-1}(]a_{i},b_{i}[)\subseteq\bigcap_{i\in I}\varphi_{i}^{-1}(U_{i})$ . Therefore, $y\in B_{X}(y,\varepsilon)\subseteq\bigcap_{i\in I}\varphi_{i}^{-1}(U_{i})$ , and our first statement is proved.

If $\varPhi$ is totally bounded, Proposition A.3 in Appendix A guarantees that for every $\delta>0$ a finite subset $\varPhi_{\delta}$ of $\varPhi$ exists such that

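$$\max_{\varphi_{i}\in\varPhi_{\delta}}|\varphi_{i}(x_{1})-\varphi_{i}(x_{2})|\leq D_{X}(x_{1},x_{2})\leq\max_{\varphi_{i}\in\varPhi_{\delta}}|\varphi_{i}(x_{1})-\varphi_{i}(x_{2})|+2\delta\tag{20}$$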

for every $x_{1},x_{2}\in X$ . Let us now set $B_{\delta}(x,r):=\left\{x^{\prime}\in X\Big{|}\max_{\varphi_{i}\in\varPhi_{\delta}}|\varphi_{i}(x)-\varphi_{i}(x^{\prime})|<r\right\}$ for every $x\in X$ and every $r>0$ . We have to prove that the initial topology $\tau_{\mathrm{in}}$ is finer than the topology $\tau_{D_{X}}$ . In order to do this, it will be sufficient to show that for every $y\in B_{X}(x,\varepsilon)\in\mathcal{B}_{D_{X}}$ a set $\bigcap_{i\in I}\varphi_{i}^{-1}(U_{i})\in\mathcal{B}_{\mathrm{in}}$ exists, such that $y\in\bigcap_{i\in I}\varphi_{i}^{-1}(U_{i})\subseteq B_{X}(x,\varepsilon)$ .

Let us set $\varepsilon^{\prime}:=\varepsilon-D_{X}(x,y)>0$, so that $B_{X}(y,\varepsilon^{\prime})\subseteq B_{X}(x,\varepsilon)$ by the triangle inequality, and let us choose a positive $\delta$ such that $2\delta<\varepsilon^{\prime}$. Inequality (20) implies that $B_{\delta}(y,\varepsilon^{\prime}-2\delta)\subseteq B_{X}(y,\varepsilon^{\prime})$. We now set $U_{i}:=]\varphi_{i}(y)-\varepsilon^{\prime}+2\delta,\varphi_{i}(y)+\varepsilon^{\prime}-2\delta[$ for every $\varphi_{i}\in\varPhi_{\delta}$. Obviously, $y\in\bigcap_{\varphi_{i}\in\varPhi_{\delta}}\varphi_{i}^{-1}(U_{i})$. If $z\in\bigcap_{\varphi_{i}\in\varPhi_{\delta}}\varphi_{i}^{-1}(U_{i})$, then $|\varphi_{i}(z)-\varphi_{i}(y)|<\varepsilon^{\prime}-2\delta$ for every $\varphi_{i}\in\varPhi_{\delta}$. Hence, $z\in B_{\delta}(y,\varepsilon^{\prime}-2\delta)$. It follows that $\bigcap_{\varphi_{i}\in\varPhi_{\delta}}\varphi_{i}^{-1}(U_{i})\subseteq B_{\delta}(y,\varepsilon^{\prime}-2\delta)$. Therefore, $y\in\bigcap_{\varphi_{i}\in\varPhi_{\delta}}\varphi_{i}^{-1}(U_{i})\subseteq B_{X}(x,\varepsilon)$, because of the inclusions $B_{\delta}(y,\varepsilon^{\prime}-2\delta)\subseteq B_{X}(y,\varepsilon^{\prime})\subseteq B_{X}(x,\varepsilon)$. This means that $\tau_{\mathrm{in}}$ is finer than $\tau_{D_{X}}$. Since we already know that $\tau_{D_{X}}$ is finer than $\tau_{\mathrm{in}}$, it follows that $\tau_{D_{X}}$ coincides with $\tau_{\mathrm{in}}$. ∎

Remark.

The second statement of Theorem 4.1 becomes false if $\varPhi$ is not totally bounded. For example, let $\varPhi$ be the set of all functions from $X=[0,1]$ to $\mathbb{R}$ that are continuous with respect to the Euclidean topologies on $[0,1]$ and $\mathbb{R}$. It is easy to check that in this case $\tau_{D_{X}}$ is the discrete topology, while the initial topology $\tau_{\mathrm{in}}$ is the Euclidean topology on $[0,1]$.

Remark.

The pseudo-metric space $(X,D_{X})$ may not be a $T_{0}$ -space. For example, this happens if $X$ is a space containing at least two points and $\varPhi$ is the set of all the constant functions from $X$ to $[0,1]$ .

Theorem (4.2).

If $\varPhi$ is compact and $X$ is complete then $X$ is also compact.

Proof.

First of all we want to prove that every sequence $(x_{i})$ in $X$ admits a Cauchy subsequence in $X$ . After that, the statement follows immediately because every Cauchy sequence in a complete space is convergent, so that $X$ is sequentially compact, and hence compact, since $X$ is a pseudo-metric space [18].

Let us consider an arbitrary sequence $(x_{i})$ in $X$ and an arbitrarily small $\varepsilon>0$ . Since $\varPhi$ is compact, we can find a finite subset $\varPhi_{\varepsilon}=\{\varphi_{1},\dots,\varphi_{n}\}$ such that $\varPhi=\bigcup_{i=1}^{n}B_{\varPhi}(\varphi_{i},\varepsilon)$ , where $B_{\varPhi}(\varphi,\varepsilon)=\{\varphi^{\prime}\in\varPhi:D_{\varPhi}(\varphi^{\prime},\varphi)<\varepsilon\}$ . In particular, we can say that for any $\varphi\in\varPhi$ there exists $\varphi_{\bar{k}}\in\varPhi_{\varepsilon}$ such that $\|\varphi-\varphi_{\bar{k}}\|_{\infty}<\varepsilon$ . Now, we consider the real sequence $\varphi_{1}(x_{i})$ that is bounded because all the functions in $\varPhi$ are bounded. From Bolzano-Weierstrass Theorem it follows that we can extract a convergent subsequence $\varphi_{1}(x_{i_{h}})$ . Then we consider the sequence $\varphi_{2}(x_{i_{h}})$ . Since $\varphi_{2}$ is bounded, we can extract a convergent subsequence $\varphi_{2}(x_{i_{h_{t}}})$ . We can repeat the same argument for any $\varphi_{k}\in\varPhi_{\varepsilon}$ . Thus, we obtain a subsequence $(x_{i_{j}})$ of $(x_{i})$ , such that $\varphi_{k}(x_{i_{j}})$ is a real convergent sequence for any $k\in\{1,\dots,n\}$ , and hence a Cauchy sequence in $\mathbb{R}$ . Moreover, since $\varPhi_{\varepsilon}$ is a finite set, there exists an index $\bar{\jmath}$ such that for any $k\in\{1,\dots,n\}$ we have that

$$|\varphi_{k}(x_{i_{r}})-\varphi_{k}(x_{i_{s}})|<\varepsilon\quad\text{for every } r,s\geq\bar{\jmath}.$$

We observe that $\bar{\jmath}$ does not depend on $\varphi$ , but only on $\varepsilon$ and $\varPhi_{\varepsilon}$ .

In order to prove that $(x_{i_{j}})$ is a Cauchy sequence in $X$ , we observe that for any $r,s\in\mathbb{N}$ and any $\varphi\in\varPhi$ we have:

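$$|\varphi(x_{i_{r}})-\varphi(x_{i_{s}})|\leq|\varphi(x_{i_{r}})-\varphi_{\bar{k}}(x_{i_{r}})|+|\varphi_{\bar{k}}(x_{i_{r}})-\varphi_{\bar{k}}(x_{i_{s}})|+|\varphi_{\bar{k}}(x_{i_{s}})-\varphi(x_{i_{s}})|\leq 2\|\varphi-\varphi_{\bar{k}}\|_{\infty}+|\varphi_{\bar{k}}(x_{i_{r}})-\varphi_{\bar{k}}(x_{i_{s}})|,$$

where $\varphi_{\bar{k}}\in\varPhi_{\varepsilon}$ is chosen so that $\|\varphi-\varphi_{\bar{k}}\|_{\infty}<\varepsilon$.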

It follows that $|\varphi(x_{i_{r}})-\varphi(x_{i_{s}})|<3\varepsilon$ for every $\varphi\in\varPhi$ and every $r,s\geq\bar{\jmath}$ . Thus, $\sup_{\varphi\in\varPhi}|\varphi(x_{i_{r}})-\varphi(x_{i_{s}})|=D_{X}(x_{i_{r}},x_{i_{s}})\leq 3\varepsilon$ . Hence, the sequence $(x_{i_{j}})$ is a Cauchy sequence in $X$ . The completeness of $X$ implies that the statement of Theorem 4.2 is true. ∎

Example.

Let $\varPhi$ be the set containing all the $1$-Lipschitz functions from $X=\{(x,y)\in\mathbb{R}^{2}:x^{2}+y^{2}=1,\arcsin(x)\in\mathbb{Q}\}$ to $[0,1]$, and let $G$ be the group of all rotations $\rho_{2\pi q}$ of $2\pi q$ radians with $q\in\mathbb{Q}$. The topological space $X$ is neither complete nor compact.

Proposition (4.3).

If $g$ is a bijection from $X$ to $X$ such that $\varphi\circ g\in\varPhi$ and $\varphi\circ g^{-1}\in\varPhi$ for every $\varphi\in\varPhi$ , then $g$ is an isometry (and hence a homeomorphism) with respect to $D_{X}$ .

Proof.

Let us fix two arbitrary points $x,x^{\prime}$ in $X$ . Obviously, the map $R_{g}:\varPhi\to\varPhi$ taking each function $\varphi$ to $\varphi\circ g$ is surjective, since $\varphi=R_{g}\left(R_{g^{-1}}(\varphi)\right)$ . Hence $R_{g}(\varPhi)=\varPhi$ . Therefore, $g$ preserves the pseudo-distance $D_{X}$ :

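$$D_{X}(g(x),g(x^{\prime}))=\sup_{\varphi\in\varPhi}|\varphi(g(x))-\varphi(g(x^{\prime}))|=\sup_{\varphi\in\varPhi}|R_{g}(\varphi)(x)-R_{g}(\varphi)(x^{\prime})|=\sup_{\psi\in R_{g}(\varPhi)}|\psi(x)-\psi(x^{\prime})|=\sup_{\psi\in\varPhi}|\psi(x)-\psi(x^{\prime})|=D_{X}(x,x^{\prime}).$$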

Since $g$ is bijective, it follows that $g$ is an isometry with respect to $D_{X}$ . ∎
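A finite illustration of Proposition 4.3 follows, on hypothetical data: $g$ is a 4-cycle on a four-point set, and $\varPhi$ is closed under $\varphi\mapsto\varphi\circ g$; since $g$ has finite order, $\varPhi$ is then also closed under $\varphi\mapsto\varphi\circ g^{-1}$. The check confirms that $g$ preserves $D_{X}$, while a single function without closure does not make $g$ an isometry.

```python
from itertools import product

X = range(4)
g = {0: 1, 1: 2, 2: 3, 3: 0}  # hypothetical bijection of X (a 4-cycle)


def right_compose(phi, g):
    """phi ∘ g for phi stored as a tuple of values on X."""
    return tuple(phi[g[x]] for x in X)


def closure(funcs, g):
    """Smallest set containing funcs and closed under phi -> phi ∘ g.

    Since g has finite order, this set is also closed under phi -> phi ∘ g^{-1}.
    """
    out, frontier = set(funcs), list(funcs)
    while frontier:
        psi = right_compose(frontier.pop(), g)
        if psi not in out:
            out.add(psi)
            frontier.append(psi)
    return out


def D_X(x1, x2, Phi):
    return max(abs(phi[x1] - phi[x2]) for phi in Phi)


Phi = closure({(0.0, 0.5, 1.0, 0.25)}, g)
for x, y in product(X, repeat=2):
    assert D_X(g[x], g[y], Phi) == D_X(x, y, Phi)  # g is an isometry of (X, D_X)

Phi0 = {(0.0, 0.5, 1.0, 0.25)}  # not closed under the action of g
assert D_X(g[0], g[2], Phi0) != D_X(0, 2, Phi0)  # 0.25 versus 1.0
```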

Theorem (4.7).

$G$ is a topological group with respect to the pseudo-metric topology, and the action of $G$ on $\varPhi$ through right composition is continuous.

Proof.

It will suffice to prove that if $f=\lim_{i\to+\infty}f_{i}$ and $g=\lim_{i\to+\infty}g_{i}$ in $G$ with respect to the pseudo-metric $D_{G}$ , then $g\circ f=\lim_{i\to+\infty}g_{i}\circ f_{i}$ and $f^{-1}=\lim_{i\to+\infty}f_{i}^{-1}$ .

Because of the compactness of $\varPhi$ and Proposition A.3, for every $\delta>0$ we can take a finite subset $\varPhi_{\delta}$ of $\varPhi$ such that

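$$\max_{\varphi_{i}\in\varPhi_{\delta}}|\varphi_{i}(x_{1})-\varphi_{i}(x_{2})|\leq D_{X}(x_{1},x_{2})\leq\max_{\varphi_{i}\in\varPhi_{\delta}}|\varphi_{i}(x_{1})-\varphi_{i}(x_{2})|+2\delta$$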

for every $x_{1},x_{2}\in X$ . We have that

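$$D_{G}(g_{i}\circ f_{i},g\circ f)\leq D_{G}(g_{i}\circ f_{i},g\circ f_{i})+D_{G}(g\circ f_{i},g\circ f)=D_{G}(g_{i},g)+\sup_{x\in X}D_{X}(g(f_{i}(x)),g(f(x)))\leq D_{G}(g_{i},g)+\max_{\varphi\in\varPhi_{\delta}}\sup_{x\in X}|\varphi(g(f_{i}(x)))-\varphi(g(f(x)))|+2\delta.$$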

Since $g=\lim_{i\to+\infty}g_{i}$, we have $\lim_{i\to+\infty}D_{G}(g_{i},g)=0$. Because of Theorem 4.2, $X$ is compact, and hence $\varphi\circ g:X\to\mathbb{R}$ is a uniformly continuous function. Since $f=\lim_{i\to+\infty}f_{i}$ and $D_{X}(f_{i}(x),f(x))\leq D_{G}(f_{i},f)$ for every $x\in X$, it follows that $\lim_{i\to+\infty}\sup_{x\in X}|\varphi(g(f_{i}(x)))-\varphi(g(f(x)))|=0$ for every $\varphi\in\varPhi_{\delta}$, and hence $\lim_{i\to+\infty}\max_{\varphi\in\varPhi_{\delta}}\sup_{x\in X}|\varphi(g(f_{i}(x)))-\varphi(g(f(x)))|=0$. Given that $\delta$ can be taken arbitrarily small, we get $g\circ f=\lim_{i\to+\infty}g_{i}\circ f_{i}$.

We also want to prove that $f^{-1}=\lim_{i\to+\infty}f_{i}^{-1}$. Suppose, by contradiction, that $D_{G}(f_{i}^{-1},f^{-1})$ does not converge to $0$. Then there would exist a constant $c>0$ and a subsequence $(f_{i_{j}})$ of $(f_{i})$ such that $D_{G}(f_{i_{j}}^{-1},f^{-1})\geq c$ for every index $j$. However, we would still have $\lim_{j\to\infty}D_{G}(f_{i_{j}},f)=0$, because $(f_{i_{j}})$ is a subsequence of $(f_{i})$. Since $D_{G}(f_{i_{j}}^{-1},f^{-1})\geq c$ for every index $j$, for each $j$ there would exist a $\varphi_{j}\in\varPhi$ such that $\|\varphi_{j}\circ f_{i_{j}}^{-1}-\varphi_{j}\circ f^{-1}\|_{\infty}\geq c/2>0$.

Because of the compactness of $\varPhi$ , it would not be restrictive to assume (possibly by considering subsequences) the existence of the following limits: $\bar{\varphi}=\lim_{j\to\infty}\varphi_{j}$ and $\hat{\varphi}=\lim_{j\to\infty}\varphi_{j}\circ f_{i_{j}}^{-1}$ . We would have that

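$$\|\hat{\varphi}-\bar{\varphi}\circ f^{-1}\|_{\infty}=\lim_{j\to\infty}\|\varphi_{j}\circ f_{i_{j}}^{-1}-\varphi_{j}\circ f^{-1}\|_{\infty}\geq\frac{c}{2}>0,$$

because $\|\varphi_{j}\circ f^{-1}-\bar{\varphi}\circ f^{-1}\|_{\infty}=\|\varphi_{j}-\bar{\varphi}\|_{\infty}\to 0$,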

so that $\hat{\varphi}\neq\bar{\varphi}\circ f^{-1}$ .

On the other hand, we should have

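$$\|\hat{\varphi}\circ f-\bar{\varphi}\|_{\infty}\leq\|\hat{\varphi}-\varphi_{j}\circ f_{i_{j}}^{-1}\|_{\infty}+\|\varphi_{j}\circ f_{i_{j}}^{-1}\circ f-\varphi_{j}\circ f_{i_{j}}^{-1}\circ f_{i_{j}}\|_{\infty}+\|\varphi_{j}-\bar{\varphi}\|_{\infty}\leq\|\hat{\varphi}-\varphi_{j}\circ f_{i_{j}}^{-1}\|_{\infty}+D_{G}(f,f_{i_{j}})+\|\varphi_{j}-\bar{\varphi}\|_{\infty}\xrightarrow[j\to\infty]{}0,$$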

so that $\hat{\varphi}\circ f=\bar{\varphi}$ .

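It follows that $R_{f}$ takes the two distinct functions $\hat{\varphi}$ and $\bar{\varphi}\circ f^{-1}$ to the same function $\bar{\varphi}$, so $R_{f}$ is not injective. This is impossible, since $f$ is a bijection and hence $R_{f}$ is invertible, with inverse $R_{f^{-1}}$.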

This contradiction proves that $\lim_{i\to\infty}f_{i}^{-1}=f^{-1}$ .

Therefore, $G$ is a topological group.

Let now $\varepsilon$ be a positive real number. If $D_{\varPhi}(\varphi,\psi),D_{G}(f,g)<\delta:=\varepsilon/2$ then

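$$D_{\varPhi}(\varphi\circ f,\psi\circ g)\leq D_{\varPhi}(\varphi\circ f,\psi\circ f)+D_{\varPhi}(\psi\circ f,\psi\circ g)\leq D_{\varPhi}(\varphi,\psi)+D_{G}(f,g)<2\delta=\varepsilon.$$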

This proves that the action of $G$ on $\varPhi$ through right composition is continuous. ∎
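The bound $D_{\varPhi}(\varphi\circ f,\psi\circ g)\leq D_{\varPhi}(\varphi,\psi)+D_{G}(f,g)$ behind this continuity argument can be checked numerically on a small hypothetical example: $X$ has five points, $G$ is the group of all its permutations, and $\varPhi$ is the $G$-orbit of two functions, so that $\varPhi$ is closed under right composition.

```python
import random
from itertools import permutations

n = 5
G = list(permutations(range(n)))  # all bijections of X = {0, ..., 4}, stored as tuples
bases = [(0.0, 0.1, 0.5, 0.7, 1.0), (0.2, 0.2, 0.9, 0.4, 0.6)]  # hypothetical


def act(phi, f):
    """Right composition phi ∘ f."""
    return tuple(phi[f[x]] for x in range(n))


Phi = sorted({act(b, f) for b in bases for f in G})  # closed under the action of G


def D_Phi(phi, psi):
    """Sup-norm distance between two functions on X."""
    return max(abs(a - b) for a, b in zip(phi, psi))


def D_G(f, g):
    """D_G(f, g) = sup_{phi in Phi} ||phi ∘ f - phi ∘ g||_inf."""
    return max(D_Phi(act(phi, f), act(phi, g)) for phi in Phi)


rng = random.Random(0)
for _ in range(200):
    phi, psi = rng.choice(Phi), rng.choice(Phi)
    f, g = rng.choice(G), rng.choice(G)
    assert D_Phi(act(phi, f), act(psi, g)) <= D_Phi(phi, psi) + D_G(f, g) + 1e-12
```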

Theorem (4.8).

If $G$ is complete then it is also compact with respect to $D_{G}$ .

Proof.

We want to show that $G$ is sequentially compact, and hence compact. Let $(g_{i})$ be a sequence in $G$ and take a real number $\varepsilon>0$ . Given that $\varPhi$ is compact, we can find a finite subset $\varPhi_{\varepsilon}=\{\varphi_{1},\dots,\varphi_{n}\}$ such that for every $\varphi\in\varPhi$ there exists $\varphi_{h}\in\varPhi_{\varepsilon}$ for which $D_{\varPhi}(\varphi_{h},\varphi)<\varepsilon$ . For any fixed $k\in\{1,\dots,n\}$ , let us consider the sequence $(\varphi_{k}\circ g_{i})$ in $\varPhi$ . Applying the same argument as in the proof of Theorem 4.2, we can extract a subsequence $(g_{i_{j}})$ of $(g_{i})$ such that $(\varphi_{k}\circ g_{i_{j}})$ converges in $\varPhi$ with respect to $D_{\varPhi}$ and hence it is a Cauchy sequence for any $k\in\{1,\dots,n\}$ . For the finiteness of set $\varPhi_{\varepsilon}$ , we can find an index $\bar{\jmath}$ such that

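$$D_{\varPhi}(\varphi_{k}\circ g_{i_{r}},\varphi_{k}\circ g_{i_{s}})<\varepsilon\quad\text{for every } k\in\{1,\dots,n\}\text{ and every } r,s\geq\bar{\jmath}.$$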

In order to prove that $(g_{i_{j}})$ is a Cauchy sequence, we observe that for any $\varphi\in\varPhi$ , any $\varphi_{k}\in\varPhi_{\varepsilon}$ , and any $r,s\in\mathbb{N}$ we have

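$$D_{\varPhi}(\varphi\circ g_{i_{r}},\varphi\circ g_{i_{s}})\leq D_{\varPhi}(\varphi\circ g_{i_{r}},\varphi_{k}\circ g_{i_{r}})+D_{\varPhi}(\varphi_{k}\circ g_{i_{r}},\varphi_{k}\circ g_{i_{s}})+D_{\varPhi}(\varphi_{k}\circ g_{i_{s}},\varphi\circ g_{i_{s}})=2D_{\varPhi}(\varphi,\varphi_{k})+D_{\varPhi}(\varphi_{k}\circ g_{i_{r}},\varphi_{k}\circ g_{i_{s}}).$$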

We observe that $\bar{\jmath}$ does not depend on $\varphi$, but only on $\varepsilon$ and $\varPhi_{\varepsilon}$. By choosing a $\varphi_{k}\in\varPhi_{\varepsilon}$ such that $D_{\varPhi}(\varphi_{k},\varphi)<\varepsilon$, we get $D_{\varPhi}(\varphi\circ g_{i_{r}},\varphi\circ g_{i_{s}})<3\varepsilon$ for every $\varphi\in\varPhi$ and every $r,s\geq\bar{\jmath}$. Thus, $D_{G}(g_{i_{r}},g_{i_{s}})\leq 3\varepsilon$. Hence, the sequence $(g_{i_{j}})$ is a Cauchy sequence. Finally, given that $G$ is complete, $(g_{i_{j}})$ is convergent. Therefore, $G$ is sequentially compact. ∎

Example.

Let $\varPhi$ be the set containing all the $1$-Lipschitz functions from $X=S^{1}=\{(x,y)\in\mathbb{R}^{2}:x^{2}+y^{2}=1\}$ to $[0,1]$, and let $G$ be the group of all rotations $\rho_{2\pi q}$ of $X$ by $2\pi q$ radians, with $q$ a rational number. The space $(G,D_{G})$ is neither complete nor compact.

Proposition (4.13).

If $F$ is a GENEO from $(\varPhi,G)$ to $(\varPsi,H)$ associated with $T:G\to H$ , then it is a contraction with respect to the natural pseudo-distances $d_{G}$ , $d_{H}$ .

Proof.

Since $F$ is a GENEO, it follows that

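$$d_{H}(F(\varphi_{1}),F(\varphi_{2}))=\inf_{h\in H}D_{\varPsi}(F(\varphi_{1}),F(\varphi_{2})\circ h)\leq\inf_{g\in G}D_{\varPsi}(F(\varphi_{1}),F(\varphi_{2})\circ T(g))=\inf_{g\in G}D_{\varPsi}(F(\varphi_{1}),F(\varphi_{2}\circ g))\leq\inf_{g\in G}D_{\varPhi}(\varphi_{1},\varphi_{2}\circ g)=d_{G}(\varphi_{1},\varphi_{2})$$

for every $\varphi_{1},\varphi_{2}\in\varPhi$.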

∎
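As a concrete GENEO, consider (hypothetically) $X=\mathbb{Z}_{8}$ with $G=H$ the cyclic shifts, $T$ the identity, and $F$ a circular convolution with nonnegative weights summing to $1$. The sketch assumes the natural pseudo-distance $d_{G}(\varphi_{1},\varphi_{2})=\inf_{g\in G}D_{\varPhi}(\varphi_{1},\varphi_{2}\circ g)$ and checks equivariance, non-expansivity and the contraction property of Proposition 4.13 on random functions.

```python
import random

n = 8
WEIGHTS = (0.5, 0.25, 0.25)  # hypothetical nonnegative weights summing to 1


def shift(phi, k):
    """phi ∘ g_k with g_k(x) = (x + k) mod n."""
    return tuple(phi[(x + k) % n] for x in range(n))


def sup_dist(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


def d_nat(phi1, phi2):
    """Natural pseudo-distance for the cyclic group: min over k of ||phi1 - phi2 ∘ g_k||_inf."""
    return min(sup_dist(phi1, shift(phi2, k)) for k in range(n))


def F(phi):
    """Circular convolution: commutes with shifts (T = identity) and is non-expansive."""
    return tuple(sum(w * phi[(x - j) % n] for j, w in enumerate(WEIGHTS)) for x in range(n))


rng = random.Random(7)
for _ in range(100):
    a = tuple(rng.random() for _ in range(n))
    b = tuple(rng.random() for _ in range(n))
    k = rng.randrange(n)
    assert sup_dist(F(shift(a, k)), shift(F(a), k)) < 1e-12  # equivariance
    assert sup_dist(F(a), F(b)) <= sup_dist(a, b) + 1e-12     # non-expansivity
    assert d_nat(F(a), F(b)) <= d_nat(a, b) + 1e-12           # Proposition 4.13
```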

Proposition (4.16).

For every $F\in\mathcal{F}^{\mathrm{all}}$ and every $\varphi\in\varPhi$: $\|F(\varphi)\|_{\infty}\leq\|\varphi\|_{\infty}+\|F(\mathbf{0})\|_{\infty}$, where $\mathbf{0}$ denotes the function taking the value $0$ everywhere.

Proof.

Since $F$ is non-expansive, we have that

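$$\|F(\varphi)\|_{\infty}\leq\|F(\varphi)-F(\mathbf{0})\|_{\infty}+\|F(\mathbf{0})\|_{\infty}\leq\|\varphi-\mathbf{0}\|_{\infty}+\|F(\mathbf{0})\|_{\infty}=\|\varphi\|_{\infty}+\|F(\mathbf{0})\|_{\infty}.$$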

∎

Theorem (5.1).

$\mathcal{F}^{\mathrm{all}}$ is compact with respect to $D_{\mathrm{GENEO}}$.

Proof.

We know that $(\mathcal{F}^{\mathrm{all}},D_{\mathrm{GENEO}})$ is a metric space. Therefore it will suffice to prove that $\mathcal{F}^{\mathrm{all}}$ is sequentially compact. In order to do this, let us assume that a sequence $(F_{i})$ in $\mathcal{F}^{\mathrm{all}}$ is given. Given that $\varPhi$ is a compact (and hence separable) metric space, we can find a countable and dense subset $\varPhi^{*}=\{\varphi_{j}\}_{j\in\mathbb{N}}$ of $\varPhi$ . By means of a diagonalization process, we can extract a subsequence $(F^{\prime}_{i})$ from $(F_{i})$ , such that for every fixed index $j$ the sequence $(F^{\prime}_{i}(\varphi_{j}))$ converges to a function in $\varPsi$ with respect to $D_{\Psi}$ . Now, let us consider the function $\bar{F}:\varPhi\rightarrow\ \varPsi$ defined by setting $\bar{F}(\varphi_{j}):=\lim_{i\to\infty}F^{\prime}_{i}(\varphi_{j})$ for each $\varphi_{j}\in\varPhi^{*}$ .

We extend $\bar{F}$ to $\varPhi$ as follows. For every $\varphi\in\varPhi$ we choose a sequence $(\varphi_{j_{r}})$ in $\varPhi^{*}$ , converging to $\varphi\in\varPhi$ , and set $\bar{F}(\varphi):=\lim_{r\to\infty}\bar{F}(\varphi_{j_{r}})$ . We claim that such a limit exists in $\varPsi$ and does not depend on the sequence that we have chosen, converging to $\varphi\in\varPhi$ . In order to prove that the previous limit exists, we observe that for every $r,\ s\in\mathbb{N}$

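$$D_{\Psi}\left(\bar{F}(\varphi_{j_{r}}),\bar{F}(\varphi_{j_{s}})\right)=\lim_{i\to\infty}D_{\Psi}\left(F^{\prime}_{i}(\varphi_{j_{r}}),F^{\prime}_{i}(\varphi_{j_{s}})\right)\leq D_{\varPhi}\left(\varphi_{j_{r}},\varphi_{j_{s}}\right),$$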

because each $F^{\prime}_{i}$ is non-expansive.

Since the sequence $(\varphi_{j_{r}})$ converges to $\varphi\in\varPhi$ , it follows that $(\bar{F}(\varphi_{j_{r}}))$ is a Cauchy sequence with respect to $D_{\Psi}$ . The compactness of $\varPsi$ implies that $(\bar{F}(\varphi_{j_{r}}))$ converges in $\varPsi$ .

If another sequence $(\varphi_{k_{r}})$ is given in $\varPhi^{*}$, converging to $\varphi\in\varPhi$, then for every index $r\in\mathbb{N}$

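$$D_{\Psi}\left(\bar{F}(\varphi_{j_{r}}),\bar{F}(\varphi_{k_{r}})\right)=\lim_{i\to\infty}D_{\Psi}\left(F^{\prime}_{i}(\varphi_{j_{r}}),F^{\prime}_{i}(\varphi_{k_{r}})\right)\leq D_{\varPhi}\left(\varphi_{j_{r}},\varphi_{k_{r}}\right).$$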

Since both $(\varphi_{j_{r}})$ and $(\varphi_{k_{r}})$ converge to $\varphi$ it follows that $\lim_{r\to\infty}\bar{F}(\varphi_{j_{r}})=\lim_{r\to\infty}\bar{F}(\varphi_{k_{r}})$ . Therefore the definition of $\bar{F}(\varphi)$ does not depend on the sequence $(\varphi_{j_{r}})$ that we have chosen, converging to $\varphi$ .

Now we have to prove that $\bar{F}\in\mathcal{F}^{\mathrm{all}}$, i.e., that $\bar{F}$ satisfies the properties defining this set of operators. We have already seen that $\bar{F}:\varPhi\rightarrow\varPsi$.

For every $\varphi,\varphi^{\prime}\in\varPhi$ we can consider two sequences $(\varphi_{j_{r}})$, $(\varphi_{k_{r}})$ in $\varPhi^{*}$ converging to $\varphi$ and $\varphi^{\prime}$, respectively. Since the operators $F^{\prime}_{i}$ are non-expansive, we have that

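$$\begin{aligned}D_{\Psi}\left(\bar{F}(\varphi),\bar{F}(\varphi^{\prime})\right)&=\lim_{r\to\infty}D_{\Psi}\left(\bar{F}(\varphi_{j_{r}}),\bar{F}(\varphi_{k_{r}})\right)\\&=\lim_{r\to\infty}\lim_{i\to\infty}D_{\Psi}\left(F^{\prime}_{i}(\varphi_{j_{r}}),F^{\prime}_{i}(\varphi_{k_{r}})\right)\\&\leq\lim_{r\to\infty}D_{\varPhi}\left(\varphi_{j_{r}},\varphi_{k_{r}}\right)=D_{\varPhi}\left(\varphi,\varphi^{\prime}\right).\end{aligned}$$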

Therefore, $\bar{F}:\varPhi\to\Psi$ is non-expansive. As a consequence, it is also continuous.

We can now prove that the sequence $(F^{\prime}_{i})$ converges to $\bar{F}$ with respect to $D_{\mathrm{GENEO}}$ .

Let us consider an arbitrarily small $\varepsilon>0$ . Since $\varPhi$ is compact and $\varPhi^{*}$ is dense in $\varPhi$ , we can find a finite subset $\{\varphi_{j_{1}},\dots,\varphi_{j_{n}}\}$ of $\varPhi^{*}$ such that for every $\varphi\in\varPhi$ , there exists an index $r\in\{1,\dots,n\}$ , for which $D_{\varPhi}\left(\varphi,\varphi_{j_{r}}\right)<\varepsilon$ .

Since the sequence $(F^{\prime}_{i})$ converges pointwise to $\bar{F}$ on the set $\varPhi^{*}$ , an index $\bar{\imath}$ exists, such that $D_{\Psi}\left(\bar{F}(\varphi_{j_{r}}),F^{\prime}_{i}(\varphi_{j_{r}})\right)<\varepsilon$ for any $i\geq\bar{\imath}$ and any $r\in\{1,\dots,n\}$ . Therefore, for every $\varphi\in\varPhi$ we can find an index $r\in\{1,\dots,n\}$ such that $D_{\varPhi}\left(\varphi,\varphi_{j_{r}}\right)<\varepsilon$ and the following inequalities hold for every index $i\geq\bar{\imath}$ , because of the non-expansivity of $\bar{F}$ and $F^{\prime}_{i}$ :

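$$\begin{aligned}D_{\Psi}\left(\bar{F}(\varphi),F^{\prime}_{i}(\varphi)\right)&\leq D_{\Psi}\left(\bar{F}(\varphi),\bar{F}(\varphi_{j_{r}})\right)+D_{\Psi}\left(\bar{F}(\varphi_{j_{r}}),F^{\prime}_{i}(\varphi_{j_{r}})\right)+D_{\Psi}\left(F^{\prime}_{i}(\varphi_{j_{r}}),F^{\prime}_{i}(\varphi)\right)\\&\leq D_{\varPhi}\left(\varphi,\varphi_{j_{r}}\right)+D_{\Psi}\left(\bar{F}(\varphi_{j_{r}}),F^{\prime}_{i}(\varphi_{j_{r}})\right)+D_{\varPhi}\left(\varphi_{j_{r}},\varphi\right)\\&<3\varepsilon.\end{aligned}$$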

We observe that $\bar{\imath}$ does not depend on $\varphi$ , but only on $\varepsilon$ and on the set $\{\varphi_{j_{1}},\dots,\varphi_{j_{n}}\}$ . It follows that $D_{\Psi}\left(\bar{F}(\varphi),F^{\prime}_{i}(\varphi)\right)<3\varepsilon$ for every $\varphi\in\varPhi$ and every $i\geq\bar{\imath}$ .

Hence, $\sup_{\varphi\in\varPhi}D_{\Psi}\left(\bar{F}(\varphi),F^{\prime}_{i}(\varphi)\right)\leq 3\varepsilon$ for every $i\geq\bar{\imath}$. Since $\varepsilon>0$ is arbitrary, the sequence $(F^{\prime}_{i})$ converges to $\bar{F}$ with respect to $D_{\mathrm{GENEO}}$.

The last thing that we have to show is that $\bar{F}$ is group equivariant. Let us consider a $\varphi\in\varPhi$ , a sequence $(\varphi_{j_{r}})$ in $\varPhi^{*}$ converging to $\varphi$ in $\varPhi$ and a $g\in G$ . Obviously, $D_{\varPhi}(\varphi_{j_{r}}\circ g,\varphi\circ g)=D_{\varPhi}(\varphi_{j_{r}},\varphi)$ and hence the sequence $(\varphi_{j_{r}}\circ g)$ converges to $\varphi\circ g$ in $\varPhi$ with respect to $D_{\varPhi}$ . We recall that the right action of $G$ on $\varPhi$ is continuous, $\bar{F}$ is continuous and each $F^{\prime}_{i}$ is group equivariant. Hence, given that the sequence $(F^{\prime}_{i})$ converges to $\bar{F}$ with respect to $D_{\mathrm{GENEO}}$ , the following equalities hold:

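$$\begin{aligned}\bar{F}(\varphi\circ g)&=\lim_{r\to\infty}\bar{F}(\varphi_{j_{r}}\circ g)=\lim_{r\to\infty}\lim_{i\to\infty}F^{\prime}_{i}(\varphi_{j_{r}}\circ g)=\lim_{r\to\infty}\lim_{i\to\infty}F^{\prime}_{i}(\varphi_{j_{r}})\circ T(g)\\&=\lim_{r\to\infty}\bar{F}(\varphi_{j_{r}})\circ T(g)=\bar{F}(\varphi)\circ T(g).\end{aligned}$$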

This proves that $\bar{F}$ is group equivariant, and hence a perception map. In conclusion, $\bar{F}$ is a GENEO. Since the subsequence $(F^{\prime}_{i})$ of the arbitrary sequence $(F_{i})$ converges to $\bar{F}\in\mathcal{F}^{\mathrm{all}}$ with respect to $D_{\mathrm{GENEO}}$, it follows that $(\mathcal{F}^{\mathrm{all}},D_{\mathrm{GENEO}})$ is sequentially compact. ∎
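The uniform-convergence step of this proof reduces a supremum over $\varPhi$ to finitely many functions $\varphi_{j_{1}},\dots,\varphi_{j_{n}}$, at the price of two extra $\varepsilon$ terms coming from non-expansivity. The Python sketch below checks that bound numerically. It is an illustration under assumptions of ours, not the paper's code: a finite random sample stands in for $\varPhi$ (signals on a cyclic grid of four points with values in $[0,1]$), $D_{\varPhi}=D_{\Psi}$ is the sup-norm distance, the two non-expansive operators are a hypothetical circular moving average and circular max filter, and `greedy_net` is a helper introduced only for this example.

```python
import numpy as np

def moving_average(phi):
    # Circular 3-point mean: 1-Lipschitz in the sup norm.
    return (np.roll(phi, 1) + phi + np.roll(phi, -1)) / 3.0

def max_filter(phi):
    # Circular 3-point maximum: 1-Lipschitz in the sup norm.
    return np.maximum(np.maximum(np.roll(phi, 1), phi), np.roll(phi, -1))

def sup_dist(u, v):
    return float(np.max(np.abs(u - v)))

def greedy_net(samples, eps):
    # Indices of a subset such that every sample is within eps of some member.
    net = []
    for i, s in enumerate(samples):
        if all(sup_dist(s, samples[j]) >= eps for j in net):
            net.append(i)
    return net

rng = np.random.default_rng(6)
Phi = rng.random((400, 4))      # finite stand-in for the compact space Phi
eps = 0.35
net = greedy_net(Phi, eps)

F, F_prime = moving_average, max_filter
on_net = max(sup_dist(F(Phi[j]), F_prime(Phi[j])) for j in net)
everywhere = max(sup_dist(F(p), F_prime(p)) for p in Phi)
# Triangle inequality + non-expansivity of F and F_prime:
# everywhere <= on_net + 2 * eps.
print(f"{len(net)} net points; max on net {on_net:.3f}; max on all samples {everywhere:.3f}")
print("bound on_net + 2*eps holds:", everywhere <= on_net + 2 * eps)
```

Only the net is inspected to compute `on_net`, yet the bound on `everywhere` covers every sample, in the same way that $\bar{\imath}$ depends only on $\varepsilon$ and on the finite set $\{\varphi_{j_{1}},\dots,\varphi_{j_{n}}\}$.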

Proposition (5.2).

If $F_{\Sigma}(\varPhi)\subseteq\varPsi$ , then $F_{\Sigma}$ is a GENEO from $(\varPhi,G)$ to $(\varPsi,H)$ with respect to $T$ .

Proof.

First we prove that $F_{\Sigma}$ is a perception map with respect to $T$ . Since every $F_{i}$ is a perception map we have that:

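$$F_{\Sigma}(\varphi\circ g)=\sum_{i=1}^{n}a_{i}F_{i}(\varphi\circ g)=\sum_{i=1}^{n}a_{i}\left(F_{i}(\varphi)\circ T(g)\right)=\left(\sum_{i=1}^{n}a_{i}F_{i}(\varphi)\right)\circ T(g)=F_{\Sigma}(\varphi)\circ T(g).$$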

Since every $F_{i}$ is non-expansive, $F_{\Sigma}$ is non-expansive:

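$$\begin{aligned}D_{\Psi}\left(F_{\Sigma}(\varphi_{1}),F_{\Sigma}(\varphi_{2})\right)&=\left\|\sum_{i=1}^{n}a_{i}\left(F_{i}(\varphi_{1})-F_{i}(\varphi_{2})\right)\right\|_{\infty}\leq\sum_{i=1}^{n}|a_{i}|\,\left\|F_{i}(\varphi_{1})-F_{i}(\varphi_{2})\right\|_{\infty}\\&\leq\sum_{i=1}^{n}|a_{i}|\,\left\|\varphi_{1}-\varphi_{2}\right\|_{\infty}\leq\left\|\varphi_{1}-\varphi_{2}\right\|_{\infty}=D_{\varPhi}(\varphi_{1},\varphi_{2}).\end{aligned}$$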

Therefore $F_{\Sigma}$ is a GENEO. ∎
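A quick numerical check of Proposition 5.2 in a toy setting. We assume (none of this is fixed by the section above) that $X$ is a cyclic grid of 12 points, $G=H$ is the group of cyclic shifts acting through `np.roll`, $T$ is the identity, $D_{\varPhi}=D_{\Psi}$ is the sup-norm distance, and the weights satisfy $\sum_{i}|a_{i}|\leq 1$. The three operators are hypothetical examples of shift-equivariant non-expansive maps, and `f_sigma` is a helper name chosen here.

```python
import numpy as np

def shift(phi, g):
    # Right action of a cyclic shift: (phi o g)(x) = phi(x + g mod n).
    return np.roll(phi, -g)

def identity(phi):
    return phi.copy()

def moving_average(phi):
    # Circular 3-point mean: shift-equivariant and 1-Lipschitz in the sup norm.
    return (np.roll(phi, 1) + phi + np.roll(phi, -1)) / 3.0

def max_filter(phi):
    # Circular 3-point maximum: shift-equivariant and 1-Lipschitz in the sup norm.
    return np.maximum(np.maximum(np.roll(phi, 1), phi), np.roll(phi, -1))

def f_sigma(ops, weights):
    # F_Sigma(phi) = sum_i a_i F_i(phi)
    def op(phi):
        return sum(a * Fi(phi) for a, Fi in zip(weights, ops))
    return op

def sup_dist(u, v):
    return float(np.max(np.abs(u - v)))

ops = [identity, moving_average, max_filter]
weights = [0.5, -0.3, 0.2]      # sum of |a_i| is 1 (assumed condition)
F_sigma = f_sigma(ops, weights)

rng = np.random.default_rng(0)
worst_equiv, worst_ratio = 0.0, 0.0
for _ in range(200):
    phi1, phi2 = rng.random(12), rng.random(12)
    g = int(rng.integers(12))
    # Equivariance with T = id: F(phi o g) should equal F(phi) o g.
    worst_equiv = max(worst_equiv, sup_dist(F_sigma(shift(phi1, g)), shift(F_sigma(phi1), g)))
    # Non-expansivity: D_Psi(F(phi1), F(phi2)) / D_Phi(phi1, phi2) <= 1.
    worst_ratio = max(worst_ratio, sup_dist(F_sigma(phi1), F_sigma(phi2)) / sup_dist(phi1, phi2))

print(f"largest equivariance defect: {worst_equiv:.2e}")
print(f"largest Lipschitz ratio:     {worst_ratio:.3f}")
```

Negative weights are allowed in this sketch; what keeps the combination non-expansive is the assumed bound on $\sum_{i}|a_{i}|$, not the sign of the coefficients.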

Theorem (5.3).

If $\Psi$ is convex, then the set of GENEOs from $(\varPhi,G)$ to $(\varPsi,H)$ with respect to $T$ is convex.

Proof.

It is sufficient to apply Proposition 5.2 for $n=2$ , by setting $a_{1}=t$ , $a_{2}=1-t$ for $0\leq t\leq 1$ , and observing that the convexity of $\Psi$ implies $F_{\Sigma}(\varPhi)\subseteq\varPsi$ . ∎
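To see why the convexity of $\varPsi$ matters in Theorem 5.3, the sketch below compares two target sets in the same assumed toy setting (cyclic grid, cyclic shifts, sup norm): the convex set of signals with values in $[0,1]$, and the non-convex set of $\{0,1\}$-valued signals. The operators and helper names are illustrative choices, not taken from the paper.

```python
import numpy as np

def identity(phi):
    return phi.copy()

def moving_average(phi):
    return (np.roll(phi, 1) + phi + np.roll(phi, -1)) / 3.0

def max_filter(phi):
    return np.maximum(np.maximum(np.roll(phi, 1), phi), np.roll(phi, -1))

def combination(F1, F2, t):
    # The case n = 2 of F_Sigma with a_1 = t and a_2 = 1 - t.
    return lambda phi: t * F1(phi) + (1.0 - t) * F2(phi)

def in_unit_box(psi, tol=1e-12):
    # Convex target set: signals with values in [0, 1].
    return bool(np.all(psi >= -tol) and np.all(psi <= 1.0 + tol))

def is_binary(psi):
    # Non-convex target set: {0, 1}-valued signals.
    return bool(np.all((psi == 0.0) | (psi == 1.0)))

rng = np.random.default_rng(1)
convex_ok = all(
    in_unit_box(combination(moving_average, max_filter, t)(rng.random(10)))
    for t in np.linspace(0.0, 1.0, 11)
    for _ in range(20)
)

phi = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
midpoint = combination(identity, max_filter, 0.5)(phi)
print("convex Psi preserved by every t:", convex_ok)
print("identity and max filter keep phi binary:",
      is_binary(identity(phi)), is_binary(max_filter(phi)))
print("their midpoint is binary:", is_binary(midpoint), midpoint)
```

With the convex target every combination $tF_{1}+(1-t)F_{2}$ stays in $\varPsi$; with the binary target the identity and the max filter each preserve $\varPsi$, but their midpoint takes the value $0.5$ at some points, so $F_{\Sigma}(\varPhi)\not\subseteq\varPsi$.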

Proposition.

$\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ is a strongly $G$-invariant pseudo-metric on $\varPhi$.

Proof.

Theorem 6.1 and the non-expansivity of every $F\in\mathcal{F}$ imply that

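$$d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})),r_{k}(F(\varphi_{2}))\right)\leq D_{\Psi}\left(F(\varphi_{1}),F(\varphi_{2})\right)\leq D_{\varPhi}(\varphi_{1},\varphi_{2}).$$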

Therefore $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ is a pseudo-metric, since it is the supremum of a family of pseudo-metrics that are bounded at each pair $(\varphi_{1},\varphi_{2})$ . Moreover, for every $\varphi_{1},\varphi_{2}\in\varPhi$ and every $g\in G$

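$$\begin{aligned}\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1}\circ g,\varphi_{2})&=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1}\circ g)),r_{k}(F(\varphi_{2}))\right)\\&=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})\circ T(g)),r_{k}(F(\varphi_{2}))\right)\\&=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})),r_{k}(F(\varphi_{2}))\right)=\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2}),\end{aligned}$$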

because of the equality $F(\varphi\circ g)=F(\varphi)\circ T(g)$ for every $\varphi\in\varPhi$ and every $g\in G$ and the invariance of persistent homology under the action of the homeomorphisms. Since the function $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ is symmetric, this is sufficient to guarantee that $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}$ is strongly $G$ -invariant. ∎
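The invariance of persistent homology used in the last step can be observed on a discretized circle. The sketch below assumes that $X$ is replaced by the cycle graph with $n$ vertices, that $G$ consists of the cyclic shifts (rotations of the graph), that $k=0$, and that $r_{0}$ is computed from the lower-star sublevel filtration with a standard union-find routine using the elder rule; none of these choices comes from the paper. The operator `moving_average` is a hypothetical shift-equivariant non-expansive $F$. Since the diagrams of $F(\varphi\circ g)$ and $F(\varphi)$ coincide, every term of the supremum defining $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi\circ g,\varphi_{2})$ is unchanged.

```python
import numpy as np

def persistence_0d_cycle(f):
    """0-dim sublevel-set persistence of f on the cycle graph C_n (n >= 3).

    Lower-star filtration: vertex i enters at f[i], edge {i, i+1 mod n} at
    max(f[i], f[i+1]). Returns (finite pairs with death > birth, essential pairs).
    """
    n = len(f)
    parent = list(range(n))
    birth = [float(x) for x in f]   # a root's value is its component's birth
    added = [False] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    finite = []
    for v in sorted(range(n), key=lambda i: (f[i], i)):
        added[v] = True
        for u in ((v - 1) % n, (v + 1) % n):
            if not added[u]:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Elder rule: the younger component dies when it merges at f[v].
            young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
            if f[v] > birth[young]:
                finite.append((birth[young], float(f[v])))
            parent[young] = old
    essential = [(birth[r], float("inf")) for r in {find(i) for i in range(n)}]
    return sorted(finite), sorted(essential)

def shift(phi, g):
    # (phi o g)(x) = phi(x + g mod n): a rotation of the cycle graph.
    return np.roll(phi, -g)

def moving_average(phi):
    # Hypothetical shift-equivariant, non-expansive operator F.
    return (np.roll(phi, 1) + phi + np.roll(phi, -1)) / 3.0

rng = np.random.default_rng(2)
phi = rng.random(9)
base = persistence_0d_cycle(moving_average(phi))
same = all(persistence_0d_cycle(moving_average(shift(phi, g))) == base for g in range(9))
print("finite pairs:", base[0])
print("essential pairs:", base[1])
print("diagram of F(phi o g) independent of g:", same)
```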

Theorem (6.5).

If $\mathcal{F}$ is a non-empty subset of $\mathcal{F}^{\mathrm{all}}$ , then

$$\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}\leq d_{G}\leq D_{\varPhi}.$$

Proof.

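For every $F\in\mathcal{F}^{\mathrm{all}}$, every $g\in G$ and every $\varphi_{1},\varphi_{2}\in\varPhi$, we have that

$$\begin{aligned}d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})),r_{k}(F(\varphi_{2}))\right)&=d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})\circ T(g)),r_{k}(F(\varphi_{2}))\right)\\&=d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1}\circ g)),r_{k}(F(\varphi_{2}))\right)\\&\leq D_{\Psi}\left(F(\varphi_{1}\circ g),F(\varphi_{2})\right)\\&\leq D_{\varPhi}\left(\varphi_{1}\circ g,\varphi_{2}\right).\end{aligned}$$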

The first equality follows from the invariance of persistent homology under the action of $\mathrm{Homeo}(X)$ (see Remark 6.4), and the second equality follows from the fact that $F$ is a group equivariant operator. The first inequality follows from the stability of persistent homology (Theorem 6.1), while the second inequality follows from the non-expansivity of $F$. It follows that, if $\mathcal{F}\subseteq\mathcal{F}^{\mathrm{all}}$, then for every $g\in G$ and every $\varphi_{1},\varphi_{2}\in\varPhi$

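$$\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})),r_{k}(F(\varphi_{2}))\right)\leq D_{\varPhi}\left(\varphi_{1}\circ g,\varphi_{2}\right).$$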

Taking the infimum over $g\in G$ and recalling the definition of the natural pseudo-distance, we obtain $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}\leq d_{G}$, while $d_{G}\leq D_{\varPhi}$ is stated in Theorem 6.1. ∎
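The chain $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}\leq d_{G}\leq D_{\varPhi}$ can be tested numerically on small examples. The sketch below assumes a discretized circle (cycle graph, cyclic shifts, $k=0$, sup-norm $D_{\varPhi}$, lower-star persistence via union-find), takes a hypothetical finite family $\mathcal{F}$ of two shift-equivariant non-expansive operators, and evaluates $d_{\mathrm{match}}$ by brute force over bijections, using the cost $d^{*}$ that appears among the paper's formulas. Brute force is only feasible for diagrams with a handful of points; it is meant to make the inequalities concrete, not to be an efficient matching-distance algorithm.

```python
import itertools
import numpy as np

def persistence_0d_cycle(f):
    """0-dim sublevel-set persistence of f on the cycle graph C_n (n >= 3)."""
    n = len(f)
    parent = list(range(n))
    birth = [float(x) for x in f]   # a root's value is its component's birth
    added = [False] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    finite = []
    for v in sorted(range(n), key=lambda i: (f[i], i)):
        added[v] = True
        for u in ((v - 1) % n, (v + 1) % n):
            if not added[u]:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
            if f[v] > birth[young]:          # elder rule; skip diagonal pairs
                finite.append((birth[young], float(f[v])))
            parent[young] = old
    essential = [(birth[r], float("inf")) for r in {find(i) for i in range(n)}]
    return sorted(finite), sorted(essential)

def d_star(p, q):
    # Cost d*: match p and q in the max-norm, or send both to the diagonal.
    return min(max(abs(p[0] - q[0]), abs(p[1] - q[1])),
               max((p[1] - p[0]) / 2, (q[1] - q[0]) / 2))

def d_match(dgm1, dgm2):
    """Matching distance between 0-dim diagrams, brute force (tiny inputs only)."""
    (fin1, ess1), (fin2, ess2) = dgm1, dgm2
    if len(ess1) != len(ess2):
        return float("inf")
    # Points at infinity are matched among themselves, in sorted order.
    ess_cost = max((abs(a[0] - b[0]) for a, b in zip(ess1, ess2)), default=0.0)
    m = max(len(fin1), len(fin2))
    A = fin1 + [None] * (m - len(fin1))   # None stands for a diagonal point
    B = fin2 + [None] * (m - len(fin2))

    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None or q is None:
            r = p if q is None else q
            return (r[1] - r[0]) / 2
        return d_star(p, q)

    fin_cost = min(max((cost(p, q) for p, q in zip(A, perm)), default=0.0)
                   for perm in itertools.permutations(B))
    return max(ess_cost, fin_cost)

def shift(phi, g):
    return np.roll(phi, -g)

def moving_average(phi):
    return (np.roll(phi, 1) + phi + np.roll(phi, -1)) / 3.0

def max_filter(phi):
    return np.maximum(np.maximum(np.roll(phi, 1), phi), np.roll(phi, -1))

family = [moving_average, max_filter]     # hypothetical finite subset F
rng = np.random.default_rng(3)
n, violations = 8, 0
for _ in range(100):
    phi1, phi2 = rng.random(n), rng.random(n)
    D_match_F = max(d_match(persistence_0d_cycle(F(phi1)), persistence_0d_cycle(F(phi2)))
                    for F in family)
    d_G = min(np.max(np.abs(shift(phi1, g) - phi2)) for g in range(n))
    D_Phi = np.max(np.abs(phi1 - phi2))
    violations += not (D_match_F <= d_G + 1e-12 and d_G <= D_Phi + 1e-12)
print("violations of D_match <= d_G <= D_Phi:", violations)
```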

Theorem (6.6).

Let us assume that $\varPhi=\Psi$ , every function in $\varPhi$ is non-negative, the $k$ -th Betti number of $X$ does not vanish, and $\varPhi$ contains each constant function $c$ for which a function $\varphi\in\varPhi$ exists such that $0\leq c\leq\|\varphi\|_{\infty}$ . Then $\mathcal{D}^{\mathcal{F}^{\mathrm{all}},k}_{\mathrm{match}}=d_{G}$ .

Proof.

For every $\varphi^{\prime}\in\varPhi$ let us consider the operator $F_{\varphi^{\prime}}:\varPhi\to\varPhi$ defined by setting $F_{\varphi^{\prime}}(\varphi)$ equal to the constant function taking everywhere the value $d_{G}(\varphi,\varphi^{\prime})$ for every $\varphi\in\varPhi$ (i.e., $F_{\varphi^{\prime}}(\varphi)(x)=d_{G}(\varphi,\varphi^{\prime})$ for any $x\in X$ ). Our assumptions guarantee that such a constant function belongs to $\varPhi=\Psi$ . We also set $T=\mathrm{id}:G\to G$ .

We observe that:

1. $F_{\varphi^{\prime}}$ is a group equivariant operator on $\varPhi$, because the strong invariance of the natural pseudo-distance $d_{G}$ with respect to the group $G$ (Remark 6.2) implies that if $\varphi\in\varPhi$ and $g\in G$, then $F_{\varphi^{\prime}}(\varphi\circ g)(x)=d_{G}(\varphi\circ g,\varphi^{\prime})=d_{G}(\varphi,\varphi^{\prime})=F_{\varphi^{\prime}}(\varphi)(g(x))=(F_{\varphi^{\prime}}(\varphi)\circ g)(x)=(F_{\varphi^{\prime}}(\varphi)\circ T(g))(x)$ for every $x\in X$.

2. $F_{\varphi^{\prime}}$ is non-expansive on $\varPhi$, because for every $\varphi_{1},\varphi_{2}\in\varPhi$

$$D_{\varPhi}\left(F_{\varphi^{\prime}}(\varphi_{1}),F_{\varphi^{\prime}}(\varphi_{2})\right)=\left|d_{G}(\varphi_{1},\varphi^{\prime})-d_{G}(\varphi_{2},\varphi^{\prime})\right|\leq d_{G}(\varphi_{1},\varphi_{2})\leq D_{\varPhi}(\varphi_{1},\varphi_{2}).$$

Therefore, $F_{\varphi^{\prime}}$ is a GENEO.

For every $\varphi_{1},\varphi_{2},\varphi^{\prime}\in\varPhi$ we have that

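$$d_{\mathrm{match}}\left(r_{k}(F_{\varphi^{\prime}}(\varphi_{1})),r_{k}(F_{\varphi^{\prime}}(\varphi_{2}))\right)=\left|d_{G}(\varphi_{1},\varphi^{\prime})-d_{G}(\varphi_{2},\varphi^{\prime})\right|.$$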

Indeed, apart from the trivial points on the line $\{(u,v)\in\mathbb{R}^{2}\ :\ u=v\}$, the persistence diagram associated with $r_{k}(F_{\varphi^{\prime}}(\varphi_{1}))$ contains only the point $(d_{G}(\varphi_{1},\varphi^{\prime}),\infty)$, while the persistence diagram associated with $r_{k}(F_{\varphi^{\prime}}(\varphi_{2}))$ contains only the point $(d_{G}(\varphi_{2},\varphi^{\prime}),\infty)$. Both points have the same multiplicity, which equals the (non-null) $k$-th Betti number of $X$. Since points at infinity can only be matched with points at infinity, the optimal matching pairs the copies of these two points, at cost $\left|d_{G}(\varphi_{1},\varphi^{\prime})-d_{G}(\varphi_{2},\varphi^{\prime})\right|$.

Setting $\varphi^{\prime}=\varphi_{2}$ , we have that

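$$d_{\mathrm{match}}\left(r_{k}(F_{\varphi_{2}}(\varphi_{1})),r_{k}(F_{\varphi_{2}}(\varphi_{2}))\right)=\left|d_{G}(\varphi_{1},\varphi_{2})-d_{G}(\varphi_{2},\varphi_{2})\right|=d_{G}(\varphi_{1},\varphi_{2}).$$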

As a consequence, we have that

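$$\mathcal{D}^{\mathcal{F}^{\mathrm{all}},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})\geq d_{\mathrm{match}}\left(r_{k}(F_{\varphi_{2}}(\varphi_{1})),r_{k}(F_{\varphi_{2}}(\varphi_{2}))\right)=d_{G}(\varphi_{1},\varphi_{2}).$$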

By applying Theorem 6.5, we get

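$$\mathcal{D}^{\mathcal{F}^{\mathrm{all}},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})=d_{G}(\varphi_{1},\varphi_{2})$$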

for every $\varphi_{1},\varphi_{2}$ . ∎
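The operator $F_{\varphi^{\prime}}$ from this proof is easy to build in an assumed discrete setting: $\varPhi=\Psi$ is the set of $[0,1]$-valued signals on the cycle graph with $n$ vertices, $G$ is the group of cyclic shifts, $T=\mathrm{id}$, $k=0$ (the cycle graph is connected, so its $0$-th Betti number is $1$), and $d_{G}$ is the minimum over shifts of the sup-norm distance. Every value of $d_{G}$ then lies in $[0,1]$, so the constant outputs stay in $\varPhi$. The persistence routine is a standard union-find computation of $0$-dimensional sublevel-set persistence, and names such as `F_const` are hypothetical.

```python
import numpy as np

def persistence_0d_cycle(f):
    """0-dim sublevel-set persistence of f on the cycle graph C_n (n >= 3)."""
    n = len(f)
    parent = list(range(n))
    birth = [float(x) for x in f]   # a root's value is its component's birth
    added = [False] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    finite = []
    for v in sorted(range(n), key=lambda i: (f[i], i)):
        added[v] = True
        for u in ((v - 1) % n, (v + 1) % n):
            if not added[u]:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
            if f[v] > birth[young]:          # elder rule; skip diagonal pairs
                finite.append((birth[young], float(f[v])))
            parent[young] = old
    essential = [(birth[r], float("inf")) for r in {find(i) for i in range(n)}]
    return sorted(finite), sorted(essential)

def shift(phi, g):
    return np.roll(phi, -g)

def d_G(phi1, phi2):
    # Natural pseudo-distance for the assumed group of cyclic shifts.
    return float(min(np.max(np.abs(shift(phi1, g) - phi2)) for g in range(len(phi1))))

def F_const(phi_prime):
    # F_{phi'}(phi) is the constant signal with value d_G(phi, phi').
    return lambda phi: np.full(len(phi), d_G(phi, phi_prime))

rng = np.random.default_rng(4)
n = 7
phi1, phi2 = rng.random(n), rng.random(n)
F = F_const(phi2)                        # the choice phi' = phi_2 from the proof

fin1, ess1 = persistence_0d_cycle(F(phi1))
fin2, ess2 = persistence_0d_cycle(F(phi2))
# No off-diagonal finite points and one essential point (c, inf) per diagram,
# so the matching distance is the gap between the two abscissas.
dist = abs(ess1[0][0] - ess2[0][0])
print("diagrams:", (fin1, ess1), (fin2, ess2))
print("matching distance:", dist, "d_G(phi1, phi2):", d_G(phi1, phi2))

# Spot-check that F is equivariant (T = id) and non-expansive.
ok = True
for _ in range(100):
    a, b = rng.random(n), rng.random(n)
    g = int(rng.integers(n))
    ok &= bool(np.array_equal(F(shift(a, g)), shift(F(a), g)))
    ok &= bool(np.max(np.abs(F(a) - F(b))) <= np.max(np.abs(a - b)) + 1e-12)
print("equivariant and non-expansive on samples:", ok)
```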

Proposition (6.7).

Let $\mathcal{F},\mathcal{F}^{\prime}\subseteq\mathcal{F}^{\mathrm{all}}$ . If the Hausdorff distance

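$$HD(\mathcal{F},\mathcal{F}^{\prime}):=\max\left\{\sup_{F\in\mathcal{F}}\inf_{F^{\prime}\in\mathcal{F}^{\prime}}D_{\mathrm{GENEO,H}}(F,F^{\prime}),\ \sup_{F^{\prime}\in\mathcal{F}^{\prime}}\inf_{F\in\mathcal{F}}D_{\mathrm{GENEO,H}}(F,F^{\prime})\right\}$$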

is not larger than $\varepsilon$ , then

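$$\left|\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})-\mathcal{D}^{\mathcal{F}^{\prime},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})\right|\leq 2\varepsilon$$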

for every $\varphi_{1},\varphi_{2}\in\varPhi$ .

Proof.

Since $HD(\mathcal{F},\mathcal{F}^{\prime})\leq\varepsilon$, for every $\eta>0$ and every $F\in\mathcal{F}$ an operator $F^{\prime}\in\mathcal{F}^{\prime}$ exists such that $D_{\mathrm{GENEO,H}}(F,F^{\prime})\leq\varepsilon+\eta$. The definition of $D_{\mathrm{GENEO,H}}$ implies that $d_{H}(F(\varphi),F^{\prime}(\varphi))\leq\varepsilon+\eta$ for every $\varphi\in\varPhi$. From Theorem 6.1 it follows that

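$$d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})),r_{k}(F^{\prime}(\varphi_{1}))\right)\leq d_{H}\left(F(\varphi_{1}),F^{\prime}(\varphi_{1})\right)\leq\varepsilon+\eta$$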

and

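$$d_{\mathrm{match}}\left(r_{k}(F(\varphi_{2})),r_{k}(F^{\prime}(\varphi_{2}))\right)\leq d_{H}\left(F(\varphi_{2}),F^{\prime}(\varphi_{2})\right)\leq\varepsilon+\eta$$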

for every $\varphi_{1},\varphi_{2}\in\varPhi$ .

Therefore,

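$$\begin{aligned}d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})),r_{k}(F(\varphi_{2}))\right)&\leq d_{\mathrm{match}}\left(r_{k}(F(\varphi_{1})),r_{k}(F^{\prime}(\varphi_{1}))\right)+d_{\mathrm{match}}\left(r_{k}(F^{\prime}(\varphi_{1})),r_{k}(F^{\prime}(\varphi_{2}))\right)\\&\quad+d_{\mathrm{match}}\left(r_{k}(F^{\prime}(\varphi_{2})),r_{k}(F(\varphi_{2}))\right)\\&\leq\mathcal{D}^{\mathcal{F}^{\prime},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})+2(\varepsilon+\eta).\end{aligned}$$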

Taking the supremum over $F\in\mathcal{F}$, we get $\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})\leq\mathcal{D}^{\mathcal{F}^{\prime},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})+2(\varepsilon+\eta)$. We can show analogously that $\mathcal{D}^{\mathcal{F}^{\prime},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})\leq\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})+2(\varepsilon+\eta)$. Since $\eta$ can be chosen arbitrarily small, the statement follows from the previous two inequalities. ∎

Proposition (6.8).

Let $\mathcal{F}$ be a non-empty subset of $\mathcal{F}^{\mathrm{all}}$ . For every $\varepsilon>0$ , a finite subset $\mathcal{F}^{*}$ of $\mathcal{F}$ exists, such that

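$$\left|\mathcal{D}^{\mathcal{F},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})-\mathcal{D}^{\mathcal{F}^{*},k}_{\mathrm{match}}(\varphi_{1},\varphi_{2})\right|\leq\varepsilon$$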

for every $\varphi_{1},\varphi_{2}\in\varPhi$ .

Proof.

Let us consider the closure $\bar{\mathcal{F}}$ of $\mathcal{F}$ in $\mathcal{F}^{\mathrm{all}}$, and the covering $\mathcal{U}$ of $\bar{\mathcal{F}}$ obtained by taking all the open balls of radius $\frac{\varepsilon}{2}$ centered at points of $\mathcal{F}$, with respect to $D_{\mathrm{GENEO}}$. Theorem 5.1 guarantees that $\mathcal{F}^{\mathrm{all}}$ is compact, hence its closed subset $\bar{\mathcal{F}}$ is compact as well. Therefore we can extract a finite covering $\{B_{1},\dots,B_{m}\}$ of $\bar{\mathcal{F}}$ from $\mathcal{U}$, and set $\mathcal{F}^{*}$ equal to the set of centers of the balls $B_{1},\dots,B_{m}$. The statement immediately follows from Proposition 6.7, by recalling that $D_{\mathrm{GENEO,H}}\leq D_{\mathrm{GENEO}}$ and hence $HD\left(\bar{\mathcal{F}},\mathcal{F}^{*}\right)\leq\varepsilon/2$. ∎
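The finite subset $\mathcal{F}^{*}$ of Proposition 6.8 comes from a compactness argument. When $\mathcal{F}$ is replaced by a finite sample, an $\varepsilon/2$-covering can instead be built greedily, as in the Python sketch below. Everything in it is an assumption made for illustration: each operator is a circular convolution with a non-negative 3-tap kernel of total mass at most 1 (hence shift-equivariant, non-expansive in the sup norm, and mapping $[0,1]$-valued signals to $[0,1]$-valued ones), $\varPhi$ is the set of all $[0,1]$-valued signals on a cyclic grid with at least three points, and for this family $D_{\mathrm{GENEO}}$ has the closed form used in `d_geneo_conv`. The greedy selection replaces the extraction of a finite subcovering; it is not the paper's construction.

```python
import numpy as np

def d_geneo_conv(k1, k2):
    """D_GENEO between two circular convolution operators (assumed closed form).

    For Phi = all [0, 1]-valued signals on a cyclic grid with at least len(k1)
    points and the sup norm, sup_phi ||(k1 - k2) * phi||_inf equals the larger
    of the total positive part and the total negative part of k1 - k2.
    """
    d = np.asarray(k1) - np.asarray(k2)
    return float(max(d[d > 0].sum(), -d[d < 0].sum()))

def greedy_net(ops, radius):
    # Keep an operator as a new center unless it is within `radius` of a kept one.
    centers = []
    for k in ops:
        if all(d_geneo_conv(k, c) >= radius for c in centers):
            centers.append(k)
    return centers

rng = np.random.default_rng(5)
raw = rng.random((500, 3))
# Hypothetical finite family: non-negative 3-tap kernels with total mass <= 1.
kernels = raw / raw.sum(axis=1, keepdims=True) * rng.random((500, 1))

eps = 0.5
F_star = greedy_net(kernels, eps / 2)
covering_radius = max(min(d_geneo_conv(k, c) for c in F_star) for k in kernels)
print(len(F_star), "centers out of", len(kernels), "operators")
print("every operator lies within", round(covering_radius, 3), "of a center")
```

Every sampled operator is within $\varepsilon/2$ of a kept center, so the Hausdorff distance between the sample and the centers is at most $\varepsilon/2$ with respect to $D_{\mathrm{GENEO}}$, which is the hypothesis that Proposition 6.7 needs.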

References

[1]

Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The handbook of brain theory and neural networks 3361 (10) (1995) 1995.

[2]

F. Anselmi, L. Rosasco, T. Poggio, On invariance and selectivity in representation learning, Information and Inference: A Journal of the IMA 5 (2) (2016) 134–158.

doi:10.1093/imaiai/iaw009.

URL http://dx.doi.org/10.1093/imaiai/iaw009

[3]

P. Frosini, G. Jabłoński, Combining persistent homology and invariance groups for shape comparison, Discrete Comput. Geom. 55 (2) (2016) 373–409.

doi:10.1007/s00454-016-9761-y.

URL http://dx.doi.org/10.1007/s00454-016-9761-y

[4]

T. Cohen, M. Welling, Group equivariant convolutional networks, in: International conference on machine learning, 2016, pp. 2990–2999.

[5]

D. E. Worrall, S. J. Garbin, D. Turmukhambetov, G. J. Brostow, Harmonic networks: Deep translation and rotation equivariance, in: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Vol. 2, 2017.

[6]

H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta, L. Ziegelmeier, Persistence images: A stable vector representation of persistent homology, J. Mach. Learn. Res. 18 (1) (2017) 218–252.

URL http://dl.acm.org/citation.cfm?id=3122009.3122017

[7]

C. S. Pun, K. Xia, S. Xian Lee, Persistent-homology-based machine learning and its applications – A survey, arXiv e-prints (2018) arXiv:1811.00252.

[8]

R. B. Gabrielsson, G. Carlsson, Exposition and interpretation of the topology of neural networks, CoRR abs/1810.03234.

arXiv:1810.03234.

URL http://arxiv.org/abs/1810.03234

[9]

P. Frosini, Towards an Observer-oriented Theory of Shape Comparison, in: A. Ferreira, A. Giachetti, D. Giorgi (Eds.), Eurographics Workshop on 3D Object Retrieval, The Eurographics Association, 2016.

doi:10.2312/3dor.20161080.

[10]

G. Carlsson, Topology and data, Bull. Amer. Math. Soc. (N.S.) 46 (2) (2009) 255–308.

doi:10.1090/S0273-0979-09-01249-X.

URL https://doi.org/10.1090/S0273-0979-09-01249-X

[11]

P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. E. Carlsson, Extracting insights from the shape of complex data using topology, in: Scientific reports, Vol. 3, 2013.

[12]

A. Hatcher, Algebraic topology, Tsinghua University Press, 2005.

[13]

S. Biasotti, L. De Floriani, B. Falcidieno, P. Frosini, D. Giorgi, C. Landi, L. Papaleo, M. Spagnuolo, Describing Shapes by Geometrical-topological Properties of Real Functions, ACM Comput. Surv. 40 (4) (2008) 12:1–12:87.

doi:10.1145/1391729.1391731.

URL http://doi.acm.org/10.1145/1391729.1391731

[14]

G. Carlsson, A. Zomorodian, The theory of multidimensional persistence, Discrete Comput. Geom. 42 (1) (2009) 71–93.

doi:10.1007/s00454-009-9176-0.

URL http://dx.doi.org/10.1007/s00454-009-9176-0

[15]

H. Edelsbrunner, J. Harer, Persistent homology—a survey, in: Surveys on discrete and computational geometry, Vol. 453 of Contemp. Math., Amer. Math. Soc., Providence, RI, 2008, pp. 257–282.

doi:10.1090/conm/453/08802.

URL http://dx.doi.org/10.1090/conm/453/08802

[16]

D. Cohen-Steiner, H. Edelsbrunner, J. Harer, Stability of persistence diagrams, Discrete Comput. Geom. 37 (1) (2007) 103–120.

doi:10.1007/s00454-006-1276-5.

URL http://dx.doi.org/10.1007/s00454-006-1276-5

[17]

A. Cerri, B. Di Fabio, M. Ferri, P. Frosini, C. Landi, Betti numbers in multidimensional persistent homology are stable functions, Math. Methods Appl. Sci. 36 (12) (2013) 1543–1557.

doi:10.1002/mma.2704.

URL http://dx.doi.org/10.1002/mma.2704

[18]

S. A. Gaal, Point set topology, Pure and Applied Mathematics, Vol. XVI, Academic Press, New York-London, 1964.

[19]

S. Y. Oudot, Persistence theory: from quiver representations to data analysis, Vol. 209 of Mathematical Surveys and Monographs, American Mathematical Society, Providence, RI, 2015.

doi:10.1090/surv/209.

URL https://doi.org/10.1090/surv/209

[20]

A. Zomorodian, Fast construction of the Vietoris-Rips complex, Computers & Graphics 34 (3) (2010) 263–271.

[21]

R. Fabbri, L. D. F. Costa, J. C. Torelli, O. M. Bruno, 2D Euclidean distance transform algorithms: A comparative survey, ACM Computing Surveys (CSUR) 40 (1) (2008) 2.

[22]

P. Langfelder, B. Zhang, S. Horvath, Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R, Bioinformatics 24 (5) (2007) 719–720.

[23]

V. Nair, G. E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807–814.

[24]

X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, pp. 249–256.

[25]

G. E. Hinton, A. Krizhevsky, S. D. Wang, Transforming auto-encoders, in: International Conference on Artificial Neural Networks, Springer, 2011, pp. 44–51.

[26]

S. Sabour, N. Frosst, G. E. Hinton, Dynamic Routing Between Capsules, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30, Curran Associates, Inc., 2017, pp. 3856–3866.

URL http://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf



