Invariant Layers for Graphs with Nodes of Different Types

Dmitry Rybin; Ruoyu Sun; Zhi-Quan Luo

arXiv:2302.13551·cs.LG·February 28, 2023

TL;DR

This paper characterizes linear layers invariant to permutations that preserve node types in heterogeneous graphs, enabling more effective learning of node interactions and providing tighter bounds on tensor sizes for function approximation.

Contribution

It fully characterizes invariant linear layers for node-type-preserving permutations and extends Bell number generalizations, improving graph neural network design and tensor size bounds.

Findings

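01. Implementing the invariant layers in graph neural networks helps learn node interactions more effectively than existing techniques.

02. Tensor size bounds for function approximation on graphs with $n$ nodes are conjectured to tighten from $n(n-1)/2$ to $n$.

03. For $d \times d$ image data with translation symmetry, the size of invariant tensor generators is bounded by $2d - 1$.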

Abstract

Neural networks that satisfy invariance with respect to input permutations have been widely studied in machine learning literature. However, in many applications, only a subset of all input permutations is of interest. For heterogeneous graph data, one can focus on permutations that preserve node types. We fully characterize linear layers invariant to such permutations. We verify experimentally that implementing these layers in graph neural network architectures allows learning important node interactions more effectively than existing techniques. We show that the dimension of space of these layers is given by a generalization of Bell numbers, extending the work (Maron et al., 2019). We further narrow the invariant network design space by addressing a question about the sizes of tensor layers necessary for function approximation on graph data. Our findings suggest that function…

Tables

Table 1: The dimension of the space of invariant linear layers in graphs with $m$ node types. Setting $m=1$ recovers the results of (Maron et al., 2019b).

Tensor sizes	1	2	3
(Maron et al., 2019b)	1	2	5
This work	$m$	$m^{2} + m$	$m^{3} + 3 m^{2} + m$

Table 2: Comparison of the theoretical contribution to past work.

Ref.	(Maron et al., 2019b)	Theorem 3.1
Results	Classification of $S_{n}$-invariant linear layers $\mathbb{R}^{n^{k}} \to \mathbb{R}$.	Classification of $S_{n_{1}} \times \dots \times S_{n_{m}}$-invariant linear layers $\mathbb{R}^{n^{k}} \to \mathbb{R}$.

Table 3: Implications of the proposed conjectures and theorems.

Result	Description
Theorem 4.3	A decrease of tensor sizes in CNN for translation-invariant function approximation from $d^{4}$ to $2 d - 1$.
Conjecture A	A decrease of required tensor sizes from $n (n - 1) / 2$ to $n$ for function approximation on graphs with $n$ nodes.
Conjecture B	Graph instance-dependent bound on required tensor sizes.

Table 4: The obtained sequence of dimensions of the invariant subspace for graphs with $m$ node types can be viewed as a generalization of Bell numbers.

Space	Dimension of Invariant Subspace
1-tensors	$m$
2-tensors	$m^{2} + m$
3-tensors	$m^{3} + 3 m^{2} + m$
4-tensors	$m^{4} + 6 m^{3} + 7 m^{2} + m$

Table 5: Mean and standard deviation of Micro-F1 score over 10 runs. Baselines are taken from (Wang & Zhang, 2022a).

Task	ppi-bp	hpo-metab
GLASS	$0.619 \pm 0.007$	$0.614 \pm 0.005$
SubGNN	$0.599 \pm 0.008$	$0.537 \pm 0.008$
GNN-Seg	$0.361 \pm 0.008$	$0.542 \pm 0.009$
This work	$0.625 \pm 0.017$	$0.611 \pm 0.024$

Table 6: Mean and standard deviation of Micro-F1 score over 10 runs. Baselines are taken from (Wang & Zhang, 2022a).

Task	density	cut ratio
GLASS	$0.930 \pm 0.009$	$0.935 \pm 0.006$
SubGNN	$0.919 \pm 0.006$	$0.629 \pm 0.013$
GNN-Seg	$0.952 \pm 0.006$	$0.346 \pm 0.011$
This work	$0.949 \pm 0.008$	$0.947 \pm 0.008$

Equations

f(x_{1}, x_{2}, ..., x_{n}) = f(x_{P(1)}, x_{P(2)}, ..., x_{P(n)}).

a_{1} x_{1} + a_{2} x_{2} + ... + a_{n} x_{n} = a_{1} x_{P(1)} + a_{2} x_{P(2)} + ... + a_{n} x_{P(n)}.

\sum a_{i_{1}, ..., i_{k}} x_{i_{1}, ..., i_{k}} = \sum a_{i_{1}, ..., i_{k}} x_{P(i_{1}), ..., P(i_{k})}.

f(P x) = P f(x).

F : \mathbb{R}^{n} \to \mathbb{R}

F(x) = M \circ h \circ L_{d} \circ \sigma \circ \dots \circ \sigma \circ L_{1},

L(x) = \sum_{i \in K_{1}} x_{i, i},

L(x) = \sum_{i \in K_{2}} x_{i, i},

L(x) = \sum_{i \neq j \in K_{1}} x_{i, j},

L(x) = \sum_{i \neq j \in K_{2}} x_{i, j},

L(x) = \sum_{i \in K_{1}, j \in K_{2}} x_{i, j},

L(x) = \sum_{i \in K_{2}, j \in K_{1}} x_{i, j}.

f : \mathbb{R}^{n} \to \mathbb{R},

f(x) = f(\sigma(x)), \quad \forall \sigma \in \mathrm{Aut}\;G.

\mathrm{Aut}\;G = \{\sigma \in S_{n} \mid (i, j) \in E \iff (\sigma(i), \sigma(j)) \in E\}.

\max_{x \in K} |F(x) - f(x)| < \epsilon.

f(x) = \sum a_{i_{1}, ..., i_{k}} x_{i_{1}, ..., i_{k}}.

\sum a_{i_{1}, ..., i_{k}} x_{i_{1}, ..., i_{k}} = \sum a_{i_{1}, ..., i_{k}} x_{P(i_{1}), ..., P(i_{k})}, \quad \forall P \in G.

e^{m(e^{x} - 1)} = 1 + m \frac{x}{1!} + (m^{2} + m) \frac{x^{2}}{2!} + (m^{3} + 3 m^{2} + m) \frac{x^{3}}{3!} + ...

\bigsqcup_{j=1}^{m} T_{j} = \{1, ..., k\},

B_{\gamma_{1}, ..., \gamma_{m}}

z_{a, b} = \sum_{i=1}^{d} \sum_{j=1}^{d} x_{i, j} (\sqrt[d]{-1})^{a(i - 1) + b(j - 1)}.

\prod_{a, b = 1}^{d} z_{a, b}^{\alpha_{a, b}}

\sum_{a, b = 1}^{d} a \cdot \alpha_{a, b} = \sum_{a, b = 1}^{d} b \cdot \alpha_{a, b} = 0.

\mathbb{R}^{n} = U_{1} \oplus U_{2} \oplus ... \oplus U_{m}.

(U_{1} \oplus U_{2} \oplus ... \oplus U_{m})^{\otimes k}

\bigoplus_{k_{1}, ..., k_{m}} \binom{k}{k_{1}, ..., k_{m}} U_{1}^{\otimes k_{1}} \otimes ... \otimes U_{m}^{\otimes k_{m}}.

(U_{1} \oplus U_{2})^{\otimes 2} = (U_{1} \otimes U_{1}) \oplus (U_{1} \otimes U_{2}) \oplus (U_{2} \otimes U_{1}) \oplus (U_{2} \otimes U_{2}).

Taxonomy

Topics: Computational Physics and Python Applications · Advanced Graph Neural Networks · Machine Learning in Healthcare

Methods: Graph Neural Network

Full text

Abstract

Neural networks that satisfy invariance with respect to input permutations have been widely studied in machine learning literature. However, in many applications, only a subset of all input permutations is of interest. For heterogeneous graph data, one can focus on permutations that preserve node types. We fully characterize linear layers invariant to such permutations. We verify experimentally that implementing these layers in graph neural network architectures allows learning important node interactions more effectively than existing techniques. We show that the dimension of space of these layers is given by a generalization of Bell numbers, extending the work (Maron et al., 2019b). We further narrow the invariant network design space by addressing a question about the sizes of tensor layers necessary for function approximation on graph data. Our findings suggest that function approximation on a graph with $n$ nodes can be done with tensors of sizes $\leq n$ , which is tighter than the best-known bound $\leq n(n-1)/2$ . For $d\times d$ image data with translation symmetry, our methods give a tight upper bound $2d-1$ (instead of $d^{4}$ ) on sizes of invariant tensor generators via a surprising connection to Davenport constants.

Keywords: graph neural network, expressive power, invariant, tensor, permutation

1 Introduction

The study of invariant and equivariant neural networks has been gaining popularity in recent years. Many fundamental properties, such as universal approximation theorems (Maron et al., 2019c; Yarotsky, 2021; Ravanbakhsh, 2020), have been proved. The design of expressive invariant layers remains an important direction in deep learning (Hartford et al., 2018; Kondor & Trivedi, 2018).

Permutation invariant networks are an important special case. In these networks, the symmetry group of the input data is given by all possible permutations of input coordinates. In particular, such symmetry appears in the use of graph neural networks (Kipf & Welling, 2017), where the invariance comes from the permutation of nodes. This symmetry is crucial for architecture design in graph neural networks and the study of their expressive power and universal approximation properties (Chen et al., 2019; Frasca et al., 2022; Garg et al., 2020; Huang et al., 2022; Xu et al., 2019; Bevilacqua et al., 2022; Qian et al., 2022). However, node permutations in homogeneous and heterogeneous graphs have certain differences that received little attention in the literature.

In applications with heterogeneous graphs, the problem background often requires certain groups of nodes to have significantly different properties or to represent objects of a different nature. Examples of graph applications with many node types can be found in recommendation systems (Wu et al., 2020), chemistry (Reiser et al., 2022), and Learn-to-Optimize (Gasse et al., 2019). Many graph neural network architectures attempt to capture the relations between nodes of different types. Despite many theoretical guarantees (Wang & Zhang, 2022a), some simple features are still hard to learn in practice with existing layers; see Section 5 for experimental evidence. In this paper, we aim to close this gap by characterizing all invariant linear layers for permutations preserving node types, thereby extending the work of (Maron et al., 2019b) to heterogeneous graphs.

In Section 3, Theorem 3.1, we provide a complete characterization of linear layers $\mathbb{R}^{n^{k}}\to\mathbb{R}$ with $k$ -tensor input, invariant to permutations from $S_{n_{1}}\times S_{n_{2}}\times...\times S_{n_{m}}$ , where $n_{1}+...+n_{m}=n$ is the number of nodes and $m$ is the number of different types of nodes. The dimension of the complete space of these layers is given by a generalization of Bell numbers, as can be seen in Table 2. The fact that the structure of invariant layers depends only on the number of node types $m$ and tensor size $k$ , and does not depend on $n$ , is crucial for the re-use of these layers in graph neural networks. It follows that these layers can be directly applied to any input graph with $m$ types of nodes, independent of the number of nodes. We provide an explicit orthogonal basis and implementation description for these layers (Theorem 3.2).

A complete characterization of invariant/equivariant tensor layers defines a design space for invariant neural networks. It was discovered (Maron et al., 2019c; Ravanbakhsh, 2020; Keriven & Peyré, 2019; Maron et al., 2019a) that higher-order tensors are necessary for function approximation with invariant/equivariant neural networks. However, explicit bounds on required tensor sizes are needed.

The work (Maron et al., 2019c) showed that $n(n-1)/2$ -tensors are sufficient for function approximation on graph data with $n$ nodes. Lowering this bound is an open question posed in (Maron et al., 2019c) and (Keriven & Peyré, 2019). In Section 4, we provide several theorems and conjectures suggesting a bound below $n$ . Furthermore, we show how the structure of orbits of the graph automorphism group determines how tensor sizes can be lowered. Our findings are formulated as conjectures A and B. We summarize the implications of each conjecture in Table 3. The mathematical formulation of the conjectures is provided in Section 4. Some applications, such as image graphs in computer vision, provide special cases that can be analyzed fully. We prove that translation-invariant function approximation on $d\times d$ images can be done with tensors of size $\leq 2d-1$ (Theorem 4.3). The proof makes a surprising connection between translation-invariant tensor layers and Davenport constants (Olson, 1969).

Finally, we discuss the differences between our work and prior results. Due to practical importance, treating different types of nodes has been approached in many applications. For bipartite graphs, half-GNN (Gasse et al., 2019) and EvenNet were proposed (Lei et al., 2022). Layers in half-GNN can be viewed as a special case of ours. For extracting properties of a subset of vertices, subgraph structure extraction (Sub-GNN) (Alsentzer et al., 2020), labeling tricks (GLASS) (Wang & Zhang, 2022b), and other approaches (Sun et al., 2021; You et al., 2021; Huang & Zitnik, 2020) were used. Subgraph data pooling in Sub-GNN and GLASS is an example of a layer invariant only to permutations of nodes within a subgraph and hence is a special case of our layers. Tensor sizes for function approximation with convolutional networks were analyzed in (Yarotsky, 2021). As pointed out in (Yarotsky, 2021), finding a small explicit set of generators of translation-invariant tensors is not trivial. Therefore the work (Yarotsky, 2021) took an alternative approach of averaging the outputs over the whole symmetry group. We bypassed the difficulty by making a change of basis and noting a connection with zero-sum sequences in groups (proof of Theorem 4.3).

2 Preliminaries

A function $f:\mathbb{R}^{n}\to\mathbb{R}$ is invariant to a permutation $P$ if for any input $x\in\mathbb{R}^{n}$ we have

$$f(x_{1},x_{2},...,x_{n})=f(x_{P(1)},x_{P(2)},...,x_{P(n)}).$$

For linear functions $f(x)=a_{1}x_{1}+...+a_{n}x_{n}$ this condition is equivalent to a fixed-point equation (Maron et al., 2019b), which can be solved by analyzing orbits of indices,

$$a_{1}x_{1}+a_{2}x_{2}+...+a_{n}x_{n}=a_{1}x_{P(1)}+a_{2}x_{P(2)}+...+a_{n}x_{P(n)}.$$

For tensor input data, $x\in\mathbb{R}^{n^{k}}$ , coordinates are indexed by $k$ -tuples $x_{i_{1},...,i_{k}}$ , where $i_{1},...,i_{k}\in\{1,2,...,n\}$ . Permutation $P$ now acts on all elements in a tuple, simultaneously permuting indices in all $k$ axes of a $k$ -tensor. For a linear map $f:\mathbb{R}^{n^{k}}\to\mathbb{R}$ , invariance to permutation $P$ is equivalent to a fixed point equation

$$\sum a_{i_{1},...,i_{k}}x_{i_{1},...,i_{k}}=\sum a_{i_{1},...,i_{k}}x_{P(i_{1}),...,P(i_{k})}.$$

For a linear map between $k$-tensors and $d$-tensors, $f:\mathbb{R}^{n^{k}}\to\mathbb{R}^{n^{d}}$, equivariance to permutation $P$ is stated as

$$f(Px)=Pf(x).$$

In the fundamental work (Maron et al., 2019b), linear maps $\mathbb{R}^{n^{k}}\to\mathbb{R}$ invariant to all possible $n!$ permutations $P$ were explicitly classified. The dimension of space of these maps was shown to be given by Bell numbers $B(k)$ . This classification is important since it provides a complete design space for invariant and equivariant neural networks.

Recall that a standard model of an invariant neural network is a function

$$F:\mathbb{R}^{n}\to\mathbb{R}$$

defined as

$$F(x)=M\circ h\circ L_{d}\circ\sigma\circ\dots\circ\sigma\circ L_{1},$$

where $L_{i}:\mathbb{R}^{n^{k_{i}}\times a_{i}}\to\mathbb{R}^{n^{k_{i+1}}\times a_{i+1}}$ are linear equivariant layers (the quantity $a_{i}$ is the number of filters or channels in layer $i$), $\sigma$ is an activation function (such as ReLU or sigmoid), $h:\mathbb{R}^{n^{k_{d+1}}\times a_{d+1}}\to\mathbb{R}^{m}$ is an invariant layer, and $M$ is a multi-layer perceptron. The layers $L_{i}$ are equivariant to a predefined set of permutations $P$, while the layer $h$ is invariant to the same set of permutations. For our applications we consider all permutations from $S_{n_{1}}\times S_{n_{2}}\times...\times S_{n_{m}}$, as explained below.

Consider a graph with $n$ nodes of different types: $n_{1}$ nodes of type $1$, $n_{2}$ nodes of type $2$, …, $n_{m}$ nodes of type $m$, with $n_{1}+...+n_{m}=n$ (see Figures 1 and 2). For general graph-level tasks, aggregation is performed over all $n$ nodes equivalently. Such an aggregation operation guarantees invariance of the output to all $n!$ permutations from $S_{n}$. However, to capture properties shared by one type of node, it is viable to use aggregations only within nodes of the same type. Such aggregations only preserve symmetries from the subgroup that permutes nodes of the same type, $S_{n_{1}}\times S_{n_{2}}\times...\times S_{n_{m}}\subset S_{n}$.
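To make the type-wise aggregation concrete, here is a minimal Python sketch of an invariant linear map $\mathbb{R}^{n}\to\mathbb{R}$ that sums scalar node features separately within each type. The function names, the `weights` dictionary, and the toy features are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict


def per_type_sums(x, node_types):
    """Sum scalar node features separately within each node type."""
    sums = defaultdict(float)
    for value, node_type in zip(x, node_types):
        sums[node_type] += value
    return dict(sums)


def invariant_layer(x, node_types, weights):
    """Linear map R^n -> R with one (hypothetical) weight per node type."""
    sums = per_type_sums(x, node_types)
    return sum(weights[t] * s for t, s in sums.items())


# Toy data (assumption): three nodes of type 0 and two nodes of type 1.
x = [1.0, 2.0, 3.0, 10.0, 20.0]
node_types = [0, 0, 0, 1, 1]
weights = {0: 0.5, 1: -1.0}

out = invariant_layer(x, node_types, weights)

# Swapping nodes 0 and 2 (same type) leaves the output unchanged.
out_same_type_swap = invariant_layer([3.0, 2.0, 1.0, 10.0, 20.0], node_types, weights)

# Swapping nodes 2 and 3 (different types) generally changes the output.
out_cross_type_swap = invariant_layer([1.0, 2.0, 10.0, 3.0, 20.0], node_types, weights)
```

With $m$ node types this map has $m$ free weights, which agrees with the dimension $m$ listed for 1-tensors in Table 4.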

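For the special case of two node types, orthogonal bases of the new invariant linear layers are illustrated in Figures 3 and 4. A map $\mathbb{R}^{n^{k}}\to\mathbb{R}^{n^{d}}$ is represented by a $(k+d)$-tensor. In particular, a map $\mathbb{R}^{n}\to\mathbb{R}^{n}$ in Figure 3 is given by a matrix. Assume that $K_{i}$ is the set of nodes of type $i$ for $i=1,2$. Then the six maps $L:\mathbb{R}^{n^{2}}\to\mathbb{R}$ in Figure 3 are

$$L(x)=\sum_{i\in K_{1}}x_{i,i},\qquad L(x)=\sum_{i\in K_{2}}x_{i,i},$$

$$L(x)=\sum_{i\neq j\in K_{1}}x_{i,j},\qquad L(x)=\sum_{i\neq j\in K_{2}}x_{i,j},$$

$$L(x)=\sum_{i\in K_{1},j\in K_{2}}x_{i,j},\qquad L(x)=\sum_{i\in K_{2},j\in K_{1}}x_{i,j}.$$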

Note that relations such as “number of edges between nodes of type $1$ and $2$ ” are easier to capture with this set of maps.
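As an illustration, the six maps for two node types can be evaluated directly on an adjacency matrix. The sketch below is a plain-Python version under the assumption that the 2-tensor is a nested list and that `K1`, `K2` are lists of node indices; the example graph is made up for demonstration.

```python
def six_invariant_maps(x, K1, K2):
    """Evaluate the six basis maps R^{n^2} -> R for two node types."""
    return (
        sum(x[i][i] for i in K1),                        # diagonal, type 1
        sum(x[i][i] for i in K2),                        # diagonal, type 2
        sum(x[i][j] for i in K1 for j in K1 if i != j),  # off-diagonal, type 1
        sum(x[i][j] for i in K2 for j in K2 if i != j),  # off-diagonal, type 2
        sum(x[i][j] for i in K1 for j in K2),            # type 1 -> type 2
        sum(x[i][j] for i in K2 for j in K1),            # type 2 -> type 1
    )


def permute(x, perm):
    """Apply a node permutation simultaneously to both axes of a 2-tensor."""
    n = len(x)
    return [[x[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


# Hypothetical graph: edges 0-1, 0-2, 1-2, 1-3; nodes {0, 1} have type 1, {2, 3} type 2.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
K1, K2 = [0, 1], [2, 3]

values = six_invariant_maps(adjacency, K1, K2)

# Swapping nodes 0 and 1 preserves node types, so all six values are unchanged.
values_type_preserving = six_invariant_maps(permute(adjacency, [1, 0, 2, 3]), K1, K2)

# Swapping nodes 1 and 2 mixes the types and changes some of the values.
values_type_mixing = six_invariant_maps(permute(adjacency, [0, 2, 1, 3]), K1, K2)
```

Here the fifth value counts the edges between type 1 and type 2 nodes (three in this toy graph), which is the kind of relation mentioned above.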

Graph data can be encoded using tensors. Node features are 1-tensors, while edge features are 2-tensors. Higher-order tensors correspond to hypergraph data such as hyper-edges (Maron et al., 2019b), or non-trivial structures such as bags of rooted sub-graphs (Frasca et al., 2022). Symmetries preserved by those tensors are related to the symmetries of the underlying graph.

Let $G$ be a graph with $n$ vertices, and let $x=(x_{1},...,x_{n})\in\mathbb{R}^{n}$ be a feature vector, where $x_{i}$ is a scalar feature of node $i$. If $f$ is a graph function,

$$f:\mathbb{R}^{n}\to\mathbb{R},$$

then by definition, $f$ must respect graph symmetries, i.e.

$$f(x)=f(\sigma(x)),\quad\forall\sigma\in\mathrm{Aut}\;G.$$

Here $\mathrm{Aut}\;G$ is the symmetry (automorphism) group of $G$. It is defined as the subgroup of node permutations in $S_{n}$ that preserve the graph structure,

$$\mathrm{Aut}\;G=\{\sigma\in S_{n}\mid(i,j)\in E\iff(\sigma(i),\sigma(j))\in E\}.$$

For example, for the graph in Figure 5, any graph function $f(x_{1},x_{2},x_{3},x_{4})$ must be invariant to all permutations of variables $x_{1},x_{2},x_{3}$ . The reason is that these permutations produce the same graph structure, and hence any function $f$ that depends only on graph structure must give the same output after such permutation.
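For small graphs, $\mathrm{Aut}\;G$ can be enumerated by brute force straight from its definition. The sketch below assumes an undirected graph given as an edge list. Since Figure 5 is not reproduced here, a star graph with three leaves is used as a stand-in (an assumption) in which the three leaf features are interchangeable.

```python
from itertools import permutations


def automorphisms(n, edges):
    """Return all node permutations that map the undirected edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for perm in permutations(range(n)):
        image = {frozenset((perm[i], perm[j])) for i, j in edges}
        if image == edge_set:
            result.append(perm)
    return result


# Hypothetical example: star graph with leaves 0, 1, 2 attached to center 3.
star_edges = [(0, 3), (1, 3), (2, 3)]
star_automorphisms = automorphisms(4, star_edges)
```

For this star graph the automorphisms are exactly the six permutations of the leaves with the center fixed. The enumeration visits all $n!$ permutations, so it is only practical for very small $n$.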

For a general subgroup of $S_{n}$ , the function approximation properties of tensor layers are discussed in (Maron et al., 2019c). However, the preservation of graph structure puts a significant restriction on the subgroup $\mathrm{Aut}\;G\subset S_{n}$ . This restriction can lead to a bound that is significantly smaller than $n(n-1)/2$ .

For a given graph $G$ with $n$ nodes, define an integer $T(G)$ as the smallest size of tensors that allows an invariant neural network to approximate any $\mathrm{Aut}\;G$-invariant function $f:\mathbb{R}^{n}\to\mathbb{R}$. That is, for any continuous $\mathrm{Aut}\;G$-invariant function $f$, any compact set $K\subset\mathbb{R}^{n}$, and any $\epsilon>0$ there should exist an invariant neural network $F$ with tensor sizes $\leq T(G)$ such that

$$\max_{x\in K}|F(x)-f(x)|<\epsilon.$$

Open Question: Can the bound $T(G)\leq n(n-1)/2$ be improved?

We note that the work (Maron et al., 2019c) connected the quantity $T(G)$ to degrees of polynomial generators of ring of invariants $\mathbb{R}[x_{1},...,x_{n}]^{\mathrm{Aut}\;G}$ .

3 Classification of Invariant Layers

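Recall that a linear map $f:\mathbb{R}^{n^{k}}\to\mathbb{R}$ can be written as

$$f(x)=\sum a_{i_{1},...,i_{k}}x_{i_{1},...,i_{k}}.$$

The condition that a map $f$ is invariant to a subgroup $G\subset S_{n}$ of permutations is equivalent to a set of fixed-point equations

$$\sum a_{i_{1},...,i_{k}}x_{i_{1},...,i_{k}}=\sum a_{i_{1},...,i_{k}}x_{P(i_{1}),...,P(i_{k})},\quad\forall P\in G.$$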

Solving these equations can be reduced to a certain technical calculation from the branch of invariant theory and representation theory (Fulton & Harris, 2004). We provide this calculation in appendix A for permutations preserving node types (Theorem 3.1), cyclic shifts (Theorem 3.3), and translations (Theorem 3.4).
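For small cases the fixed-point equations can also be solved by brute force: they force the coefficient $a_{i_{1},...,i_{k}}$ to be constant on each orbit of index tuples, so the dimension of the invariant space equals the number of orbits. The sketch below counts these orbits for the group $S_{n_{1}}\times...\times S_{n_{m}}$; the function names and the test sizes are illustrative assumptions.

```python
from itertools import permutations, product


def type_preserving_permutations(sizes):
    """Yield every permutation of range(n) that permutes nodes only within their type block."""
    blocks, start = [], 0
    for size in sizes:
        blocks.append(list(range(start, start + size)))
        start += size
    for choice in product(*(permutations(block) for block in blocks)):
        perm = {}
        for block, image in zip(blocks, choice):
            perm.update(zip(block, image))
        yield perm


def count_orbits(sizes, k):
    """Count orbits of index k-tuples under simultaneous type-preserving permutations."""
    n = sum(sizes)
    group = list(type_preserving_permutations(sizes))
    seen, orbits = set(), 0
    for index in product(range(n), repeat=k):
        if index in seen:
            continue
        orbits += 1  # a new orbit; mark every tuple reachable from it
        for perm in group:
            seen.add(tuple(perm[i] for i in index))
    return orbits


two_types_2_tensors = count_orbits([2, 2], 2)
two_types_3_tensors = count_orbits([3, 3], 3)
one_type_3_tensors = count_orbits([3], 3)
```

The counts reproduce the entries of Table 1 for $m=2$ and $m=1$ as long as every $n_{j}\geq k$; for smaller blocks some equality patterns cannot occur and the count drops.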

Theorem 3.1.

If a graph contains nodes of $m$ different types, then the dimension of space of invariant layers $\mathbb{R}^{n^{k}}\to\mathbb{R}$ is given by the coefficient in front of $x^{k}/k!$ in the expression

$$e^{m(e^{x}-1)}=1+m\frac{x}{1!}+(m^{2}+m)\frac{x^{2}}{2!}+(m^{3}+3m^{2}+m)\frac{x^{3}}{3!}+...$$

The dimension of space of equivariant layers $\mathbb{R}^{n^{k}}\to\mathbb{R}^{n^{d}}$ is given by the coefficient in front of $x^{k+d}/(k+d)!$ .
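The coefficients in Theorem 3.1 are easy to tabulate. The sketch below relies on the standard identity that the coefficient of $x^{k}/k!$ in $e^{m(e^{x}-1)}$ equals $\sum_{j}S(k,j)\,m^{j}$, where $S(k,j)$ are Stirling numbers of the second kind; the function names are illustrative assumptions.

```python
def stirling2_row(k):
    """Return [S(k, 0), ..., S(k, k)], Stirling numbers of the second kind."""
    row = [1]  # S(0, 0) = 1
    for n in range(1, k + 1):
        new = [0] * (n + 1)
        for j in range(1, n + 1):
            previous_same = row[j] if j < n else 0
            # Recurrence: S(n, j) = S(n-1, j-1) + j * S(n-1, j)
            new[j] = row[j - 1] + j * previous_same
        row = new
    return row


def invariant_dim(m, k):
    """Coefficient of x^k / k! in exp(m (e^x - 1)) for m node types."""
    return sum(s * m**j for j, s in enumerate(stirling2_row(k)))


def equivariant_dim(m, k, d):
    """Dimension for maps from k-tensors to d-tensors, which depends only on k + d."""
    return invariant_dim(m, k + d)


bell_numbers = [invariant_dim(1, k) for k in range(1, 5)]
dims_two_types = [invariant_dim(2, k) for k in range(1, 5)]
```

Setting $m=1$ recovers the Bell numbers, and the values for $m=2$ agree with the polynomials $m$, $m^{2}+m$, $m^{3}+3m^{2}+m$, $m^{4}+6m^{3}+7m^{2}+m$ in Table 4.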

We provide an explicit basis for classified invariant layers.

Theorem 3.2.

An orthogonal basis in the space of $S_{n_{1}}\times...\times S_{n_{m}}$-invariant tensor layers $\mathbb{R}^{n^{k}}\to\mathbb{R}$ is given by the following set. For every disjoint partition

$$\bigsqcup_{j=1}^{m}T_{j}=\{1,...,k\},$$

and for every tuple $(B_{\gamma_{1}},...,B_{\gamma_{m}})$, where each $B_{\gamma_{j}}$ is a basis vector in the space of $S_{|T_{j}|}$-invariant layers $\mathbb{R}^{n^{|T_{j}|}}\to\mathbb{R}$, as in Theorem 1 from (Maron et al., 2019b), form a vector

$$B_{\gamma_{1},...,\gamma_{m}}$$

by setting the coefficient in front of $e_{i_{1},i_{2},...,i_{k}}$ to $1$ if and only if the coefficient in front of $e_{(i_{s},s\in T_{j})}$ in $B_{\gamma_{j}}$ is $1$ for all $j$. Equivalently, $B_{\gamma_{1},...,\gamma_{m}}$ is a tensor product of the $B_{\gamma_{j}}$ when put at the appropriate indices.

To illustrate the generality of the methods we develop, we also provide the results for cyclic permutations and translations. The proofs of the following theorems can be found in Appendix A.

Theorem 3.3.

The dimension of space of cyclically invariant maps $\mathbb{R}^{n^{k}}\to\mathbb{R}$ is equal to $n^{k-1}$ .

Theorem 3.4.

The dimension of space of translation-invariant maps $\mathbb{R}^{(n^{2})^{k}}\to\mathbb{R}$ is equal to $n^{2k-2}$ . Here $\mathbb{R}^{n^{2}}$ is the space of $n\times n$ images with the action of translation group $C_{n}\times C_{n}$ .
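The dimensions in Theorems 3.3 and 3.4 can be sanity-checked for small sizes by counting orbits of index tuples, since an invariant linear map is determined by one coefficient per orbit. The following brute-force sketch uses illustrative function names and small test sizes chosen as assumptions.

```python
from itertools import product


def count_cyclic_orbits(n, k):
    """Orbits of k-tuples over Z_n under simultaneous cyclic shifts."""
    seen, orbits = set(), 0
    for index in product(range(n), repeat=k):
        if index in seen:
            continue
        orbits += 1
        for shift in range(n):
            seen.add(tuple((i + shift) % n for i in index))
    return orbits


def count_translation_orbits(n, k):
    """Orbits of k-tuples of pixels of an n x n image under C_n x C_n translations."""
    pixels = list(product(range(n), repeat=2))
    seen, orbits = set(), 0
    for index in product(pixels, repeat=k):
        if index in seen:
            continue
        orbits += 1
        for p in range(n):
            for q in range(n):
                seen.add(tuple(((i + p) % n, (j + q) % n) for i, j in index))
    return orbits


cyclic_counts = (count_cyclic_orbits(3, 2), count_cyclic_orbits(4, 3))
translation_counts = (count_translation_orbits(2, 2), count_translation_orbits(3, 2))
```

The counts match $n^{k-1}$ and $n^{2k-2}$ in these cases because the shift action on tuples has no fixed points, so every orbit has exactly $n$ (respectively $n^{2}$) elements.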

4 Tensor Sizes for Function Approximation on Graphs

In the following section, we discuss an open question about function approximation on graphs. The question was raised in (Maron et al., 2019c) and (Keriven & Peyré, 2019).

Given a subgroup $H\subset S_{n}$ , there is a construction of an $H$ -invariant neural network that uses tensors of size up to $n(n-1)/2$ and achieves universal approximation property for $H$ -invariant functions, see Theorem 3 in (Maron et al., 2019c). When implementing a layer with tensors of size $n(n-1)/2$ , the number of neurons reaches $n^{n(n-1)/2}$ . Such scale is impractical, and further optimization of tensor sizes is needed.
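A quick calculation shows how fast this count grows. The sketch below simply evaluates $n^{k}$ for the known bound $k=n(n-1)/2$ and for the conjectured bound $k=n$; the helper name is an assumption for illustration.

```python
def tensor_entries(n, k):
    """Number of entries in a k-tensor over n nodes."""
    return n**k


# Compare the known bound k = n(n-1)/2 with the conjectured bound k = n.
comparison = {
    n: (tensor_entries(n, n * (n - 1) // 2), tensor_entries(n, n))
    for n in (3, 4, 5, 6)
}
```

For $n=5$ the known bound already requires $5^{10}=9765625$ entries per channel, compared with $5^{5}=3125$ under the conjectured bound.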

Recall that the integer $T(G)$ is defined as the smallest bound on tensor sizes that allows continuous function approximation on graphs using an invariant neural network. We propose the following conjectures.

Conjecture 4.1 (A).

For a graph $G$ with $n$ nodes, we have $T(G)\leq n$.

Conjecture 4.2 (B).

The value $T(G)$ is upper bounded by the maximal size of $\mathrm{Aut}\;G$ orbit.

We experimentally verified that these conjectures hold for all graphs with $n\leq 7$ .
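The quantity in Conjecture B is straightforward to compute for small graphs. The sketch below enumerates $\mathrm{Aut}\;G$ by brute force and returns the largest orbit of a node; it only evaluates the conjectured bound and does not test the conjecture itself. The example graphs and function names are assumptions chosen for illustration.

```python
from itertools import permutations


def automorphisms(n, edges):
    """All node permutations that map the undirected edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    return [
        perm
        for perm in permutations(range(n))
        if {frozenset((perm[i], perm[j])) for i, j in edges} == edge_set
    ]


def max_orbit_size(n, edges):
    """Size of the largest orbit of a node under the automorphism group."""
    group = automorphisms(n, edges)
    return max(len({perm[v] for perm in group}) for v in range(n))


star_bound = max_orbit_size(4, [(0, 3), (1, 3), (2, 3)])           # star with 3 leaves
path_bound = max_orbit_size(3, [(0, 1), (1, 2)])                   # path on 3 nodes
cycle_bound = max_orbit_size(4, [(0, 1), (1, 2), (2, 3), (3, 0)])  # 4-cycle
```

For the star and the path the conjectured bound (3 and 2) is strictly below the number of nodes, while for the vertex-transitive 4-cycle it coincides with $n=4$.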

We illustrate the relevance of these conjectures with a well-known application: universal approximation of translation-invariant functions with convolutional neural networks (Zhou, 2018). Image classes are often assumed to be translation-invariant functions. If the input data is given by $d\times d$ images, then the translation group is a product of cyclic groups $C_{d}\times C_{d}$ (horizontal and vertical shifts). Note that we ignore rotations and reflections for simplicity.

Theorem 4.3.

Invariant neural networks with tensor layers of size up to $2d-1$ can approximate any translation-invariant continuous function on $d\times d$ image data.

We provide the proof of Theorem 4.3 below. The first key step in the proof is a change of basis that diagonalizes the action of the commutative group $C_{d}\times C_{d}$. The second key step is the connection of invariance to the notion of zero-sum sequences and the Davenport constant of a group, an idea that seems new in the machine learning literature.

Proof of Theorem 4.3.

The group of translations $C_{d}\times C_{d}$ acting on images can be viewed as a subgroup of $S_{d^{2}}$ . By Theorem 1 from (Maron et al., 2019c), an invariant neural network can approximate any function invariant to $C_{d}\times C_{d}$ . The same work shows an upper bound $d^{2}(d^{2}-1)/2$ on tensor sizes required for $C_{d}\times C_{d}$ -invariant function approximation. Let us show that only tensors of size up to $2d-1$ are required. From the proof of Theorem 1 in (Maron et al., 2019c) we know that it suffices to approximate generators in the ring of $C_{d}\times C_{d}$ invariant polynomials. The ring in question has $d^{2}$ variables $\mathbb{R}[x_{1,1},x_{1,2},...,x_{d,d-1},x_{d,d}]$ , and the action of $C_{d}\times C_{d}$ performs cyclic shifts on first and second indices of variables. The invariants of this action are not easy to analyze in basis $x_{i,j}$ . We propose a change of basis that diagonalizes this action.

Define a new basis $z_{a,b}$ as follows:

$$z_{a,b}=\sum_{i=1}^{d}\sum_{j=1}^{d}x_{i,j}\,(\sqrt[d]{-1})^{a(i-1)+b(j-1)}.$$

Then the action of the translation $(p,q)\in C_{d}\times C_{d}$ on $z_{a,b}$ is multiplication by $\sqrt[d]{-1}^{pa+qb}$ , i.e. multiplication by a root of unity.
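The diagonalization can be checked numerically. The sketch below makes two assumptions that are not spelled out in the text: the root of unity is taken to be $\omega=e^{2\pi i/d}$ (a primitive $d$-th root of unity), and a translation by $(p,q)$ moves pixel $(i,j)$ to $(i+p,j+q)$ modulo $d$. Indices are 0-based, and the test image is arbitrary.

```python
import cmath


def z_coordinates(x, omega):
    """Compute z[a][b] = sum_{i,j} x[i][j] * omega^(a*i + b*j) with 0-based indices."""
    d = len(x)
    return [
        [
            sum(x[i][j] * omega ** (a * i + b * j) for i in range(d) for j in range(d))
            for b in range(d)
        ]
        for a in range(d)
    ]


def translate(x, p, q):
    """Cyclically shift an image by p rows and q columns."""
    d = len(x)
    return [[x[(i - p) % d][(j - q) % d] for j in range(d)] for i in range(d)]


d = 3
omega = cmath.exp(2j * cmath.pi / d)  # assumed primitive d-th root of unity
x = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [4.0, 0.25, -2.0]]
p, q = 1, 2

z = z_coordinates(x, omega)
z_shifted = z_coordinates(translate(x, p, q), omega)

# Each coordinate should be multiplied by omega^(p*a + q*b) under the translation.
max_error = max(
    abs(z_shifted[a][b] - omega ** (p * a + q * b) * z[a][b])
    for a in range(d)
    for b in range(d)
)
```

Up to floating-point error, `max_error` is zero, so in the $z$ basis every translation acts as a diagonal matrix with roots of unity on the diagonal.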

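If a polynomial in variables $z_{a,b}$ is invariant to the action of $C_{d}\times C_{d}$, then each of its monomials

$$\prod_{a,b=1}^{d}z_{a,b}^{\alpha_{a,b}}$$

is also invariant, i.e.

$$\sum_{a,b=1}^{d}a\cdot\alpha_{a,b}=\sum_{a,b=1}^{d}b\cdot\alpha_{a,b}=0.$$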

This relation is a zero-sum relation on a sequence of length $\sum\alpha_{a,b}$ that contains $\alpha_{a,b}$ elements $(a,b)$ from the group $C_{d}\times C_{d}$ . We conclude that finding invariant tensor layers for Convolutional Neural Networks is equivalent to finding zero-sum sequences in the group $C_{d}\times C_{d}$ .

The Davenport constant of a group $G$ is defined as the maximal length of a sequence of elements from $G$ that contains no zero-sum subsequence. The Davenport constant of the group $C_{d}\times C_{d}$ was computed in (Olson, 1969) and is equal to $2d-1$. It follows that any sequence of length $>2d-1$ contains a non-empty zero-sum subsequence. In terms of invariant monomials, this means that any invariant monomial of degree $2d$ and above can be written as a product of invariant monomials of degrees $\leq 2d-1$. Hence all generators of the ring of $C_{d}\times C_{d}$-invariants have degrees $\leq 2d-1$. By the connection established in (Yarotsky, 2021; Maron et al., 2019c), we conclude that $2d-1$ is an upper bound on the required tensor size in convolutional neural networks. ∎
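For very small $d$ the degree bound can be confirmed by exhaustive search. The sketch below looks for minimal zero-sum sequences in $C_{d}\times C_{d}$ (zero-sum sequences with no proper non-empty zero-sum subsequence), which correspond to invariant monomials that cannot be factored further. Sums are taken modulo $d$; the search limit of $2d$ and the function names are assumptions made for this illustration.

```python
from itertools import combinations, combinations_with_replacement, product


def is_zero_sum(seq, d):
    """True if the elements of C_d x C_d in seq sum to (0, 0) modulo d."""
    return sum(a for a, _ in seq) % d == 0 and sum(b for _, b in seq) % d == 0


def is_minimal_zero_sum(seq, d):
    """True if seq is zero-sum and no proper non-empty subsequence is zero-sum."""
    if not seq or not is_zero_sum(seq, d):
        return False
    for size in range(1, len(seq)):
        for positions in combinations(range(len(seq)), size):
            if is_zero_sum([seq[i] for i in positions], d):
                return False
    return True


def max_minimal_zero_sum_length(d, max_len=None):
    """Largest length (up to max_len) at which a minimal zero-sum sequence exists."""
    group = list(product(range(d), repeat=2))
    max_len = 2 * d if max_len is None else max_len
    best = 0
    for length in range(1, max_len + 1):
        for seq in combinations_with_replacement(group, length):
            if is_minimal_zero_sum(seq, d):
                best = length
                break
    return best


longest_d2 = max_minimal_zero_sum_length(2)
longest_d3 = max_minimal_zero_sum_length(3)
```

For $d=2$ and $d=3$ the longest minimal zero-sum sequences have lengths $3$ and $5$, in agreement with the bound $2d-1$. The search is exponential in $d$ and is meant only as a sanity check.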

5 Experiments

To support the claim that our invariant/equivariant layer design improves learning on graphs with different node types, we consider open benchmarks with tasks that require learning interactions between groups of nodes, such as subgraph tasks. We compare our models to three recent architectures that achieved state-of-the-art or close results: SubGNN (Alsentzer et al., 2020), GLASS (Wang & Zhang, 2022a), GNN-Seg (treating a single group of nodes while ignoring the rest of the graph).

The training process in all experiments uses the Adam optimizer (Kingma & Ba, 2014) and the ReduceLROnPlateau learning rate scheduler. The number of training iterations is bounded by 10000, and early stopping is triggered when the validation score has not increased for 1000 iterations. The models were implemented with PyTorch (Fey & Lenssen, 2019).
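The stopping rule alone can be sketched in a few lines of plain Python. This is not the authors' training loop: the optimizer and scheduler are omitted, the function name is hypothetical, and counting patience once per iteration is an assumption about how the rule is applied.

```python
def early_stopping_point(validation_scores, max_iters=10000, patience=1000):
    """Return (iterations_run, best_score) for a stream of validation scores.

    Training stops after max_iters iterations, or earlier once the score
    has failed to increase for `patience` consecutive iterations.
    """
    best_score = float("-inf")
    stale = 0
    steps = 0
    for score in validation_scores:
        if steps >= max_iters:
            break
        steps += 1
        if score > best_score:
            best_score, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return steps, best_score


# Toy score stream: improves for three iterations, then plateaus.
scores = [0.1, 0.2, 0.3] + [0.3] * 5
stopped_by_patience = early_stopping_point(scores, max_iters=100, patience=3)
stopped_by_budget = early_stopping_point(scores, max_iters=4, patience=3)
```

In the toy run the plateau triggers the stop at iteration 6 when `patience=3`, while a budget of 4 iterations ends the run before the patience counter is exhausted.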

5.1 Real datasets

We evaluate the model performance on four real-world datasets with two node types: ppi-bp, em-user, hpo-metab, hpo-neuro, with 80:10:10 training, validation, and test split.

We use the node labeling trick with Message Passing Neural Network architecture similar to GLASS. We add two more layers: $S_{n_{1}}\times S_{n_{2}}$ -equivariant layer $\mathbb{R}^{n}\to\mathbb{R}^{n}$ and $S_{n_{1}}\times S_{n_{2}}$ -invariant $\mathbb{R}^{n}\to\mathbb{R}$ layer instead of sum or average pooling.
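One way to picture such an equivariant layer $\mathbb{R}^{n}\to\mathbb{R}^{n}$ is shown below: each output is a per-type multiple of the node's own feature plus a weighted combination of the per-type sums. This parameterization is an illustrative assumption (the paper's vector-form implementation is in its Appendix C, which is not reproduced here); it has $m+m^{2}$ weights, matching the dimension $m^{2}+m$ listed for 2-tensors in Table 4.

```python
def typed_equivariant_layer(x, node_types, self_w, cross_w):
    """Compute y_i = self_w[t_i] * x_i + sum_s cross_w[t_i][s] * (sum of x over type s)."""
    num_types = len(self_w)
    type_sums = [0.0] * num_types
    for value, t in zip(x, node_types):
        type_sums[t] += value
    return [
        self_w[t] * value + sum(cross_w[t][s] * type_sums[s] for s in range(num_types))
        for value, t in zip(x, node_types)
    ]


# Hypothetical weights for two node types.
node_types = [0, 0, 1, 1]
self_w = [2.0, -1.0]
cross_w = [[0.5, 1.0], [0.0, 0.25]]

y = typed_equivariant_layer([1.0, 2.0, 3.0, 4.0], node_types, self_w, cross_w)

# Swapping nodes 0 and 1 (same type) in the input swaps the same outputs.
y_swapped = typed_equivariant_layer([2.0, 1.0, 3.0, 4.0], node_types, self_w, cross_w)
```

Replacing the final pooling by a per-type weighted sum of the outputs then gives an $S_{n_{1}}\times S_{n_{2}}$-invariant readout.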

The proposed model achieves close to state-of-the-art results on 3 out of 4 datasets. We note that the model variance is noticeably higher. One possible explanation is the high sensitivity caused by added learnable mappings.

5.2 Synthetic datasets

We use four synthetic datasets introduced in (Alsentzer et al., 2020): density, cut ratio, coreness, and component. We follow the 50:25:25 training, validation, and test split as in (Alsentzer et al., 2020). Our model for synthetic data follows an invariant neural network architecture. In particular, we use three equivariant $\mathbb{R}^{n}\to\mathbb{R}^{n}$ layers, followed by an invariant $\mathbb{R}^{n}\to\mathbb{R}$ pooling layer and a multi-layer perceptron. The vector-form implementation of the layers used is given in Appendix C.

We compare the performance of the state-of-the-art models and our model on these tasks; see Table 6. The proposed model achieves state-of-the-art or similar results on 4 out of 4 synthetic datasets.

6 Conclusion

In this work, we presented a complete classification of linear tensor layers invariant to permutations of nodes of the same type. We experimentally verified the performance improvement these layers show on real and synthetic tasks. New steps have been made to further bound the size of tensors required for function approximation on graph data. In particular, when treating image data as graph data, we obtained tight bounds on the sizes of invariant convolutional tensor layers.

7 Acknowledgements

The work of Z.-Q. Luo was supported in part by the National Key Research and Development Project under grant 2022YFA1003900, and in part by the Guangdong Provincial Key Laboratory of Big Data Computing.

Appendix A Proofs of Main Theorems

Proof of Theorem 3.1.

Decompose the space $\mathbb{R}^{n}$ into a direct sum of subspaces where permutations $S_{n_{1}}\times...\times S_{n_{m}}$ act,

[TABLE]

Rewrite the tensor product

[TABLE]

into multinomial sum, see (Fulton & Harris, 2004),

[TABLE]

For example,

[TABLE]

Using the result of (Maron et al., 2019b), we note that the dimension of $S_{n_{j}}$ invariants of $U_{j}^{\otimes k_{j}}$ is equal to $B(k_{j})$ . Hence the dimension of $S_{n_{1}}\times...\times S_{n_{m}}$ invariants is given by the sum

$$\sum_{k_{1}+...+k_{m}=k}\binom{k}{k_{1},...,k_{m}}B(k_{1})\cdots B(k_{m}).$$

The expression above is known in the theory of exponential generating functions (Stanley, 2011), see Lemma B.1. We conclude that this sum appears as the coefficient in front of $x^{k}/k!$ in the series

$$e^{m(e^{x}-1)}.$$

For the claim about equivariant maps $\mathbb{R}^{n^{k}}\to\mathbb{R}^{n^{d}}$ see Lemma B.2. ∎
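The counting step of this proof can be replayed numerically. The sketch below reads the proof as summing, over all $(k_{1},...,k_{m})$ with $k_{1}+...+k_{m}=k$, the multinomial coefficient times the product of Bell numbers $B(k_{1})\cdots B(k_{m})$; that reading, and the function names, are assumptions made for illustration.

```python
from itertools import product
from math import factorial, prod


def bell(k):
    """Bell number B(k), computed with the Bell triangle."""
    row = [1]
    for _ in range(k):
        new = [row[-1]]
        for value in row:
            new.append(new[-1] + value)
        row = new
    return row[0]


def multinomial_bell_sum(m, k):
    """Sum of multinomial(k; k_1..k_m) * B(k_1) * ... * B(k_m) over k_1 + ... + k_m = k."""
    total = 0
    for parts in product(range(k + 1), repeat=m):
        if sum(parts) != k:
            continue
        coefficient = factorial(k) // prod(factorial(p) for p in parts)
        total += coefficient * prod(bell(p) for p in parts)
    return total


sums_two_types = [multinomial_bell_sum(2, k) for k in range(1, 5)]
```

For $m=2$ the sums are $2, 6, 22, 94$, which are the values of $m$, $m^{2}+m$, $m^{3}+3m^{2}+m$ and $m^{4}+6m^{3}+7m^{2}+m$ at $m=2$, as expected from the generating function.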

Proof of Theorem 3.2.

Consider the set of vectors in $\mathbb{R}^{n^{k}}$ obtained by the following procedure.

1. For each node type $j$, select a subset $T_{j}$ of indices from $\{1,...,k\}$ in such a way that

$$\bigsqcup_{j=1}^{m}T_{j}=\{1,...,k\}.$$

2. For each subspace $\mathbb{R}^{n^{|T_{j}|}}$, select a basis element $B_{\gamma_{j}}$, where $\gamma_{j}$ is a partition of $T_{j}$, according to the construction of the basis in (Maron et al., 2019b). The basis element $B_{\gamma_{1},...,\gamma_{m}}$ is then formed by taking the tensor product of all vectors $B_{\gamma_{j}}$, with $B_{\gamma_{j}}$ located at indices $T_{j}$.

On the one hand, there are

[TABLE]

vectors in this set. On the other hand, they are orthogonal to each other. Indeed, assume that two elements $B_{\gamma_{1},...,\gamma_{m}}$ and $B_{\beta_{1},...,\beta_{m}}$ share a common non-zero coefficient in front of some element $e_{i_{1}}\otimes...\otimes e_{i_{k}}$ . It follows that $T_{j}$ can be defined as the set of indices $t$ such that node $i_{t}$ has type $j$ . But then, by the definition of $B_{\gamma_{j}}$ , the element $\bigotimes_{t\in T_{j}}e_{i_{t}}$ uniquely defines an equivalence class (partition) $\gamma_{j}$ . Hence $\gamma_{j}=\beta_{j}$ for all $j$ .

∎
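The construction above can be illustrated by brute force for small sizes. This is a sketch under stated assumptions rather than the authors' code: the helper names `orbit_label` and `count_basis_elements` are hypothetical, nodes are numbered so that each type occupies a contiguous block, and each type is assumed to contain at least $k$ nodes so that every partition pattern can occur. Each multi-index $(i_{1},...,i_{k})$ is labelled by the node types at its positions (which determines $T_{1},...,T_{m}$) together with its equality pattern (which determines $\gamma_{1},...,\gamma_{m}$); distinct labels correspond to basis elements with disjoint supports.

```python
from itertools import product


def orbit_label(index, node_type):
    """Label a multi-index by (types at each position, equality pattern)."""
    # Type of the node at every tensor position; this encodes T_1, ..., T_m.
    types = tuple(node_type[i] for i in index)
    # Position of the first occurrence of each entry; this encodes which
    # positions carry equal node indices, i.e. the partitions gamma_j.
    pattern = tuple(index.index(i) for i in index)
    return types, pattern


def count_basis_elements(type_sizes, k):
    """Count distinct labels over all multi-indices in {0, ..., n-1}^k."""
    node_type = [j for j, size in enumerate(type_sizes) for _ in range(size)]
    n = len(node_type)
    labels = {orbit_label(idx, node_type)
              for idx in product(range(n), repeat=k)}
    return len(labels)
```

With two types of three nodes each and $k=3$, `count_basis_elements([3, 3], 3)` evaluates to $22$, and with a single type it gives the Bell number $B(3)=5$.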

Proof of Theorem 3.3.

The basis of cyclically invariant $k$-tensors can be obtained by projecting $k$-tensors of the form $e_{i_{1}}\otimes...\otimes e_{i_{k}}$ onto the invariant subspace using the averaging operator

$$P=\frac{1}{n}\sum_{g\in C_{n}}g.$$

Since $P$ has rational coefficients, the bases and dimensions of the cyclically invariant subspaces of $\mathbb{R}^{n^{k}}$ and of $\mathbb{C}^{n^{k}}$ are the same.

The cyclic action of $C_{n}$ on $\mathbb{C}^{n}$ can be diagonalized, resulting in the decomposition

$$\mathbb{C}^{n}=V_{0}\oplus V_{1}\oplus\dots\oplus V_{n-1},$$

where each $V_{j}$ is one-dimensional and the generator of $C_{n}$ acts on $V_{j}$ as multiplication by $\omega^{j}$, with $\omega=e^{2\pi i/n}$ a primitive $n$-th root of unity. Let $d$ be the dimension of the invariant subspace in

$$(V_{0}\oplus V_{1}\oplus\dots\oplus V_{n-1})^{\otimes k}.$$

The shift $V_{0}\mapsto V_{1}$, $V_{1}\mapsto V_{2}$, …, $V_{n-1}\mapsto V_{0}$, applied to the first tensor factor only, does not change the decomposition but maps the invariant subspace onto the subspace where the generator acts as multiplication by $\omega$. It follows that this subspace also has dimension $d$. Repeating the argument for $\omega^{2},\dots,\omega^{n-1}$ we arrive at $d+d+...+d=n^{k}$, hence $d=n^{k}/n=n^{k-1}$. ∎
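The value $n^{k-1}$ can also be confirmed by counting orbits, since the dimension of the invariant subspace of a permutation representation equals the number of orbits on the basis. The sketch below is illustrative only (the name `cyclic_orbit_count` is not from the paper) and assumes $C_{n}$ acts by adding the same shift modulo $n$ to every index of $e_{i_{1}}\otimes...\otimes e_{i_{k}}$.

```python
from itertools import product


def cyclic_orbit_count(n, k):
    """Number of orbits of C_n acting diagonally on {0, ..., n-1}^k."""
    # Shifting every index so that the first one becomes 0 picks a
    # canonical representative of each orbit.
    reps = {tuple((i - idx[0]) % n for i in idx)
            for idx in product(range(n), repeat=k)}
    return len(reps)
```

For example, `cyclic_orbit_count(4, 3)` gives $16=4^{2}$.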

Proof of Theorem 3.4.

An $n\times n$ image is a $2$-tensor from $\mathbb{R}^{n}\otimes\mathbb{R}^{n}$. Vertical translations act on the first factor of a $2$-tensor while horizontal translations act on the second. The elements of $(\mathbb{R}^{n}\otimes\mathbb{R}^{n})^{\otimes k}$ invariant to translations are then computed as

$$\left((\mathbb{R}^{n}\otimes\mathbb{R}^{n})^{\otimes k}\right)^{C_{n}\times C_{n}}\cong\left((\mathbb{R}^{n})^{\otimes k}\right)^{C_{n}}\otimes\left((\mathbb{R}^{n})^{\otimes k}\right)^{C_{n}}.$$

Since cyclic invariants are computed in Theorem 3.3, the dimension of the last tensor product is equal to $n^{k-1}\cdot n^{k-1}=n^{2k-2}$. ∎
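The same orbit-counting check applies to images. In this hedged sketch (the name `translation_orbit_count` is illustrative), a basis element of $(\mathbb{R}^{n}\otimes\mathbb{R}^{n})^{\otimes k}$ is a $k$-tuple of pixels, and translations are assumed to be cyclic, shifting all rows by one amount and all columns by another, both modulo $n$.

```python
from itertools import product


def translation_orbit_count(n, k):
    """Number of orbits of C_n x C_n on k-tuples of pixels of an n x n image."""
    pixels = list(product(range(n), repeat=2))
    reps = set()
    for idx in product(pixels, repeat=k):
        r0, c0 = idx[0]
        # Translate so that the first pixel sits at (0, 0).
        reps.add(tuple(((r - r0) % n, (c - c0) % n) for r, c in idx))
    return len(reps)
```

For $n=3$, $k=2$ this returns $9=3^{2\cdot 2-2}$.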

Appendix B Supplementary Lemmas

Lemma B.1.

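Let $(a_{n})$ be a sequence and let $f(x)$ be the exponential generating function of that sequence,

$$f(x)=\sum_{n\geq 0}a_{n}\frac{x^{n}}{n!}.$$

Then $f(x)^{m}$ is the exponential generating function of the sequence

$$b_{k}=\sum_{k_{1}+\dots+k_{m}=k}\binom{k}{k_{1},\dots,k_{m}}a_{k_{1}}\cdots a_{k_{m}}.$$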

Proof.

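We start by expanding $f(x)^{m}$:

$$f(x)^{m}=\left(\sum_{n\geq 0}a_{n}\frac{x^{n}}{n!}\right)^{m}=\sum_{k\geq 0}\,\sum_{k_{1}+\dots+k_{m}=k}\frac{a_{k_{1}}\cdots a_{k_{m}}}{k_{1}!\cdots k_{m}!}\,x^{k}=\sum_{k\geq 0}\left(\sum_{k_{1}+\dots+k_{m}=k}\binom{k}{k_{1},\dots,k_{m}}a_{k_{1}}\cdots a_{k_{m}}\right)\frac{x^{k}}{k!}=\sum_{k\geq 0}b_{k}\frac{x^{k}}{k!},$$

where the last step follows from the definition of $b_{k}$. This shows that $f(x)^{m}$ is the exponential generating function of the sequence $(b_{k})$. ∎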
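As a quick sanity check of the lemma (an illustrative special case, not taken from the paper), take $a_{n}=1$ for all $n$, so that $f(x)=e^{x}$, and $m=2$. Here $b_{k}$ is the multinomial convolution sum over $k_{1}+k_{2}=k$, and both sides reduce to the binomial theorem:

```latex
f(x)^{2} = e^{2x} = \sum_{k\geq 0} 2^{k}\,\frac{x^{k}}{k!},
\qquad
b_{k} = \sum_{k_{1}+k_{2}=k}\binom{k}{k_{1},k_{2}}\cdot 1\cdot 1
      = \sum_{i=0}^{k}\binom{k}{i} = 2^{k}.
```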

Lemma B.2.

The dimension of the space of $S_{n_{1}}\times...\times S_{n_{m}}$-equivariant linear maps $\mathbb{R}^{n^{k}}\to\mathbb{R}^{n^{d}}$ depends only on $k+d$.

Proof.

From the point of view of tensor algebra, the computation of $G$ -equivariant layers can be viewed as the computation of $G$ -equivariant linear maps

[TABLE]

The representation theory of the symmetric group $S_{n}$ is well studied. In particular, it is known (Fulton & Harris, 2004) that all characters of $S_{n_{1}}\times...\times S_{n_{m}}$ are real-valued. Hence the action on the dual space $V^{*\otimes d}$ is equivalent to the action on the original space $V^{\otimes d}$. Hence

[TABLE]

This shows that the answer can depend only on $k+d$ . ∎

Appendix C Implementation

Let $K_{1},K_{2},...,K_{m}$ be the sets of nodes from groups $1$ to $m$,

$$K_{1}\sqcup K_{2}\sqcup\dots\sqcup K_{m}=\{1,\dots,n\},\qquad|K_{i}|=n_{i}.$$

Denote by $1_{K_{1}},...,1_{K_{m}}$ the $n$-dimensional vectors such that $1_{K_{i}}$ has coordinate $1$ at the indices in $K_{i}$ and $0$ elsewhere. Let $I_{K_{i}}$ be the diagonal matrix with ones at the diagonal entries indexed by $K_{i}$ and zeros elsewhere.

An $S_{n_{1}}\times...\times S_{n_{m}}$-invariant layer $L:\mathbb{R}^{n}\to\mathbb{R}$ has the form

$$L(x)=\sum_{i=1}^{m}w_{i}\,1_{K_{i}}^{T}x,$$

where $w_{i}\in\mathbb{R}$ are learnable parameters.

An $S_{n_{1}}\times...\times S_{n_{m}}$-equivariant layer $L:\mathbb{R}^{n}\to\mathbb{R}^{n}$ has the form

$$L(x)=\sum_{i,j=1}^{m}w_{i,j}\,1_{K_{i}}1_{K_{j}}^{T}x+\sum_{i=1}^{m}w_{i}\,I_{K_{i}}x,$$

where $w_{i,j}$ and $w_{i}$ are learnable parameters.
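A minimal pure-Python sketch of such layers follows. It is an illustration under assumptions, not the authors' implementation: the invariant layer is taken to be a weighted sum of per-type sums ($m$ parameters), and the equivariant layer is taken to combine per-type pooling with per-type scaling ($m^{2}+m$ parameters, matching the dimension count for $k+d=2$). The names `invariant_layer`, `equivariant_layer`, `permute`, `w_pair` and `w_diag` are hypothetical, and bias terms are omitted.

```python
def invariant_layer(x, groups, w):
    """L(x) = sum_i w[i] * <1_{K_i}, x>, with groups[i] listing the nodes of K_i."""
    return sum(w[i] * sum(x[t] for t in K) for i, K in enumerate(groups))


def equivariant_layer(x, groups, w_pair, w_diag):
    """L(x) = sum_{i,j} w_pair[i][j] 1_{K_i} 1_{K_j}^T x + sum_i w_diag[i] I_{K_i} x."""
    sums = [sum(x[t] for t in K) for K in groups]  # per-type pooled features
    out = [0] * len(x)
    for i, K in enumerate(groups):
        # Contribution shared by every node of type i.
        pooled = sum(w_pair[i][j] * sums[j] for j in range(len(groups)))
        for t in K:
            out[t] = pooled + w_diag[i] * x[t]
    return out


def permute(x, p):
    """Apply the permutation that sends position t to position p[t]."""
    out = [None] * len(x)
    for t, target in enumerate(p):
        out[target] = x[t]
    return out
```

For instance, with `groups = [[0, 1, 2], [3, 4]]` and the type-preserving permutation `p = [1, 0, 2, 4, 3]`, the invariant layer returns the same value on `x` and `permute(x, p)`, while the equivariant layer commutes with `permute`.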

Bibliography


Alsentzer, E., Finlayson, S. G., Li, M. M., and Zitnik, M. Subgraph neural networks. Proceedings of Neural Information Processing Systems, NeurIPS, 2020.
Bevilacqua, B., Frasca, F., Lim, D., Srinivasan, B., Cai, C., Balamurugan, G., Bronstein, M. M., and Maron, H. Equivariant subgraph aggregation networks. In International Conference on Learning Representations, 2022.
Chen, Z., Villar, S., Chen, L., and Bruna, J. On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Processing Systems, volume 32, 2019.
Fey, M. and Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. arXiv, abs/1903.02428, 2019.
Frasca, F., Bevilacqua, B., Bronstein, M. M., and Maron, H. Understanding and extending subgraph GNNs by rethinking their symmetries. In Advances in Neural Information Processing Systems, 2022.
Fulton, W. and Harris, J. Representation Theory. Springer New York, 2004.
Garg, V. K., Jegelka, S., and Jaakkola, T. Generalization and representational limits of graph neural networks. In Proceedings of the 37th International Conference on Machine Learning, ICML'20, 2020.
Gasse, M., Chételat, D., Ferroni, N., Charlin, L., and Lodi, A. Exact combinatorial optimization with graph convolutional neural networks. In Advances in Neural Information Processing Systems 32, 2019.
