Invariant Layers for Graphs with Nodes of Different Types
Dmitry Rybin, Ruoyu Sun, Zhi-Quan Luo

TL;DR
This paper characterizes linear layers invariant to permutations that preserve node types in heterogeneous graphs, enabling more effective learning of node interactions and providing tighter bounds on tensor sizes for function approximation.
Contribution
It fully characterizes invariant linear layers for node-type-preserving permutations and extends Bell number generalizations, improving graph neural network design and tensor size bounds.
Findings
Invariant layers improve learning of node interactions.
Tensor size bounds are tightened from n(n-1)/2 to n.
For image data, tensor generator size is bounded by 2d - 1.
| Tensor sizes | 1 | 2 | 3 |
|---|---|---|---|
| (Maron et al., 2019b) | 1 | 2 | 5 |
| This work | | | |
| Result | Description |
|---|---|
| Theorem 4.3 | A decrease of tensor sizes in CNN for translation-invariant function approximation, down to $2d-1$. |
| Conjecture A | A decrease of required tensor sizes from $n(n-1)/2$ to $n$ for function approximation on graphs with $n$ nodes. |
| Conjecture B | Graph instance-dependent bound on required tensor sizes. |
| Space | Dimension of Invariant Subspace |
|---|---|
| 1-tensors | |
| 2-tensors | |
| 3-tensors | |
| 4-tensors | |
| Task | ppi-bp | hpo-metab |
|---|---|---|
| GLASS | | |
| SubGNN | | |
| GNN-Seg | | |
| This work | | |
| Task | density | cut ratio |
|---|---|---|
| GLASS | | |
| SubGNN | | |
| GNN-Seg | | |
| This work | | |
Abstract
Neural networks that satisfy invariance with respect to input permutations have been widely studied in machine learning literature. However, in many applications, only a subset of all input permutations is of interest. For heterogeneous graph data, one can focus on permutations that preserve node types. We fully characterize linear layers invariant to such permutations. We verify experimentally that implementing these layers in graph neural network architectures allows learning important node interactions more effectively than existing techniques. We show that the dimension of the space of these layers is given by a generalization of Bell numbers, extending the work (Maron et al., 2019b). We further narrow the invariant network design space by addressing a question about the sizes of tensor layers necessary for function approximation on graph data. Our findings suggest that function approximation on a graph with $n$ nodes can be done with tensors of sizes $\le n$, which is tighter than the best-known bound $\le n(n-1)/2$. For $d \times d$ image data with translation symmetry, our methods give a tight upper bound $2d-1$ on sizes of invariant tensor generators (improving on the previously known bound) via a surprising connection to Davenport constants.
graph neural network, expressive power, invariant, tensor, permutation
1 Introduction
The study of invariant and equivariant neural networks has been gaining popularity in recent years. Many fundamental properties, such as universal approximation theorems (Maron et al., 2019c), (Yarotsky, 2021; Ravanbakhsh, 2020), have been proved. The design of expressive invariant layers remains an important direction in Deep Learning (Hartford et al., 2018; Kondor & Trivedi, 2018).
Permutation invariant networks are an important special case. In these networks, the symmetry group of the input data is given by all possible permutations of input coordinates. In particular, such symmetry appears in the use of graph neural networks (Kipf & Welling, 2017), where the invariance comes from the permutation of nodes. This symmetry is crucial for architecture design in graph neural networks and the study of their expressive power and universal approximation properties (Chen et al., 2019; Frasca et al., 2022; Garg et al., 2020; Huang et al., 2022; Xu et al., 2019; Bevilacqua et al., 2022; Qian et al., 2022). However, node permutations in homogeneous and heterogeneous graphs have certain differences that received little attention in the literature.
In applications with heterogeneous graphs, the problem background often requires certain groups of nodes to have significantly different properties or represent objects of different nature. Examples of graph applications with many node types can be found in recommendation systems (Wu et al., 2020), chemistry (Reiser et al., 2022), and Learn-to-Optimize (Gasse et al., 2019). Many Graph Neural Network architectures attempt to capture the relations between nodes of different types. Despite many theoretical guarantees (Wang & Zhang, 2022a), some simple features are still hard to learn in practice with existing layers, see Section 5 for experimental evidence. In this paper, we aim to fix this gap by characterizing all invariant linear layers for permutations preserving node types, hence extending the work (Maron et al., 2019b) to heterogeneous graphs.
In Section 3, Theorem 3.1, we provide a complete characterization of linear layers with $k$-tensor input, invariant to permutations from $S_{n_1} \times \dots \times S_{n_m}$, where $n = n_1 + \dots + n_m$ is the number of nodes and $m$ is the number of different types of nodes. The dimension of the complete space of these layers is given by a generalization of Bell numbers, as can be seen in Table 2. The fact that the structure of invariant layers depends only on the number of node types $m$ and tensor size $k$, and does not depend on $n$, is crucial for the re-use of these layers in graph neural networks. It follows that these layers can be directly applied to any input graph with $m$ types of nodes, independent of the number of nodes. We provide an explicit orthogonal basis and implementation description for these layers (Theorem 3.2).
A complete characterization of invariant/equivariant tensor layers defines a design space for invariant neural networks. It was discovered (Maron et al., 2019c; Ravanbakhsh, 2020; Keriven & Peyré, 2019; Maron et al., 2019a) that higher-order tensors are necessary for function approximation with invariant/equivariant neural networks. However, explicit bounds on required tensor sizes are needed.
The work (Maron et al., 2019c) showed that $n(n-1)/2$-tensors are sufficient for function approximation on graph data with $n$ nodes. Lowering this bound is an open question posed in (Maron et al., 2019c) and (Keriven & Peyré, 2019). In Section 4, we provide several theorems and conjectures suggesting a bound below $n(n-1)/2$. Furthermore, we show how the structure of orbits of the graph automorphism group determines how tensor sizes can be lowered. Our findings are formulated as Conjectures A and B. We summarize the implications of each conjecture in Table 3. The mathematical formulation of the conjectures is provided in Section 4. Some applications, such as image graphs in computer vision, provide special cases that can be analyzed fully. We prove that translation-invariant function approximation on $d \times d$ images can be done with tensors of size $\le 2d-1$ (Theorem 4.3). The proof makes a surprising connection between translation-invariant tensor layers and Davenport constants (Olson, 1969).
Finally, we discuss the differences between our work and prior results. Due to practical importance, treating different types of nodes has been approached in many applications. For bipartite graphs, half-GNN (Gasse et al., 2019) and EvenNet were proposed (Lei et al., 2022). Layers in half-GNN can be viewed as a special case of ours. For extracting properties of a subset of vertices, subgraph structure extraction (Sub-GNN) (Alsentzer et al., 2020), labeling tricks (GLASS) (Wang & Zhang, 2022b), and other approaches (Sun et al., 2021; You et al., 2021; Huang & Zitnik, 2020) were used. Subgraph data pooling in Sub-GNN and GLASS is an example of a layer invariant only to permutations of nodes within a subgraph and hence is a special case of our layers. Tensor sizes for function approximation with convolutional networks were analyzed in (Yarotsky, 2021). As pointed out in (Yarotsky, 2021), finding a small explicit set of generators of translation-invariant tensors is not trivial. Therefore the work (Yarotsky, 2021) took an alternative approach of averaging the outputs over the whole symmetry group. We bypassed the difficulty by making a change of basis and noting a connection with zero-sum sequences in groups (proof of Theorem 4.3).
2 Preliminaries
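A function $f : \mathbb{R}^n \to \mathbb{R}$ is invariant to a permutation $\pi \in S_n$ if for any input $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ we have
$$f(x_{\pi(1)}, \dots, x_{\pi(n)}) = f(x_1, \dots, x_n).$$
For linear functions $f(x) = \sum_{i=1}^{n} a_i x_i$ this condition is equivalent to a fixed point equation (Maron et al., 2019b), which can be solved by analyzing orbits of indices,
$$a_{\pi(i)} = a_i \quad \text{for all } i = 1, \dots, n.$$
For tensor input data, $x \in \mathbb{R}^{n^k}$, coordinates are indexed by $k$-tuples $(i_1, \dots, i_k)$, where $1 \le i_j \le n$. A permutation $\pi$ now acts on all elements of a tuple, simultaneously permuting indices in all axes of a $k$-tensor. For a linear map $L : \mathbb{R}^{n^k} \to \mathbb{R}$ with coefficients $L_{i_1 \dots i_k}$, invariance to the permutation $\pi$ is equivalent to the fixed point equation
$$L_{\pi(i_1) \dots \pi(i_k)} = L_{i_1 \dots i_k} \quad \text{for all } (i_1, \dots, i_k).$$
For a linear map between $k$-tensors and $l$-tensors, $L : \mathbb{R}^{n^k} \to \mathbb{R}^{n^l}$, equivariance to the permutation $\pi$ is stated as
$$L(\pi \cdot x) = \pi \cdot L(x) \quad \text{for all } x \in \mathbb{R}^{n^k}.$$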
In the fundamental work (Maron et al., 2019b), linear maps invariant to all possible permutations were explicitly classified. The dimension of the space of these maps was shown to be given by Bell numbers $B_k$. This classification is important since it provides a complete design space for invariant and equivariant neural networks.
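The orbit analysis behind this count can be checked by brute force for small sizes. The sketch below is an illustration, not code from the paper: it assumes that two index tuples lie in the same $S_n$-orbit exactly when they have the same pattern of repeated entries, and the helper names `equality_pattern`, `num_orbits`, and `bell` are hypothetical.

```python
from itertools import product

def equality_pattern(idx):
    """Canonical label of a tuple up to renaming of its values, e.g. (3, 3, 0) -> (0, 0, 1)."""
    seen = {}
    return tuple(seen.setdefault(v, len(seen)) for v in idx)

def num_orbits(n, k):
    """Number of orbits of S_n acting simultaneously on all k-tuples over range(n)."""
    return len({equality_pattern(idx) for idx in product(range(n), repeat=k)})

def bell(k):
    """Bell number B_k computed with the Bell triangle."""
    row = [1]
    for _ in range(k):
        nxt = [row[-1]]
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
    return row[0]

# Each orbit contributes one free coefficient of an invariant linear map.
for k in range(1, 5):
    print(k, num_orbits(4, k), bell(k))
```

For $n \ge k$ the two printed counts agree; for $n < k$ the orbit count is smaller because patterns with more than $n$ distinct entries cannot occur.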
Recall that a standard model for the invariant neural network is a function
$$F : \mathbb{R}^{n^{k_0} \times c_0} \to \mathbb{R}$$
defined as
$$F = M \circ h \circ L_r \circ \sigma \circ \dots \circ \sigma \circ L_1,$$
where $L_i : \mathbb{R}^{n^{k_{i-1}} \times c_{i-1}} \to \mathbb{R}^{n^{k_i} \times c_i}$ are linear equivariant layers (the quantity $c_i$ is the number of filters or channels in layer $i$), $\sigma$ is an activation function (such as ReLU or sigmoid), $h$ is an invariant layer, and $M$ is a multi-layer perceptron. Here the layers $L_i$ are equivariant to a predefined set of permutations $G \le S_n$, while the layer $h$ is invariant to that predefined set of permutations. For our applications we consider all permutations from $S_{n_1} \times \dots \times S_{n_m}$, as explained below.
Consider a graph with $n$ nodes of $m$ different types: $n_1$ nodes of type $1$, $n_2$ nodes of type $2$, …, $n_m$ nodes of type $m$, with $n_1 + \dots + n_m = n$ (see Figures 1 and 2). For general graph-level tasks aggregation is performed over all nodes equivalently. Such an aggregation operation guarantees invariance of the output to all permutations from $S_n$. However, to capture properties shared by one type of node, it is viable to use aggregations only within nodes of the same type. Such aggregations would only preserve symmetries from a subgroup that permutes nodes of the same type, $S_{n_1} \times \dots \times S_{n_m} \le S_n$.
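For a special case of two node types, orthogonal bases of the new invariant linear layers are illustrated in Figures 3 and 4. A map $\mathbb{R}^{n^k} \to \mathbb{R}$ is represented by a $k$-tensor. In particular, a map $\mathbb{R}^{n^2} \to \mathbb{R}$ in Figure 3 is given by an $n \times n$ matrix. Assume that $V_i$ is the set of nodes of type $i$ for $i = 1, 2$. Then the six maps in Figure 3 are
$$X \mapsto \sum_{i \in V_1} X_{ii},$$
$$X \mapsto \sum_{i, j \in V_1,\ i \ne j} X_{ij},$$
$$X \mapsto \sum_{i \in V_2} X_{ii},$$
$$X \mapsto \sum_{i, j \in V_2,\ i \ne j} X_{ij},$$
$$X \mapsto \sum_{i \in V_1,\ j \in V_2} X_{ij},$$
$$X \mapsto \sum_{i \in V_2,\ j \in V_1} X_{ij}.$$
Note that relations such as “number of edges between nodes of type $1$ and type $2$” are easier to capture with this set of maps.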
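As a small illustration of such type-aware aggregations, the sketch below computes six sums of an $n \times n$ matrix for two node types (diagonal and off-diagonal sums within each type, and the two cross-type blocks) and checks that they do not change under a permutation that preserves node types. The function names and the example graph are assumptions made for illustration, and node types are labeled 0 and 1 in the code.

```python
def six_invariants(A, types):
    """Six type-aware linear invariants of an n x n matrix A; types[i] is 0 or 1."""
    out = {"diag0": 0, "off00": 0, "diag1": 0, "off11": 0, "block01": 0, "block10": 0}
    n = len(A)
    for i in range(n):
        for j in range(n):
            ti, tj = types[i], types[j]
            if i == j:
                out[f"diag{ti}"] += A[i][j]      # diagonal entries of one type
            elif ti == tj:
                out[f"off{ti}{tj}"] += A[i][j]   # off-diagonal entries within one type
            else:
                out[f"block{ti}{tj}"] += A[i][j] # entries between the two types
    return out

def permute(A, perm):
    """Relabel nodes: entry (i, j) of the result is A[perm[i]][perm[j]]."""
    n = len(A)
    return [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

types = [0, 0, 0, 1, 1]
# Symmetric adjacency matrix with edges 0-1, 0-3, 2-4, 3-4 (hypothetical example).
A = [[0, 1, 0, 1, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [1, 0, 0, 0, 1],
     [0, 0, 1, 1, 0]]
perm = [2, 0, 1, 4, 3]  # permutes nodes only within their own type
print(six_invariants(A, types))
print(six_invariants(permute(A, perm), types) == six_invariants(A, types))
```

Here `block01` counts the edges between the two node types, which a single sum over all entries of the matrix cannot separate from the within-type edges.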
Graph data can be encoded using tensors. Node features are 1-tensors, while edge features are 2-tensors. Higher-order tensors correspond to hypergraph data such as hyper-edges (Maron et al., 2019b), or non-trivial structures such as bags of rooted sub-graphs (Frasca et al., 2022). Symmetries preserved by those tensors are related to the symmetries of the underlying graph.
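Let $\Gamma$ be a graph with $n$ vertices and edge set $E$, and let $x = (x_1, \dots, x_n)$ be a feature vector, where $x_i$ is a scalar feature of node $i$. If $f$ is a graph function,
$$f : \mathbb{R}^n \to \mathbb{R},$$
then by definition, $f$ must respect graph symmetries, i.e.
$$f(x_{\pi(1)}, \dots, x_{\pi(n)}) = f(x_1, \dots, x_n) \quad \text{for all } \pi \in \mathrm{Aut}(\Gamma).$$
Here $\mathrm{Aut}(\Gamma)$ is the symmetry (automorphism) group of $\Gamma$. It is defined as the subgroup of all node permutations that preserve the graph structure,
$$\mathrm{Aut}(\Gamma) = \{\pi \in S_n : (i, j) \in E \iff (\pi(i), \pi(j)) \in E\}.$$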
For example, for the graph in Figure 5, any graph function must be invariant to all permutations of variables . The reason is that these permutations produce the same graph structure, and hence any function that depends only on graph structure must give the same output after such permutation.
For a general subgroup $G$ of $S_n$, the function approximation properties of tensor layers are discussed in (Maron et al., 2019c). However, the preservation of graph structure puts a significant restriction on the subgroup $G = \mathrm{Aut}(\Gamma)$. This restriction can lead to a bound that is significantly smaller than $n(n-1)/2$.
For a given graph $\Gamma$ with $n$ nodes and automorphism group $G = \mathrm{Aut}(\Gamma)$, define an integer $T(\Gamma)$ as the smallest size of tensors that allows an invariant neural network to approximate any $G$-invariant function $f : \mathbb{R}^n \to \mathbb{R}$. That is, for any continuous $G$-invariant function $f$, any compact set $K \subset \mathbb{R}^n$ and any $\varepsilon > 0$ there should exist an invariant neural network $F$ with tensor sizes $\le T(\Gamma)$ such that
$$\max_{x \in K} |F(x) - f(x)| < \varepsilon.$$
Open Question: Can the bound $T(\Gamma) \le n(n-1)/2$ be improved?
We note that the work (Maron et al., 2019c) connected the quantity $T(\Gamma)$ to degrees of polynomial generators of the ring of invariants $\mathbb{R}[x_1, \dots, x_n]^{G}$.
3 Classification of Invariant Layers
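Recall that a linear map $L : \mathbb{R}^{n^k} \to \mathbb{R}$ can be written as
$$L(x) = \sum_{i_1, \dots, i_k = 1}^{n} L_{i_1 \dots i_k}\, x_{i_1 \dots i_k}.$$
The condition that a map is invariant to a subgroup of permutations $G \le S_n$ is equivalent to a set of fixed point equations
$$L_{\pi(i_1) \dots \pi(i_k)} = L_{i_1 \dots i_k} \quad \text{for all } \pi \in G \text{ and all } (i_1, \dots, i_k).$$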
Solving these equations can be reduced to a certain technical calculation from the branch of invariant theory and representation theory (Fulton & Harris, 2004). We provide this calculation in appendix A for permutations preserving node types (Theorem 3.1), cyclic shifts (Theorem 3.3), and translations (Theorem 3.4).
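Theorem 3.1.
If a graph contains nodes of $m$ different types, then the dimension of the space of invariant layers $\mathbb{R}^{n^k} \to \mathbb{R}$ is given by the coefficient in front of $x^k / k!$ in the expression
$$\exp\big(m (e^x - 1)\big) = \Big( \sum_{j \ge 0} B_j \frac{x^j}{j!} \Big)^{m}.$$
The dimension of the space of equivariant layers $\mathbb{R}^{n^k} \to \mathbb{R}^{n^l}$ is given by the coefficient in front of $x^{k+l} / (k+l)!$.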
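The coefficient in Theorem 3.1 can be cross-checked numerically for small cases. The following sketch is illustrative only: it assumes that the dimension of the invariant space equals the number of orbits of $S_{n_1} \times \dots \times S_{n_m}$ on $k$-tuples of node indices, that every $n_j \ge k$, and that the generating function is the $m$-th power of the exponential generating function of Bell numbers. The names `dim_formula` and `dim_bruteforce` are hypothetical.

```python
from itertools import product
from math import comb

def bell(k):
    """Bell number B_k computed with the Bell triangle."""
    row = [1]
    for _ in range(k):
        nxt = [row[-1]]
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
    return row[0]

def dim_formula(m, k):
    """k! times the coefficient of x^k in exp(m (e^x - 1)), via binomial convolution."""
    b = [bell(j) for j in range(k + 1)]
    seq = [1] + [0] * k  # sequence whose exponential generating function is 1
    for _ in range(m):
        seq = [sum(comb(j, i) * seq[i] * b[j - i] for i in range(j + 1))
               for j in range(k + 1)]
    return seq[k]

def dim_bruteforce(sizes, k):
    """Count orbits of S_{n_1} x ... x S_{n_m} on k-tuples of node indices."""
    types = [t for t, s in enumerate(sizes) for _ in range(s)]
    patterns = set()
    for idx in product(range(len(types)), repeat=k):
        seen = {}
        # An orbit is determined by the node type and the equality label at each position.
        patterns.add(tuple((types[v], seen.setdefault(v, len(seen))) for v in idx))
    return len(patterns)

print(dim_formula(2, 2), dim_bruteforce((2, 2), 2))
print(dim_formula(2, 3), dim_bruteforce((3, 3), 3))
```

For one node type the formula reduces to the Bell number $B_k$, and for two node types and 2-tensors it gives the six maps described in Section 2.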
We provide an explicit basis for classified invariant layers.
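Theorem 3.2.
An orthogonal basis in the space of $S_{n_1} \times \dots \times S_{n_m}$-invariant tensor layers $\mathbb{R}^{n^k} \to \mathbb{R}$ is given by the following set. For every disjoint partition
$$\{1, \dots, k\} = I_1 \sqcup \dots \sqcup I_m$$
and for every tuple $(v_1, \dots, v_m)$, where each $v_j$ is a basis vector in the space of $S_{n_j}$-invariant layers $\mathbb{R}^{n_j^{|I_j|}} \to \mathbb{R}$, as in Theorem 1 from (Maron et al., 2019b), form a vector
$$v = v(I_1, \dots, I_m;\, v_1, \dots, v_m) \in \mathbb{R}^{n^k}$$
by setting the coefficient in front of $e_{i_1} \otimes \dots \otimes e_{i_k}$ to $1$ if and only if the coefficient in front of $\bigotimes_{t \in I_j} e_{i_t}$ in $v_j$ is $1$ for all $j$. Equivalently, $v$ is a tensor product of the $v_j$ when put at appropriate indices.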
To illustrate the generality of the methods we develop, we also provide the results for cyclic permutations and translations. The proofs of the following theorems can be found in Appendix A.
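Theorem 3.3.
The dimension of the space of cyclically invariant maps $(\mathbb{R}^n)^{\otimes k} \to \mathbb{R}$ is equal to $n^{k-1}$.
Theorem 3.4.
The dimension of the space of translation-invariant maps $V^{\otimes k} \to \mathbb{R}$ is equal to $d^{2(k-1)}$. Here $V = \mathbb{R}^d \otimes \mathbb{R}^d$ is the space of $d \times d$ images with the action of the translation group $\mathbb{Z}_d \times \mathbb{Z}_d$.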
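Both counts can be checked by brute force on small inputs. The sketch below assumes the cyclic group $\mathbb{Z}_n$ acting on $(\mathbb{R}^n)^{\otimes k}$ and square $d \times d$ images with the translation group $\mathbb{Z}_d \times \mathbb{Z}_d$, and it uses the fact that for a group permuting coordinates, the dimension of the invariant linear maps equals the number of orbits on index tuples. The function names are hypothetical.

```python
from itertools import product

def cyclic_invariant_dim(n, k):
    """Orbits of Z_n shifting every entry of a k-tuple over Z_n by the same amount."""
    reps = set()
    for idx in product(range(n), repeat=k):
        # The lexicographically smallest shift serves as the orbit representative.
        reps.add(min(tuple((v + s) % n for v in idx) for s in range(n)))
    return len(reps)

def translation_invariant_dim(d, k):
    """Orbits of Z_d x Z_d translating every pixel of a k-tuple of pixels at once."""
    pixels = list(product(range(d), repeat=2))
    reps = set()
    for idx in product(pixels, repeat=k):
        reps.add(min(tuple(((a + s) % d, (b + t) % d) for a, b in idx)
                     for s in range(d) for t in range(d)))
    return len(reps)

print(cyclic_invariant_dim(4, 3), 4 ** (3 - 1))
print(translation_invariant_dim(3, 2), 3 ** (2 * (2 - 1)))
```

Each pair of printed numbers agrees because the shift action on index tuples is free, so every orbit has exactly as many elements as the group.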
4 Tensor Sizes for Function Approximation on Graphs
In the following section, we discuss an open question about function approximation on graphs. The question was raised in (Maron et al., 2019c) and (Keriven & Peyré, 2019).
Given a subgroup $G \le S_n$, there is a construction of a $G$-invariant neural network that uses tensors of size up to $n(n-1)/2$ and achieves the universal approximation property for $G$-invariant functions, see Theorem 3 in (Maron et al., 2019c). When implementing a layer with tensors of size $n(n-1)/2$, the number of neurons reaches $n^{n(n-1)/2}$. Such a scale is impractical, and further optimization of tensor sizes is needed.
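To give a sense of scale, the short sketch below counts the entries of a single tensor of order $n(n-1)/2$ and, for comparison, of order $n$ (the order suggested in the abstract). It assumes one channel and counts only the $n^k$ entries of an order-$k$ tensor; the helper name is hypothetical.

```python
def tensor_entries(n, order):
    """Number of entries of an order-`order` tensor with n indices per axis."""
    return n ** order

for n in (4, 6, 8, 10):
    known_order = n * (n - 1) // 2
    print(f"n={n}: order {known_order} -> {tensor_entries(n, known_order):.3e} entries, "
          f"order {n} -> {tensor_entries(n, n):.3e} entries")
```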
Recall that the integer $T(\Gamma)$ is defined as the smallest bound on tensor sizes that allows continuous function approximation on a graph $\Gamma$ using an Invariant Neural Network. We propose the following conjectures.
Conjecture 4.1 (A).
For a graph $\Gamma$ with $n$ nodes, we have $T(\Gamma) \le n$.
Conjecture 4.2 (B).
The value $T(\Gamma)$ is upper bounded by the maximal size of an orbit of the automorphism group $\mathrm{Aut}(\Gamma)$ acting on the nodes.
We experimentally verified that these conjectures hold for all graphs with .
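The quantity appearing in Conjecture B is easy to compute for small graphs. The sketch below enumerates the automorphism group by brute force and returns the largest node orbit; it only evaluates the conjectured bound and does not test the conjecture itself. The function names and the three example graphs are assumptions made for illustration.

```python
from itertools import permutations

def automorphisms(n, edges):
    """All node permutations of an undirected graph on range(n) that preserve its edge set."""
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if {frozenset((p[u], p[v])) for u, v in edges} == edge_set]

def max_orbit_size(n, edges):
    """Largest orbit of the automorphism group acting on the nodes."""
    group = automorphisms(n, edges)
    return max(len({p[v] for p in group}) for v in range(n))

star = [(0, 1), (0, 2), (0, 3)]
path = [(0, 1), (1, 2), (2, 3)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
for name, edges in [("star", star), ("path", path), ("cycle", cycle)]:
    print(name, len(automorphisms(4, edges)), max_orbit_size(4, edges))
```

The enumeration runs over all $n!$ permutations, so it is only practical for very small graphs.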
We illustrate the relevance of these conjectures with a well-known application: universal approximation of translation-invariant functions with convolutional neural networks (Zhou, 2018). Image classes are often assumed to be translation-invariant functions. If the input data is given by $d \times d$ images, then the translation group is a product of cyclic groups $G = \mathbb{Z}_d \times \mathbb{Z}_d$ (horizontal and vertical shifts). Note that we ignore rotations and reflections for simplicity.
Theorem 4.3.
Invariant neural networks with tensor layers of size up to $2d-1$ can approximate any translation-invariant continuous function on $d \times d$ image data.
We provide the proof of Theorem 4.3 below. The first key step in the proof is a change of basis that diagonalizes the action of the commutative group $\mathbb{Z}_d \times \mathbb{Z}_d$. The second key step is the connection of invariance to the notion of zero-sum sequences and the Davenport constant of a group, an idea that seems new in the machine learning literature.
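The first key step can be illustrated numerically. The sketch below assumes that the diagonalizing change of basis is a two-dimensional discrete Fourier transform of a square $d \times d$ image (one standard choice; the sign convention is an assumption here) and checks that translating the image multiplies each new coordinate by a root of unity. The function names are hypothetical.

```python
import cmath

def dft2(x, d):
    """New coordinates y[a][b] = sum over pixels (p, q) of w^(-(a p + b q)) * x[p][q]."""
    w = cmath.exp(2j * cmath.pi / d)
    return [[sum(w ** (-(a * p + b * q)) * x[p][q]
                 for p in range(d) for q in range(d))
             for b in range(d)] for a in range(d)]

def translate(x, s, t, d):
    """Translated image: pixel (p, q) takes the value of old pixel (p + s, q + t)."""
    return [[x[(p + s) % d][(q + t) % d] for q in range(d)] for p in range(d)]

d, s, t = 3, 1, 2
x = [[1, 2, 0], [0, 5, 3], [4, 1, 7]]
w = cmath.exp(2j * cmath.pi / d)
y, z = dft2(x, d), dft2(translate(x, s, t, d), d)
# Largest deviation from the relation z[a][b] = w^(a s + b t) * y[a][b].
print(max(abs(z[a][b] - w ** (a * s + b * t) * y[a][b])
          for a in range(d) for b in range(d)))
```

The printed deviation is at the level of floating-point rounding, so in the new basis each translation acts by a diagonal matrix of roots of unity.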
Proof of Theorem 4.3.
The group $G = \mathbb{Z}_d \times \mathbb{Z}_d$ of translations acting on $d \times d$ images can be viewed as a subgroup of $S_{d^2}$. By Theorem 1 from (Maron et al., 2019c), an invariant neural network can approximate any function invariant to $G$. The same work shows an upper bound $n(n-1)/2$, with $n = d^2$, on tensor sizes required for $G$-invariant function approximation. Let us show that only tensors of size up to $2d-1$ are required. From the proof of Theorem 1 in (Maron et al., 2019c) we know that it suffices to approximate generators in the ring of invariant polynomials. The ring in question has variables $x_{pq}$, $0 \le p, q \le d-1$, and the action of $G$ performs cyclic shifts on the first and second indices of the variables. The invariants of this action are not easy to analyze in the basis $x_{pq}$. We propose a change of basis that diagonalizes this action.
Define a new basis $y_{ab}$, $0 \le a, b \le d-1$, as follows
$$y_{ab} = \sum_{p, q = 0}^{d-1} \omega^{-(ap + bq)}\, x_{pq}, \qquad \omega = e^{2 \pi \sqrt{-1} / d}.$$
Then the action of the translation $(s, t) \in G$, $x_{pq} \mapsto x_{p+s,\, q+t}$, on $y_{ab}$ is multiplication by $\omega^{as + bt}$, i.e. multiplication by a root of unity.
If a polynomial in the variables $y_{ab}$ is invariant to the action of $G$, then each of its monomials
$$y_{a_1 b_1}\, y_{a_2 b_2} \cdots y_{a_r b_r}$$
is also invariant, i.e.
$$(a_1, b_1) + (a_2, b_2) + \dots + (a_r, b_r) = (0, 0) \quad \text{in } \mathbb{Z}_d \times \mathbb{Z}_d.$$
This relation is a zero-sum relation on a sequence of length $r$ that contains elements from the group $\mathbb{Z}_d \times \mathbb{Z}_d$. We conclude that finding invariant tensor layers for Convolutional Neural Networks is equivalent to finding zero-sum sequences in the group $\mathbb{Z}_d \times \mathbb{Z}_d$.
The Davenport constant of a finite abelian group $H$ is defined as the smallest integer $\ell$ such that every sequence of $\ell$ elements from $H$ contains a non-empty zero-sum subsequence. The Davenport constant of the group $\mathbb{Z}_d \times \mathbb{Z}_d$ was computed before (Olson, 1969) and is equal to $2d-1$. It follows that any sequence of length $2d-1$ contains a non-empty zero-sum subsequence. In terms of invariant monomials, it means that any invariant monomial of degree $2d$ and above contains a proper invariant factor whose complement is again invariant, and hence can be written as a product of invariant monomials of degrees $\le 2d-1$. Hence all generators of the ring of $G$-invariants have degrees $\le 2d-1$. By a connection established in the works (Yarotsky, 2021), (Maron et al., 2019c), we conclude that $2d-1$ is an upper bound on the required tensor size in Convolutional Neural Networks. ∎
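For very small image sizes, the value of the Davenport constant used in the proof can be confirmed by exhaustive search. The sketch below assumes the group $\mathbb{Z}_d \times \mathbb{Z}_d$ and the definition of the Davenport constant as the smallest length at which every sequence of group elements contains a non-empty zero-sum subsequence; the function names are hypothetical and the search is exponential, so it is meant for $d \le 3$ only.

```python
from itertools import combinations, combinations_with_replacement, product

def has_zero_sum_subsequence(seq, d):
    """True if some non-empty subsequence of seq sums to (0, 0) in Z_d x Z_d."""
    for r in range(1, len(seq) + 1):
        for sub in combinations(seq, r):
            if sum(a for a, _ in sub) % d == 0 and sum(b for _, b in sub) % d == 0:
                return True
    return False

def davenport_constant(d):
    """Smallest L such that every length-L sequence over Z_d x Z_d has a zero-sum subsequence."""
    group = list(product(range(d), repeat=2))
    length = 1
    while True:
        # The order of a sequence is irrelevant, so multisets of group elements suffice.
        if all(has_zero_sum_subsequence(seq, d)
               for seq in combinations_with_replacement(group, length)):
            return length
        length += 1

for d in (1, 2, 3):
    print(d, davenport_constant(d), 2 * d - 1)
```

In each printed row the computed value matches $2d - 1$, in line with the result of (Olson, 1969) cited in the proof.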
5 Experiments
To support the claim that our invariant/equivariant layer design improves learning on graphs with different node types, we consider open benchmarks with tasks that require learning interactions between groups of nodes, such as subgraph tasks. We compare our models to three recent architectures that achieved state-of-the-art or close results: SubGNN (Alsentzer et al., 2020), GLASS (Wang & Zhang, 2022b), and GNN-Seg (treating a single group of nodes while ignoring the rest of the graph).
The training process in all experiments uses the Adam optimizer (Kingma & Ba, 2014) and the ReduceLROnPlateau learning rate scheduler. The number of training iterations is bounded by 10000, and early stopping is performed when the validation score has not increased for 1000 iterations. The models were implemented with PyTorch Geometric (Fey & Lenssen, 2019).
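The stopping rule described above can be sketched independently of any deep learning framework. The code below is a minimal illustration and not the authors' training script: `step_fn` and `eval_fn` are hypothetical callbacks standing in for one optimizer step and one validation evaluation, and the optimizer and learning rate scheduler themselves are omitted.

```python
def train(step_fn, eval_fn, max_iters=10_000, patience=1_000):
    """Run up to max_iters steps; stop once the validation score has not increased for `patience` steps."""
    best, best_iter, it = float("-inf"), 0, 0
    for it in range(1, max_iters + 1):
        step_fn()                      # one optimization step (hypothetical callback)
        score = eval_fn()              # validation score, higher is better
        if score > best:
            best, best_iter = score, it
        elif it - best_iter >= patience:
            break                      # no improvement for `patience` iterations
    return best, it

# Toy usage: the score improves three times and then stays flat.
scores = iter([1, 2, 3, 3, 3, 3, 3, 3])
print(train(lambda: None, lambda: next(scores), max_iters=8, patience=3))
```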
5.1 Real datasets
We evaluate the model performance on four real-world datasets with two node types: ppi-bp, em-user, hpo-metab, hpo-neuro, with 80:10:10 training, validation, and test split.
We use the node labeling trick with a Message Passing Neural Network architecture similar to GLASS. We add two more layers: an $S_{n_1} \times S_{n_2}$-equivariant layer and an $S_{n_1} \times S_{n_2}$-invariant layer instead of sum or average pooling.
The proposed model achieves close to state-of-the-art results on 3 out of 4 datasets. We note that the model variance is noticeably higher. One possible explanation is the high sensitivity caused by added learnable mappings.
5.2 Synthetic datasets
We use four synthetic datasets introduced in (Alsentzer et al., 2020): density, cut ratio, coreness, and component. We follow the 50:25:25 training, validation, and test split as in (Alsentzer et al., 2020). Our model for synthetic data follows an invariant neural network architecture. In particular, we use three equivariant layers, followed by an invariant pooling layer and a Multi-Layer Perceptron. The vector-form implementation of the layers used is given in Appendix C.
We compare the performance of the state-of-the-art models and our model on these tasks, see Table 6. The proposed model achieves state-of-the-art or similar results on 4 out of 4 synthetic datasets.
6 Conclusion
In this work, we presented a complete classification of linear tensor layers invariant to permutations of nodes of the same type. We experimentally verified the performance improvement these layers show on real and synthetic tasks. New steps have been made to further bound the size of tensors required for function approximation on graph data. In particular, when treating image data as graph data, we obtained tight bounds on the sizes of invariant convolutional tensor layers.
7 Acknowledgements
The work of Z.-Q. Luo was supported in part by the National Key Research and Development Project under grant 2022YFA1003900, and in part by the Guangdong Provincial Key Laboratory of Big Data Computing.
Appendix A Proofs of Main Theorems
Proof of Theorem 3.1.
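Decompose the space $\mathbb{R}^n$ into a direct sum of subspaces where permutations act,
$$\mathbb{R}^n = \mathbb{R}^{n_1} \oplus \dots \oplus \mathbb{R}^{n_m}.$$
Rewrite the tensor product
$$(\mathbb{R}^n)^{\otimes k} = (\mathbb{R}^{n_1} \oplus \dots \oplus \mathbb{R}^{n_m})^{\otimes k}$$
into a multinomial sum, see (Fulton & Harris, 2004),
$$(\mathbb{R}^n)^{\otimes k} \cong \bigoplus_{k_1 + \dots + k_m = k} \binom{k}{k_1, \dots, k_m}\, (\mathbb{R}^{n_1})^{\otimes k_1} \otimes \dots \otimes (\mathbb{R}^{n_m})^{\otimes k_m}.$$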
For example,
[TABLE]
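Using the result of (Maron et al., 2019b), we note that the dimension of invariants of $(\mathbb{R}^{n_j})^{\otimes k_j}$ is equal to $B_{k_j}$. Hence the dimension of invariants is given by the sum
$$\sum_{k_1 + \dots + k_m = k} \binom{k}{k_1, \dots, k_m}\, B_{k_1} \cdots B_{k_m}.$$
The expression above is known in the theory of exponential generating functions (Stanley, 2011), see Lemma B.1. We conclude that this sum appears as the coefficient in front of $x^k / k!$ in the series
$$\Big( \sum_{j \ge 0} B_j \frac{x^j}{j!} \Big)^{m} = \exp\big(m (e^x - 1)\big).$$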
For the claim about equivariant maps see Lemma B.2. ∎
Proof of Theorem 3.2.
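Consider the set of vectors in $\mathbb{R}^{n^k}$ obtained by the following procedure.
1. For each node type $j$ select a subset $I_j$ of indices from $\{1, \dots, k\}$, in such a way that
$$\{1, \dots, k\} = I_1 \sqcup \dots \sqcup I_m.$$
2. For each subspace $(\mathbb{R}^{n_j})^{\otimes |I_j|}$ we select a basis element $v_{\gamma_j}$, where $\gamma_j$ is a partition of $I_j$, according to the construction of the basis in (Maron et al., 2019b). The basis element is then formed by taking the tensor product of all vectors $v_{\gamma_j}$, with $v_{\gamma_j}$ located at indices $I_j$.
On the one hand, there are
$$\sum_{k_1 + \dots + k_m = k} \binom{k}{k_1, \dots, k_m}\, B_{k_1} \cdots B_{k_m}$$
vectors in this set. On the other hand, they are orthogonal to each other. Indeed, assume that two elements $v$ and $v'$ share a common non-zero coefficient in front of some element $e_{i_1} \otimes \dots \otimes e_{i_k}$. It follows that $I_j$ can be defined as the set of indices $t$ such that node $i_t$ has type $j$. But then, by the definition of $v_{\gamma_j}$, the element $e_{i_1} \otimes \dots \otimes e_{i_k}$ uniquely defines an equivalence class (partition) $\gamma_j$. Hence $\gamma_j = \gamma'_j$ for all $j$.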
∎
Proof of Theorem 3.3.
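The basis of cyclically invariant $k$-tensors can be obtained by projecting $k$-tensors of the form $e_{i_1} \otimes \dots \otimes e_{i_k}$ on the invariant subspace using the averaging operator
$$\frac{1}{n} \sum_{s=0}^{n-1} g^{s},$$
where $g$ is a generator of the cyclic group $\mathbb{Z}_n$. It follows that bases and dimensions of cyclically-invariant subspaces in $(\mathbb{R}^n)^{\otimes k}$ over $\mathbb{R}$ and in $(\mathbb{C}^n)^{\otimes k}$ over $\mathbb{C}$ are the same.
The cyclic action of $\mathbb{Z}_n$ on $\mathbb{C}^n$ can be diagonalized, resulting in the decomposition
$$\mathbb{C}^n = V_0 \oplus V_1 \oplus \dots \oplus V_{n-1},$$
where the cyclic action on $V_j$ is multiplication by $\omega^{j}$, $\omega = e^{2 \pi \sqrt{-1} / n}$. Let $D$ be the dimension of the invariant subspace in
$$(V_0 \oplus V_1 \oplus \dots \oplus V_{n-1})^{\otimes k}.$$
The shift $V_0 \to V_1$, $V_1 \to V_2$, …, $V_{n-1} \to V_0$, applied to one tensor factor, does not change the decomposition but maps the invariant subspace to the subspace where the cyclic action is multiplication by $\omega$. It follows that this subspace also has dimension $D$. Repeating the argument we arrive at $nD = n^k$, hence $D = n^{k-1}$. ∎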
Proof of Theorem 3.4.
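An image is a $2$-tensor from $\mathbb{R}^d \otimes \mathbb{R}^d$. Vertical translations act on the first entry of a $2$-tensor while horizontal translations act on the second. The maps invariant to translations are then computed as
$$\big( (\mathbb{R}^d \otimes \mathbb{R}^d)^{\otimes k} \big)^{\mathbb{Z}_d \times \mathbb{Z}_d} \cong \big( (\mathbb{R}^d)^{\otimes k} \big)^{\mathbb{Z}_d} \otimes \big( (\mathbb{R}^d)^{\otimes k} \big)^{\mathbb{Z}_d}.$$
Since cyclic invariants are computed in Theorem 3.3, the dimension of the last tensor product is equal to $d^{k-1} \cdot d^{k-1} = d^{2(k-1)}$. ∎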
Appendix B Supplementary Lemmas
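Lemma B.1.
Let $a_0, a_1, a_2, \dots$ be a sequence and let $A(x)$ be the exponential generating function of that sequence,
$$A(x) = \sum_{n \ge 0} a_n \frac{x^n}{n!}.$$
Then $A(x)^m$ is an exponential generating function for the sequence
$$c_n = \sum_{k_1 + \dots + k_m = n} \binom{n}{k_1, \dots, k_m}\, a_{k_1} \cdots a_{k_m}.$$
Proof.
We start by expanding $A(x)^m$:
$$A(x)^m = \Big( \sum_{k_1 \ge 0} a_{k_1} \frac{x^{k_1}}{k_1!} \Big) \cdots \Big( \sum_{k_m \ge 0} a_{k_m} \frac{x^{k_m}}{k_m!} \Big)$$
$$= \sum_{k_1, \dots, k_m \ge 0} \frac{a_{k_1} \cdots a_{k_m}}{k_1! \cdots k_m!}\, x^{k_1 + \dots + k_m}$$
$$= \sum_{n \ge 0} \Big( \sum_{k_1 + \dots + k_m = n} \frac{a_{k_1} \cdots a_{k_m}}{k_1! \cdots k_m!} \Big) x^n$$
$$= \sum_{n \ge 0} \Big( \sum_{k_1 + \dots + k_m = n} \binom{n}{k_1, \dots, k_m}\, a_{k_1} \cdots a_{k_m} \Big) \frac{x^n}{n!}$$
$$= \sum_{n \ge 0} c_n \frac{x^n}{n!},$$
where the last step follows from the definition of $c_n$. This shows that $A(x)^m$ is an exponential generating function for the sequence $c_n$. Thus, we have proved Lemma B.1. ∎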
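The identity of Lemma B.1 can be checked with exact arithmetic on a truncated series. The sketch below is an illustration under the assumption that the lemma concerns the $m$-th power of an exponential generating function and the corresponding multinomial sums; it uses Bell numbers as the input sequence, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def egf_power_coeffs(a, m):
    """n! times the coefficient of x^n in A(x)^m, for n < len(a), with A(x) = sum a[n] x^n / n!."""
    size = len(a)
    poly = [Fraction(a[n], factorial(n)) for n in range(size)]
    power = [Fraction(1)] + [Fraction(0)] * (size - 1)
    for _ in range(m):
        # Truncated product of ordinary power series.
        power = [sum(power[i] * poly[n - i] for i in range(n + 1)) for n in range(size)]
    return [power[n] * factorial(n) for n in range(size)]

def multinomial_sum(a, m, n):
    """Sum over k_1 + ... + k_m = n of multinomial(n; k_1, ..., k_m) * a[k_1] * ... * a[k_m]."""
    total = 0
    for ks in product(range(n + 1), repeat=m):
        if sum(ks) == n:
            coef = factorial(n)
            for k in ks:
                coef //= factorial(k)  # stays an integer at every step
            total += coef * prod(a[k] for k in ks)
    return total

bell_numbers = [1, 1, 2, 5, 15]
print(egf_power_coeffs(bell_numbers, 2))
print([multinomial_sum(bell_numbers, 2, n) for n in range(5)])
```

Both printed lists coincide, as the lemma predicts.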
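Lemma B.2.
The dimension of the space of $S_{n_1} \times \dots \times S_{n_m}$-equivariant maps $\mathbb{R}^{n^k} \to \mathbb{R}^{n^l}$ depends only on $k + l$.
Proof.
From the point of view of tensor algebra, the computation of $G$-equivariant layers, $G = S_{n_1} \times \dots \times S_{n_m}$, can be viewed as the computation of $G$-equivariant linear maps
$$\mathrm{Hom}_G\big(V^{\otimes k}, V^{\otimes l}\big) \cong \big( (V^{*})^{\otimes k} \otimes V^{\otimes l} \big)^{G}, \qquad V = \mathbb{R}^n.$$
The representation theory of the symmetric group $S_n$ is well-studied. In particular, it is known (Fulton & Harris, 2004) that all characters of $S_n$ are real-valued. Hence the action on the dual space $V^{*}$ is equivalent to the action on the original space $V$. Hence
$$\big( (V^{*})^{\otimes k} \otimes V^{\otimes l} \big)^{G} \cong \big( V^{\otimes (k + l)} \big)^{G}.$$
This shows that the answer can depend only on $k + l$. ∎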
Appendix C Implementation
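Let $V_1, \dots, V_m$ be the sets of nodes from groups $1$ to $m$,
$$V_1 \sqcup \dots \sqcup V_m = \{1, \dots, n\}.$$
Denote by $\mathbf{1}_{V_i}$ the $n$-dimensional vector having coordinate $1$ only at indices from $V_i$. And let $I_{V_i}$ be the diagonal matrix with ones only at indices from $V_i$.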
An $S_{n_1} \times \dots \times S_{n_m}$-invariant layer has the form
[TABLE]
where are learnable parameters.
An $S_{n_1} \times \dots \times S_{n_m}$-equivariant layer has the form
[TABLE]
where and are learnable parameters.
References
- Alsentzer et al. (2020) Alsentzer, E., Finlayson, S. G., Li, M. M., and Zitnik, M. Subgraph neural networks. Proceedings of Neural Information Processing Systems, NeurIPS, 2020.
- Bevilacqua et al. (2022) Bevilacqua, B., Frasca, F., Lim, D., Srinivasan, B., Cai, C., Balamurugan, G., Bronstein, M. M., and Maron, H. Equivariant subgraph aggregation networks. In International Conference on Learning Representations, 2022.
- Chen et al. (2019) Chen, Z., Villar, S., Chen, L., and Bruna, J. On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Processing Systems, volume 32, 2019.
- Fey & Lenssen (2019) Fey, M. and Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. arXiv, abs/1903.02428, 2019.
- Frasca et al. (2022) Frasca, F., Bevilacqua, B., Bronstein, M. M., and Maron, H. Understanding and extending subgraph GNNs by rethinking their symmetries. In Advances in Neural Information Processing Systems, 2022.
- Fulton & Harris (2004) Fulton, W. and Harris, J. Representation Theory. Springer New York, 2004.
- Garg et al. (2020) Garg, V. K., Jegelka, S., and Jaakkola, T. Generalization and representational limits of graph neural networks. In Proceedings of the 37th International Conference on Machine Learning, ICML'20, 2020.
- Gasse et al. (2019) Gasse, M., Chételat, D., Ferroni, N., Charlin, L., and Lodi, A. Exact combinatorial optimization with graph convolutional neural networks. In Advances in Neural Information Processing Systems 32, 2019.
