Separation Power of Equivariant Neural Networks
Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin

TL;DR
This paper thoroughly characterizes the separation power of equivariant neural networks, revealing how architectural choices and hyperparameters influence their ability to distinguish inputs, with implications for model expressivity and design.
Contribution
It provides a complete characterization of indistinguishable inputs for equivariant networks and analyzes how hyperparameters affect their separation power, including the impact of activation functions and architecture.
Findings
All non-polynomial activations have equivalent maximum separation power.
Depth increases separation power up to a certain threshold, then plateaus.
Adding invariant features does not affect separation power.
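The activation finding can be illustrated with a toy example. The sketch below is not the paper's construction: it assumes a hypothetical one-layer, width-one, permutation-invariant model that sum-pools an element-wise layer, and the names `invariant_net` and `separates` are illustrative only. With the identity (a polynomial activation) the model sees only the sum of the entries, so it cannot separate [1, 3] from [2, 2]; with ReLU some parameter choice does separate them.

```python
import numpy as np

def invariant_net(x, w, b, activation):
    """Toy permutation-invariant model (assumed for illustration):
    apply an element-wise layer, then sum-pool over the entries."""
    z = w * np.asarray(x, dtype=float) + b
    return float(np.sum(activation(z)))

def identity(z):
    # Polynomial (degree-1) activation.
    return z

def relu(z):
    # Non-polynomial activation.
    return np.maximum(z, 0.0)

def separates(x, y, activation, trials=200, seed=0):
    """Return True if some randomly sampled (w, b) gives different outputs on x and y."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        w, b = rng.normal(size=2)
        out_x = invariant_net(x, w, b, activation)
        out_y = invariant_net(y, w, b, activation)
        if not np.isclose(out_x, out_y):
            return True
    return False

x = [1.0, 3.0]
y = [2.0, 2.0]  # same sum as x, but a different multiset

print("identity separates x, y:", separates(x, y, identity))
print("ReLU separates x, y:    ", separates(x, y, relu))
print("ReLU separates x, permuted x:", separates(x, [3.0, 1.0], relu))
```

The last check shows that no activation can separate an input from a permutation of itself in this model, since invariance is built into the sum-pooling. Random sampling can only give evidence of indistinguishability; it is not a proof.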
Abstract
The separation power of a machine learning model refers to its ability to distinguish between different inputs and is often used as a proxy for its expressivity. Indeed, knowing the separation power of a family of models is a necessary condition to obtain fine-grained universality results. In this paper, we analyze the separation power of equivariant neural networks, such as convolutional and permutation-invariant networks. We first present a complete characterization of inputs indistinguishable by models derived from a given architecture. From these results, we derive how separability is influenced by hyperparameters and architectural choices, such as activation functions, depth, hidden layer width, and representation types. Notably, all non-polynomial activations, including ReLU and sigmoid, are equivalent in expressivity and reach maximum separation power. Depth improves separation power…
Peer Reviews
Decision·ICLR 2025 Poster
* This paper is well-written and easy to follow.
* This paper considers an interesting and important problem in machine learning and extends our understanding of the separation power of specific equivariant architectures.
* This paper's definitions and statements are clear, and its theoretical analysis is solid.
* The theoretical results in the paper are interesting but not so surprising, so they seem unlikely to improve equivariant network design.
* The setting in this paper is somewhat too specific (finite groups, and feedforward architectures with element-wise activations, if I understand correctly), which may limit its contribution. More detail can be found in Questions.
The strengths of the paper are numerous. The authors provide a novel theoretical framework for analyzing separation power in equivariant networks, creatively combining group theory and functional analysis. Particularly interesting is the twin network trick which transforms a network separation problem into a zero locus problem for neural networks, allowing the application of recursive techniques for solving zero locus problems. The authors also provide new insights into how architectural choices
The authors clearly state the limitations of the work. An obvious weakness is the computational complexity, but the goal of this paper is to build a theoretical framework, which is a very challenging task, so it is understandable that the computational complexity is not practical. Efficient computational approximations could still be explored. The paper could also benefit from experiments showing the effect (or lack thereof) of depth on separation power, as well as the impact of different activation
**Clarity:** The work is clearly written and relatively crisp. The Preliminaries section does a decent job introducing the basic tools needed in the paper (e.g., the notion of equivariance, etc.). An explanation of how the work fits into the current research landscape is provided. Despite being a theory paper, this work should be accessible to a wide range of readers. **Problem importance:** While there is now a substantial body of work on equivariant architectures, both in terms of architectur
**Theorem 1 is hard to read:** As this is one of the critical contributions of the work, it would be worth polishing it to make it more readily understandable, especially since the work takes pains in Section 4 to lay out the framework for this result. It feels like in the process of trying to make the theorem more “informal” all the explanations were taken out but the mathematical notation was retained. For instance, what is the symbol “\leq” at line 377? The reviewer can make a guess, but it wo
Taxonomy
Topics: Neural Networks and Applications
