Approximate equivariance via projection-based regularisation
Torben Berndt, Jan Stühmer

TL;DR
This paper introduces a projection-based regulariser for neural networks that enforces approximate equivariance by penalising non-equivariance at the operator level, improving efficiency and performance over existing sample-based methods.
Contribution
It proposes a novel projection-based regulariser leveraging linear layer decomposition, offering an exact, efficient way to enforce approximate equivariance across the full group orbit.
Findings
Outperforms prior methods in model accuracy.
Achieves significant runtime improvements.
Provides a mathematically grounded framework for non-equivariance penalty.
Abstract
Equivariance is a powerful inductive bias in neural networks, improving generalisation and physical consistency. Recently, however, non-equivariant models have regained attention, due to their better runtime performance and imperfect symmetries that might arise in real-world applications. This has motivated the development of approximately equivariant models that strike a middle ground between respecting symmetries and fitting the data distribution. Existing approaches in this field usually apply sample-based regularisers which depend on data augmentation at training time, incurring a high sample complexity, in particular for continuous groups. This work instead approaches approximate equivariance via a projection-based regulariser which leverages the orthogonal decomposition of linear layers into equivariant and non-equivariant components. In contrast to existing…
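The orthogonal decomposition mentioned in the abstract can be illustrated on a toy case. The following is a minimal sketch, not the authors' implementation: it assumes the finite cyclic group C_n acting on length-n vectors by cyclic shifts, and the names `shift_matrix`, `project_equivariant`, and `projection_penalty` are hypothetical. For an orthogonal representation, averaging a linear map T over the group gives its orthogonal projection P(T) onto the equivariant subspace, and the residual T - P(T) is the non-equivariant component that a projection-based regulariser could penalise.

```python
import numpy as np

def shift_matrix(n, k):
    # Permutation matrix S_k with (S_k x)[i] = x[(i - k) % n].
    return np.roll(np.eye(n), k, axis=0)

def project_equivariant(T):
    # Group average over C_n (an assumed toy group):
    # P(T) = (1/n) * sum_k S_k^T T S_k.
    n = T.shape[0]
    shifts = [shift_matrix(n, k) for k in range(n)]
    return sum(S.T @ T @ S for S in shifts) / n

def projection_penalty(T):
    # Squared Frobenius distance from T to the equivariant subspace.
    return np.linalg.norm(T - project_equivariant(T), "fro") ** 2

rng = np.random.default_rng(0)
T = rng.normal(size=(6, 6))
P = project_equivariant(T)
S = shift_matrix(6, 1)

# P(T) commutes with the group action, i.e. it is shift-equivariant.
commutes = np.allclose(P @ S, S @ P)
# Orthogonality of the split: ||T||^2 = ||P(T)||^2 + ||T - P(T)||^2.
pythagoras = np.isclose(
    np.linalg.norm(T) ** 2,
    np.linalg.norm(P) ** 2 + projection_penalty(T),
)
```

Because the penalty is computed from the weights alone, no augmented input samples are needed in this sketch; for continuous groups the explicit sum would have to be replaced by something else, which the toy example does not attempt.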
Peer Reviews
Decision · Submitted to ICLR 2026
- The paper provides a clear formulation of the problem that unifies the equivariance and approximate equivariance under a single operator-level framework. By expressing the regularization in the space of model parameters, the method is independent of input augmentations and data sampling strategies.
- The authors, in addition to proposing the equivariant regularizer, also provide an efficient way to implement it in the Fourier space, allowing for easier adaptation to current pipelines that are
- There is limited discussion regarding the effect of the introduced hyperparameters $\lambda_G$, $\lambda_\perp$. Since there are two different regularizers that interact both with each other and with the task loss, it would be interesting to see how much the performance gains are sensitive to their values.
- While the authors provide a clear motivation for the $\lVert T-P(T)\rVert$ regularizer, they do not provide any significant insight regarding the $\lVert P(T)\rVert$ term.
- There is no d
The method basically uses an idea from Elesedy and Zaidi, but it is conceptually clear and simple. It recasts approximate equivariance as operator-space projection instead of sample-space penalties, aligning with representation theory. The theory is generally nice: It shows equivalence (up to constants) between the projection distance and equivariance defect; this provides per-layer bounds for networks. Using the Fourier picture gives a clear (and broadly applicable), implementable "mask-and-a
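The two points raised in this review, the Fourier-space masking and the link between projection distance and equivariance defect, can be checked numerically on a toy case. This is a hedged sketch under stated assumptions rather than the paper's construction: the group is the cyclic shift group C_n, for which equivariant maps are exactly those that are diagonal in the unitary DFT basis, and the defect is taken here to be the group-averaged squared Frobenius commutator, which is an assumed definition. The names `fourier_project` and `mean_squared_defect` are hypothetical.

```python
import numpy as np

def fourier_project(T):
    # Project a real square matrix onto shift-equivariant (circulant) maps
    # by masking in the Fourier basis: keep only the diagonal of F T F^H.
    n = T.shape[0]
    F = np.fft.fft(np.eye(n), norm="ortho")  # unitary DFT matrix
    T_hat = F @ T @ F.conj().T
    masked = np.diag(np.diag(T_hat))         # zero all off-diagonal entries
    # For real T the result is real up to rounding error.
    return (F.conj().T @ masked @ F).real

def mean_squared_defect(T):
    # Assumed defect: average over shifts of ||T S_k - S_k T||_F^2.
    n = T.shape[0]
    total = 0.0
    for k in range(n):
        S = np.roll(np.eye(n), k, axis=0)    # cyclic shift by k
        total += np.linalg.norm(T @ S - S @ T, "fro") ** 2
    return total / n

rng = np.random.default_rng(1)
T = rng.normal(size=(8, 8))
P = fourier_project(T)
S1 = np.roll(np.eye(8), 1, axis=0)

is_equivariant = np.allclose(P @ S1, S1 @ P)
distance_sq = np.linalg.norm(T - P, "fro") ** 2
# For this orthogonal action the averaged defect equals twice the
# squared projection distance.
defect_matches = np.isclose(mean_squared_defect(T), 2.0 * distance_sq)
```

In this toy setting the relation is an exact identity with constant 2; the per-layer bounds and constants discussed in the review refer to the paper's own definitions, which may differ from the averaged defect assumed here.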
At the beginning of the paper, efficiency and complexity are used as motivational devices, but their scope is not fully quantified. The efficiency claims lack explicit computational complexity and per-epoch wall-clock comparisons across all settings. The experimental tables sometimes lack explicit metric definitions/units and consistent uncertainty reporting (e.g. the dynamics table), which makes them somewhat hard to parse. The breadth of baselines in CT MAR is limited: Only one sample-based method comp
The paper is well written and mostly easy to read, despite being rather strict in its formal treatment of the subject matter (I should state that I know the underlying theory only from an application/engineering perspective; I am not an expert in representations of Lie groups or harmonic analysis; so all of my analysis should be read with this caveat in mind). The basic idea sounds rather straightforward and easy to understand (equivariant maps are a subspace of all linear maps, so one can just
My impression is that performance is one of the highlighted positives of the method, but it seems to me that the method is still costly. In general, networks employing equivariant linear layers are expensive, as they basically have to sample the whole transformation group with their filters. In the case of this paper, some additional costs occur on the side of parameters: The off-diagonal elements of the linear layer introduce additional degrees of freedom that need to be represented before they
Taxonomy
Topics: Advanced Graph Neural Networks · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
