Ensembles provably learn equivariance through data augmentation
Oskar Nordenfors, Axel Flinth

TL;DR
This paper proves that ensembles of neural networks can learn equivariance through data augmentation without relying on the neural tangent kernel limit, extending previous results to stochastic settings and general architectures.
Contribution
It demonstrates that equivariance emergence in ensembles is independent of the neural tangent kernel limit and applies to broader architectures and stochastic scenarios.
Findings
Equivariance emerges in ensembles without NTK assumptions
Results hold for stochastic and general architectures
Validated through numerical experiments
Abstract
Recently, it was proved that group equivariance emerges in ensembles of neural networks as the result of full augmentation in the limit of infinitely wide neural networks (neural tangent kernel limit). In this paper, we extend this result significantly. We provide a proof that this emergence does not depend on the neural tangent kernel limit at all. We also consider stochastic settings, and furthermore general architectures. For the latter, we provide a simple sufficient condition on the relation between the architecture and the action of the group for our results to hold. We validate our findings through simple numeric experiments.
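The mechanism described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' experiment. The toy group (Z2 swapping two input coordinates), the two-layer tanh network, the data, and all names (`P`, `predict`, `train`, `ensemble_mean`) are assumptions made for illustration. To make the effect exact for a finite ensemble, each random initialization `W0` is paired with its transformed copy `W0 @ P`, so the empirical initialization distribution is exactly group-invariant; the paper instead considers ensembles drawn from an invariant distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy group (assumption): Z2 acting on R^2 by swapping the two coordinates.
P = np.array([[0.0, 1.0], [1.0, 0.0]])

def predict(W, a, X):
    # Two-layer network f(x) = a . tanh(W x), applied to each row of X.
    return np.tanh(X @ W.T) @ a

def train(W, a, X, y, lr=0.05, steps=100):
    # Full-batch gradient descent on the loss (1 / 2n) * sum of squared residuals.
    W, a = W.copy(), a.copy()
    n = len(y)
    for _ in range(steps):
        H = np.tanh(X @ W.T)                       # hidden activations, shape (n, hidden)
        r = H @ a - y                              # residuals, shape (n,)
        grad_a = H.T @ r / n
        grad_W = ((r[:, None] * (1 - H**2)) * a).T @ X / n
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

K, hidden = 4, 8
X = rng.normal(size=(16, 2))
y = X[:, 0]                                        # deliberately non-symmetric labels
X_aug = np.vstack([X, X @ P])                      # full augmentation over the group {I, P}
y_aug = np.concatenate([y, y])

members = []
for _ in range(K):
    W0 = rng.normal(size=(hidden, 2))
    a0 = rng.normal(size=hidden) / np.sqrt(hidden)
    # Pair W0 with W0 @ P so the set of initializations is closed under the group.
    for W_init in (W0, W0 @ P):
        members.append(train(W_init, a0, X_aug, y_aug))

def ensemble_mean(Z):
    return np.mean([predict(W, a, Z) for W, a in members], axis=0)

X_test = rng.normal(size=(5, 2))
ensemble_gap = np.max(np.abs(ensemble_mean(X_test) - ensemble_mean(X_test @ P)))
W1, a1 = members[0]
member_gap = np.max(np.abs(predict(W1, a1, X_test) - predict(W1, a1, X_test @ P)))
print("ensemble invariance gap:", ensemble_gap)
print("single-member invariance gap:", member_gap)
```

In this sketch the ensemble mean is invariant up to floating-point error even though no individual member is, because gradient descent on the fully augmented loss maps the initialization `W0 @ P` to the trained weights of `W0` multiplied by `P`. With independently sampled members the gap would only shrink as the ensemble grows, which is the finite-ensemble question raised in the reviews.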
Peer Reviews
Decision·Submitted to ICLR 2025
- The work shows the emergence of equivariance in ensemble models.
- The work generalizes previous work in which the proof relied on NTKs.
- Experiments with large ensembles of models show the emergence of equivariance.
I have several concerns about the usefulness of the theory and the experimental results.
Usefulness of theory:
- What is the use of the theory in model design or practical use cases? Equivariant models seem to give perfect equivariance, and data augmentation techniques give approximate equivariance. So I am wondering what the use of the ensemble technique for symmetries is, especially given that we need over 1000 models to get good equivariant results.
- What are the advantages of the propos
- It generalizes the results in Gerken & Kessel.
- The topic of invariance/equivariance is important, so these results would be of interest to people in that community.
My main issue is with the writing:
- The results presented in the main text are quite trivial: if you start with an invariant distribution and use an invariant flow, you end up with an invariant distribution. The more interesting results are in the appendix (Appendices B and C).
- Writing $\mathcal{L} = A_\mathcal{L} + T\mathcal{L}$, with $T\mathcal{L}$ the tangent space, is very confusing, as a tangent space is defined for a manifold and we are talking about a linear space. It needlessly comp
1. The paper is well-structured and easy to follow.
2. The paper extends previous results to more reasonable and applicable settings. This is a significant extension.
I like the paper and believe it makes a sufficient contribution with interesting results. However, there are several limitations, stated below:
1. While the assumptions for the theoretical analysis are more applicable than those of previous works, the results still hold only for infinite-size ensembles. Any analysis (including empirical) of the error bounds for finite ensembles would be beneficial.
2. While the results are important, the novelty is somewhat moderate in the sense that the emergent equivarian
Taxonomy
Topics: Computational Physics and Python Applications · Machine Learning in Healthcare · Time Series Analysis and Forecasting
