Ensembles provably learn equivariance through data augmentation

Oskar Nordenfors; Axel Flinth

arXiv:2410.01452·cs.LG·December 19, 2025

TL;DR

This paper proves that ensembles of neural networks can learn equivariance through data augmentation without relying on the neural tangent kernel limit, extending previous results to stochastic settings and general architectures.

Contribution

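The paper proves that the emergence of equivariance in ensembles trained with full data augmentation does not depend on the neural tangent kernel limit, and that the result extends to stochastic settings and to general architectures satisfying a simple sufficient condition relating the architecture to the group action.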
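The mechanism behind this kind of result can be sketched in a few lines. The notation below is ours and the three assumptions are an illustrative framing rather than the paper's exact hypotheses: $\rho(g)$ denotes an assumed action of the group on parameters, $\rho_X(g)$ and $\rho_Y(g)$ assumed linear actions on inputs and outputs, and $\Phi_t$ the map sending initial parameters to parameters after training time $t$.

```latex
% Illustrative assumptions (notation ours, not taken from the paper):
\begin{align*}
&\text{(i) compatibility:}
  && f_{\rho(g)\theta}(x) = \rho_Y(g)\, f_\theta\big(\rho_X(g)^{-1}x\big),\\
&\text{(ii) invariant initialization:}
  && \rho(g)\theta_0 \overset{d}{=} \theta_0,\\
&\text{(iii) equivariant training flow:}
  && \Phi_t\big(\rho(g)\theta\big) = \rho(g)\,\Phi_t(\theta).
\end{align*}
% (ii) and (iii) give \rho(g)\theta_t \overset{d}{=} \theta_t for \theta_t = \Phi_t(\theta_0).
% Combined with (i) and linearity of \rho_Y(g), the ensemble mean satisfies
\begin{align*}
\bar f_t(x) := \mathbb{E}\, f_{\theta_t}(x)
  = \mathbb{E}\, f_{\rho(g)\theta_t}(x)
  = \rho_Y(g)\, \mathbb{E}\, f_{\theta_t}\big(\rho_X(g)^{-1}x\big)
  = \rho_Y(g)\, \bar f_t\big(\rho_X(g)^{-1}x\big).
\end{align*}
```

Replacing $x$ by $\rho_X(g)x$ in the last line gives $\bar f_t(\rho_X(g)x) = \rho_Y(g)\bar f_t(x)$, i.e. the mean over the ensemble is equivariant even though individual members need not be.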

Findings

1. Equivariance emerges in ensembles without NTK assumptions
2. Results hold in stochastic settings and for general architectures
3. Validated through numerical experiments

Abstract

Recently, it was proved that group equivariance emerges in ensembles of neural networks as the result of full augmentation in the limit of infinitely wide neural networks (neural tangent kernel limit). In this paper, we extend this result significantly. We provide a proof that this emergence does not depend on the neural tangent kernel limit at all. We also consider stochastic settings, and furthermore general architectures. For the latter, we provide a simple sufficient condition on the relation between the architecture and the action of the group for our results to hold. We validate our findings through simple numeric experiments.
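A toy numerical illustration of the claim may help; it is a minimal sketch under our own assumptions and is not the authors' experimental setup. It uses the two-element group that swaps the coordinates of inputs in R^2, a one-hidden-layer tanh network, full-batch gradient descent on the fully augmented data, and a finite ensemble that is symmetrized by hand: every random initialization is paired with its group-transformed twin, as a stand-in for a group-invariant initialization distribution. All function and variable names are hypothetical.

```python
import numpy as np

# Group Z2 acting on R^2 by swapping coordinates (P is its own inverse).
P = np.array([[0.0, 1.0], [1.0, 0.0]])

def predict(W, a, X):
    """One-hidden-layer network f(x) = a^T tanh(W x), applied to rows of X."""
    return np.tanh(X @ W.T) @ a

def train(W, a, X, y, steps=100, lr=0.05):
    """Full-batch gradient descent on the squared loss over the augmented data."""
    X_aug = np.vstack([X, X @ P])        # full augmentation: whole orbit of each sample
    y_aug = np.concatenate([y, y])       # labels are shared along each orbit
    W, a = W.copy(), a.copy()
    n = len(y_aug)
    for _ in range(steps):
        H = np.tanh(X_aug @ W.T)         # hidden activations, shape (n, h)
        r = H @ a - y_aug                # residuals, shape (n,)
        grad_a = H.T @ r / n
        grad_W = ((r[:, None] * (1.0 - H**2)) * a[None, :]).T @ X_aug / n
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 2))             # synthetic training inputs
y = rng.normal(size=16)                  # synthetic targets
h, m = 8, 10                             # hidden width, number of random seeds

members = []
for _ in range(m):
    W0 = rng.normal(size=(h, 2))
    a0 = rng.normal(size=h) / np.sqrt(h)
    members.append(train(W0, a0, X, y))        # random member
    members.append(train(W0 @ P, a0, X, y))    # its group-transformed twin

def ensemble(Z):
    """Mean prediction over all ensemble members."""
    return np.mean([predict(W, a, Z) for W, a in members], axis=0)

# Invariance gap max_x |f(Px) - f(x)| on fresh test points.
X_test = rng.normal(size=(100, 2))
member_gaps = [
    np.max(np.abs(predict(W, a, X_test @ P) - predict(W, a, X_test)))
    for W, a in members
]
ensemble_gap = np.max(np.abs(ensemble(X_test @ P) - ensemble(X_test)))

print("largest single-member gap:", max(member_gaps))
print("ensemble-mean gap:        ", ensemble_gap)
```

Because gradient descent on the augmented loss commutes with the swap acting on the columns of `W`, each twin stays the transformed copy of its partner throughout training, so the ensemble mean is invariant up to floating-point error while individual members generally are not. With independently sampled members instead of hand-built twins, the invariance would hold only approximately and improve as the ensemble grows.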

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 4

Strengths

- The work shows the emergence of equivariance in ensemble models
- The work generalizes previous work whose proofs relied on NTKs
- Experiments with large ensembles of models show the emergence of equivariance

Weaknesses

I have several concerns about the usefulness of the theory and the experimental results.

Usefulness of theory:
- What is the use of the theory in model design or practical use cases? Equivariant models seem to give perfect equivariance, while data augmentation techniques give approximate equivariance. So I am wondering what the use of the ensemble technique is for symmetries, especially given that we need over 1000 models to get good equivariant results.
- What are the advantages of the propos

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- It generalizes the results in Gerken & Kessel
- The topic of invariance/equivariance is important, so these results would be of interest to people in that community

Weaknesses

My main issue is with the writing:
- The results presented in the main text are quite trivial: if you start with an invariant distribution and use an invariant flow, you end up with an invariant distribution. The more interesting results are in the appendix (Appendices B and C).
- Writing $\mathcal{L} = A_\mathcal{L} + T\mathcal{L}$, with $T\mathcal{L}$ the tangent space, is very confusing, as a tangent space is defined for a manifold and we are talking about a linear space. It needlessly comp

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper is well-structured and easy to follow.
2. The paper extends previous results to more reasonable and applicable settings. This is a significant extension.

Weaknesses

I like the paper and believe it has a sufficient contribution and interesting results. However, there are several limitations stated below:
1. While the assumptions for the theoretical analysis are more applicable compared to previous works, they still hold only for infinite-size ensembles. Any analysis (including empirical) on the error bounds for finite ensembles would be beneficial.
2. While the results are important, the novelty is somewhat moderate in the sense that the emergent equivarian

Code & Models

Repositories

onordenfors/ensemble_experiment
PyTorch · Official

Taxonomy

Topics: Computational Physics and Python Applications · Machine Learning in Healthcare · Time Series Analysis and Forecasting