Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli

TL;DR
This paper introduces the geometric invariance hypothesis (GIH): during training, the input-space curvature of a neural network remains invariant under transformation in certain architecture-dependent directions, so generalization depends on how the data geometry interacts with the architecture.
Contribution
The paper defines the concepts of average geometry and its evolution, linking neural network geometry to data covariance and architecture-dependent invariances.
Findings
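ResNets exhibit geometric invariance that affects generalization: unlike MLPs, they can fail to generalize depending on the orientation of the plane on which the data lies.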
Geometry evolution is driven by data covariance projected onto the network's average geometry.
Architecture-dependent invariances influence neural network generalization.
Abstract
In this paper, we propose the geometric invariance hypothesis (GIH), which argues that the input space curvature of a neural network remains invariant under transformation in certain architecture-dependent directions during training. We investigate a simple, non-linear binary classification problem residing on a plane in a high dimensional space and observe that, unlike MLPs, ResNets fail to generalize depending on the orientation of the plane. Motivated by this example, we define a neural network's average geometry and average geometry evolution as compact summaries of the model's input-output geometry and its evolution during training. By investigating the average geometry evolution at initialization, we discover that the geometry of a neural network evolves according to the data…
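The toy setting described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' setup; it only shows one way to build a non-linear binary classification problem that lives on a 2-D plane inside a higher-dimensional input space. The circular labeling rule, the ambient dimension of 64, the sampling range, and the helper names `random_plane_basis` and `make_planar_dataset` are all assumptions made for illustration.

```python
import numpy as np


def random_plane_basis(dim, rng):
    """Return a (dim, 2) matrix whose orthonormal columns span a random plane.

    Hypothetical helper: the orientation of the plane is what would be varied
    when comparing architectures.
    """
    q, _ = np.linalg.qr(rng.standard_normal((dim, 2)))
    return q


def make_planar_dataset(n, basis, rng, radius=1.0):
    """Sample n points on the plane spanned by `basis` and label them.

    The label rule (inside vs. outside a circle) is an assumed stand-in for
    "non-linear"; the source does not specify the decision boundary here.
    """
    coords = rng.uniform(-2.0, 2.0, size=(n, 2))  # 2-D coordinates in the plane
    labels = (np.linalg.norm(coords, axis=1) < radius).astype(np.int64)
    points = coords @ basis.T  # embed into the ambient space, shape (n, dim)
    return points, labels


rng = np.random.default_rng(0)
dim = 64  # assumed ambient dimension
basis = random_plane_basis(dim, rng)
x, y = make_planar_dataset(512, basis, rng)
```

Drawing a different `basis` changes only the orientation of the plane, not the classification problem within it, which is the kind of variation the abstract says MLPs and ResNets respond to differently.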
Peer Reviews
Decision: ICLR 2025 Spotlight
1. The introduction of the Geometric Invariance Hypothesis appears novel and extends findings of Neural Anisotropy Directions (Ortiz-Jimenez et al., 2021) to non-linear decision boundaries. This hypothesis has the potential to provide insights into the relationship between neural network architecture and the structure of data, contributing to our understanding of inductive biases in deep learning. 2. The experiments and the theoretical analysis are generally fair, although several imprecisions
The paper is quite dense. There are multiple points of confusion and imprecisions affecting both clarity and soundness. Specifically: 4. The introduction and main text lack a comprehensive overview of the field and references to related work. Only a few broad papers are cited, despite the extended page limit of this year's edition. I strongly encourage the authors to move much of the discussion from Appendix A.1 into the main text to better place the work in context. In particular, NADs introdu
The paper is able to gradually build up to the main hypothesis being proposed while maintaining a clear chain of reasoning. The authors also provide extensive mathematical proofs for each step in the build-up and mention what assumptions are made and any limitations on what can be shown or derived. Finally, they are able to provide some insight into the effect of this hypothesis on an architecture's generalization ability while addressing any possible ideas with empirical results.
While the "performance" gains of the paper do seem marginal, I see these experiments as more of a proof of concept of the ideas and the proposed hypothesis. However, it would have been nice to see these experiments on multiple datasets to verify if the claims still hold, especially given the simplicity of the current model choices as well.
The introduction of the geometric invariance hypothesis (GIH) offers a fresh and innovative perspective on the interplay between neural network architectures and the geometry of the input space during training. By proposing the concepts of average geometry and average geometry evolution, the authors provide novel tools for quantifying how different architectures influence learning dynamics. This approach moves beyond traditional analyses by directly linking architectural properties to geometric
While the paper makes significant contributions, there are areas that could be improved: - Lack of Intuitive Explanation: It is challenging to develop an intuition for why ResNets behave differently from MLPs. Providing more intuitive explanations or illustrative examples before introducing the mathematical formalism would help readers grasp the core concepts and follow the subsequent analysis more effectively. - Limited Architectural Comparison: The focus on ResNets without discussing other ar
Taxonomy
Topics: Neural Networks and Applications
Methods: Average Pooling · Global Average Pooling · Convolution · Kaiming Initialization · Max Pooling
