Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture

Sajad Movahedi; Antonio Orvieto; Seyed-Mohsen Moosavi-Dezfooli

arXiv:2410.12025 · cs.LG · May 20, 2025

TL;DR

This paper introduces the geometric invariance hypothesis (GIH): during training, the input space curvature of a neural network remains invariant under transformation in certain architecture-dependent directions. Whether a model generalizes therefore depends on how the geometry of the data aligns with the architecture.

Contribution

The paper defines the concepts of average geometry and its evolution, linking neural network geometry to data covariance and architecture-dependent invariances.

Findings

01

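Unlike MLPs, ResNets can fail to generalize on a planar binary classification problem depending on the orientation of the plane, which points to an architecture-dependent geometric invariance.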

02

Geometry evolution is driven by data covariance projected onto the network's average geometry.

03

Architecture-dependent invariances influence neural network generalization.
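
The findings above refer to a network's average geometry, but this listing does not reproduce the paper's definition. As a rough illustration only, the sketch below computes one plausible proxy: the second moment of the input gradient of a small random network, averaged over inputs and over random initializations. The function names (`input_gradient`, `gradient_second_moment`), the one-hidden-layer tanh network, and the initialization scale are all assumptions made for this example and should not be read as the authors' method.

```python
import numpy as np


def input_gradient(x, w1, b1, w2):
    """Gradient of the scalar f(x) = w2 . tanh(w1 @ x + b1) with respect to x."""
    pre = w1 @ x + b1
    # d tanh(z) / dz = 1 - tanh(z)^2, chained through the first layer.
    return w1.T @ (w2 * (1.0 - np.tanh(pre) ** 2))


def gradient_second_moment(inputs, hidden=32, n_inits=20, seed=0):
    """Average of g g^T over inputs and random initializations (hypothetical proxy)."""
    rng = np.random.default_rng(seed)
    dim = inputs.shape[1]
    total = np.zeros((dim, dim))
    for _ in range(n_inits):
        # Assumed Gaussian initialization with 1/fan_in variance.
        w1 = rng.standard_normal((hidden, dim)) * np.sqrt(1.0 / dim)
        b1 = np.zeros(hidden)
        w2 = rng.standard_normal(hidden) * np.sqrt(1.0 / hidden)
        for x in inputs:
            g = input_gradient(x, w1, b1, w2)
            total += np.outer(g, g)
    return total / (n_inits * len(inputs))


if __name__ == "__main__":
    data = np.random.default_rng(1).standard_normal((50, 8))
    moment = gradient_second_moment(data)
    print(moment.shape)
```

The result is a symmetric positive semi-definite matrix over input directions; directions with small eigenvalues are those the randomly initialized network is, on average, least sensitive to. Comparing such a matrix across architectures is one way to picture what an architecture-dependent summary of input-output geometry might look like.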

Abstract

In this paper, we propose the geometric invariance hypothesis (GIH), which argues that the input space curvature of a neural network remains invariant under transformation in certain architecture-dependent directions during training. We investigate a simple, non-linear binary classification problem residing on a plane in a high dimensional space and observe that, unlike MLPs, ResNets fail to generalize depending on the orientation of the plane. Motivated by this example, we define a neural network's average geometry and average geometry evolution as compact architecture-dependent summaries of the model's input-output geometry and its evolution during training. By investigating the average geometry evolution at initialization, we discover that the geometry of a neural network evolves according to the data…
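
The motivating experiment in the abstract can be pictured with a small data-generation sketch. The exact task used in the paper is not given in this listing, so everything specific here is an assumption: the function name `make_planar_dataset`, the inside-versus-outside-a-circle labelling rule, the radius, and the ambient dimension are hypothetical choices that merely satisfy the description "non-linear binary classification problem residing on a plane in a high dimensional space".

```python
import numpy as np


def make_planar_dataset(n_samples=512, ambient_dim=64, radius=0.6, seed=0):
    """Sample a 2-D non-linear binary task and embed it on a random plane."""
    rng = np.random.default_rng(seed)
    # Coordinates within the plane.
    coords = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    # Assumed non-linear rule: label 1 inside a circle, 0 outside.
    labels = (np.linalg.norm(coords, axis=1) < radius).astype(np.int64)
    # Random orientation: orthonormal basis of a 2-D subspace via reduced QR.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, 2)))
    # Embed the planar points in the ambient space.
    points = coords @ basis.T
    return points, labels, basis


if __name__ == "__main__":
    points, labels, basis = make_planar_dataset()
    print(points.shape, labels.mean())
```

Changing `seed` (or substituting a fixed `basis`, for example one aligned with the coordinate axes) changes only the orientation of the plane, not the task within it. That is the variable the abstract reports ResNets, but not MLPs, to be sensitive to.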

Peer Reviews

Decision · ICLR 2025 Spotlight

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The introduction of the Geometric Invariance Hypothesis appears novel and extends findings of Neural Anisotropy Directions (Ortiz-Jimenez et al., 2021) to non-linear decision boundaries. This hypothesis has the potential to provide insights into the relationship between neural network architecture and the structure of data, contributing to our understanding of inductive biases in deep learning.
2. The experiments and the theoretical analysis are generally fair, although several imprecisions

Weaknesses

The paper is quite dense. There are multiple points of confusion and imprecisions affecting both clarity and soundness. Specifically: 4. The introduction and main text lack a comprehensive overview of the field and references to related work. Only a few broad papers are cited, despite the extended page limit of this year's edition. I strongly encourage the authors to move much of the discussion from Appendix A.1 into the main text to better place the work in context. In particular, NADs introdu

Reviewer 02 · Rating 8 · Confidence 3

Strengths

The paper is able to gradually build up to the main hypothesis being proposed while maintaining a clear chain of reasoning. The authors also provide extensive mathematical proofs for each step in the build-up and mention what assumptions are made and any limitations on what can be shown or derived. Finally, they are able to provide some insight into the effect of this hypothesis on an architecture's generalization ability while addressing any possible ideas with empirical results.

Weaknesses

While the "performance" gains of the paper do seem marginal, I see these experiments as more of a proof of concept of the ideas and the proposed hypothesis. However, it would have been nice to see these experiments on multiple datasets to verify if the claims still hold, especially given the simplicity of the current model choices as well.

Reviewer 03 · Rating 8 · Confidence 2

Strengths

The introduction of the geometric invariance hypothesis (GIH) offers a fresh and innovative perspective on the interplay between neural network architectures and the geometry of the input space during training. By proposing the concepts of average geometry and average geometry evolution, the authors provide novel tools for quantifying how different architectures influence learning dynamics. This approach moves beyond traditional analyses by directly linking architectural properties to geometric

Weaknesses

While the paper makes significant contributions, there are areas that could be improved:
- Lack of Intuitive Explanation: It is challenging to develop an intuition for why ResNets behave differently from MLPs. Providing more intuitive explanations or illustrative examples before introducing the mathematical formalism would help readers grasp the core concepts and follow the subsequent analysis more effectively.
- Limited Architectural Comparison: The focus on ResNets without discussing other ar

Code & Models

Repositories

dr-faustus/gih
PyTorch · Official

Videos

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture · SlidesLive

Taxonomy

Topics: Neural Networks and Applications

Methods: Average Pooling · Global Average Pooling · Convolution · Kaiming Initialization · Max Pooling