Information-theoretic reduction of deep neural networks to linear models in the overparametrized proportional regime

Francesco Camilli; Daria Tieplova; Eleonora Bergamin; Jean Barbier

arXiv:2505.03577·math.ST·May 7, 2025

TL;DR

This paper establishes an information-theoretic equivalence between fully-trained deep neural networks and generalized linear models in the proportional scaling regime, enabling precise calculation of their optimal generalization error.

Contribution

The paper proves the deep Gaussian equivalence principle, showing that Bayes-optimal deep neural networks reduce to generalized linear models in the overparametrized proportional regime, and discusses the implications for model complexity and data requirements.

Findings

1. Proves the deep Gaussian equivalence principle conjectured in Cui et al. (2023).

2. Computes the optimal generalization error of deep neural networks in the proportional scaling regime.

3. Shows that, in this regime, deep neural networks trivialise, in the sense of reducing to a linear model.

Abstract

We rigorously analyse fully-trained neural networks of arbitrary depth in the Bayesian optimal setting in the so-called proportional scaling regime where the number of training samples and width of the input and all inner layers diverge proportionally. We prove an information-theoretic equivalence between the Bayesian deep neural network model trained from data generated by a teacher with matching architecture, and a simpler model of optimal inference in a generalized linear model. This equivalence enables us to compute the optimal generalization error for deep neural networks in this regime. We thus prove the "deep Gaussian equivalence principle" conjectured in Cui et al. (2023) (arXiv:2302.00375). Our result highlights that in order to escape this "trivialisation" of deep neural networks (in the sense of reduction to a linear model) happening in the strongly overparametrized…
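The teacher-student setting and the proportional scaling described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `sample_teacher_dataset`, the i.i.d. standard Gaussian inputs and weights, the `tanh` activation, the noiseless linear readout, and the ratios `alpha` and `gammas`. The only point is that the sample size and every layer width are fixed multiples of the input dimension `d`.

```python
import numpy as np


def sample_teacher_dataset(n, widths, activation=np.tanh, seed=0):
    """Draw n input/label pairs from a random teacher network.

    widths = [d, k_1, ..., k_L] lists the input dimension and hidden widths.
    Gaussian inputs/weights and the linear readout are illustrative
    assumptions, not the paper's exact model.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, widths[0]))
    h = X
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.standard_normal((fan_in, fan_out))
        # 1/sqrt(fan_in) keeps pre-activations of order one as widths grow
        h = activation(h @ W / np.sqrt(fan_in))
    a = rng.standard_normal(widths[-1])
    y = h @ a / np.sqrt(widths[-1])
    return X, y


# Proportional regime: n and all widths scale linearly with d.
d = 200
alpha = 2.0          # hypothetical sampling ratio n / d
gammas = (1.5, 1.0)  # hypothetical width ratios k_l / d
widths = [d] + [int(g * d) for g in gammas]
n = int(alpha * d)

X, y = sample_teacher_dataset(n, widths)
```

Increasing `d` while holding `alpha` and `gammas` fixed mimics the limit in which the abstract's equivalence is stated; a student with the same `widths` would then be trained on `(X, y)`.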

Taxonomy

Topics: Neural Networks and Applications