How many samples are needed to train a deep neural network?
Pegah Golestaneh, Mahsa Taheri, Johannes Lederer

TL;DR
This paper investigates the data requirements for training ReLU neural networks, revealing that their generalization error decreases at a rate of 1/√n, indicating they need large datasets for effective learning.
Contribution
It provides theoretical and empirical evidence that ReLU neural networks' generalization error scales with 1/√n, challenging the assumption of parametric rate convergence.
Findings
Generalization error scales as 1/√n
Neural networks require large datasets for good performance
Empirical results support theoretical predictions
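One common way to check a claimed rate empirically is to fit a straight line to log(error) against log(n): a slope near -1/2 is consistent with 1/√n, and a slope near -1 with 1/n. The sketch below illustrates that idea on synthetic numbers only. The helper name `fit_rate_exponent`, the constant 2.0, and the sample sizes are assumptions made for illustration; this is not the authors' experimental procedure or data.

```python
import math

def fit_rate_exponent(sample_sizes, errors):
    """Least-squares slope of log(error) versus log(n).

    Hypothetical helper for illustration; a slope of about -0.5
    corresponds to an error that decays like 1/sqrt(n).
    """
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Synthetic errors that follow c / sqrt(n) exactly (assumed, not measured).
sizes = [100, 400, 1600, 6400]
synthetic_errors = [2.0 / math.sqrt(n) for n in sizes]
exponent = fit_rate_exponent(sizes, synthetic_errors)
print(f"fitted exponent: {exponent:.3f}")
```

With real measurements the fitted slope would be noisy, so it is usually read together with a confidence interval rather than as an exact value.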
Abstract
Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate 1/√n in the sample size n rather than the usual "parametric rate" 1/n. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.
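To make the gap between the two rates concrete, the following sketch computes how many samples are needed to push an error bound of the form c/√n or c/n below a target value. The function name `samples_needed`, the rate labels, and the constant c = 1 are assumptions chosen for illustration; the constants in the paper's bounds are not reproduced here.

```python
import math

def samples_needed(target_error: float, rate: str, constant: float = 1.0) -> int:
    """Smallest n such that constant * rate(n) <= target_error.

    Hypothetical helper: `rate` is "inv_sqrt" for c / sqrt(n)
    or "inv" for the parametric c / n.
    """
    if target_error <= 0:
        raise ValueError("target_error must be positive")
    if rate == "inv_sqrt":
        # c / sqrt(n) <= eps  <=>  n >= (c / eps)^2
        return math.ceil((constant / target_error) ** 2)
    if rate == "inv":
        # c / n <= eps  <=>  n >= c / eps
        return math.ceil(constant / target_error)
    raise ValueError(f"unknown rate: {rate}")

# Targets are powers of two so the arithmetic is exact in floating point.
for eps in (0.25, 0.125, 0.0625):
    slow = samples_needed(eps, "inv_sqrt")
    fast = samples_needed(eps, "inv")
    print(f"target {eps}: 1/sqrt(n) needs n={slow}, 1/n needs n={fast}")
```

Under the 1/√n rate, halving the target error quadruples the required sample size, whereas under the 1/n rate it only doubles it.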
Peer Reviews
Decision·ICLR 2025 Poster
# Originality This paper attempts to find a lower bound for the minimax risk, which is not common in the literature. # Clarity The paper is easy to read. # Quality The proof seems to be sound (I did not check everything).
# Originality The paper uses almost the same tools as in [1], including the $\mathcal{L}^1$ regularization (which becomes the $\mathcal{L}^1$ ball in the space of parameters). The covering number is replaced with the packing number. # Clarity The motivation of finding such a lower bound is unclear: why would it be useful to get a lower bound on the minimax risk? What kind of information does it provide on the dataset, the function to approximate and the NN architecture? Except it fits the fr
The minimax lower bound for deep ReLU networks and the lower bound for the packing number of deep ReLU neural networks is novel and interesting.
- The problem of understanding the sample complexity of learning neural networks is an important research topic. - The research presents a minimax lower bound for deep neural networks, which seems to be new.
- It is not very clear to me why one should compare the results with the $1/n$ rate. Based on my understanding, achieving the $1/n$ rate often requires more structural assumptions on the target function, such as the ground truth being sparse in the sparse linear regression setting. Given that there is no such structural assumption in the paper, I’m not sure why one would expect such a $1/n$ rate to happen. - The considered function space is elementwise $\ell_1$ norm bounded, which is not a very commonly consider
Taxonomy
Topics: Neural Networks and Applications
