How many samples are needed to train a deep neural network?

Pegah Golestaneh; Mahsa Taheri; Johannes Lederer

arXiv:2405.16696·math.ST·August 27, 2025·ICLR·1 cites

TL;DR

This paper investigates the data requirements for training ReLU neural networks, revealing that their generalization error decreases at a rate of 1/√n, indicating they need large datasets for effective learning.

Contribution

The paper provides theoretical and empirical evidence that the generalization error of ReLU neural networks scales as 1/√n rather than at the parametric rate 1/n.

Findings

01 Generalization error scales as 1/√n

02 Neural networks require large datasets for good performance

03 Empirical results support theoretical predictions
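One common way to check a claimed rate empirically is to regress log-error on log-sample-size and read off the slope. The sketch below illustrates that idea on synthetic numbers only: the function name `fit_rate_exponent`, the sample sizes, and the constant 2.0 are all assumptions made for illustration, and this is not the experimental procedure or data of the paper.

```python
import math


def fit_rate_exponent(sample_sizes, errors):
    """Least-squares slope of log(error) against log(n).

    A slope near -0.5 is consistent with a 1/sqrt(n) rate,
    a slope near -1.0 with the parametric rate 1/n.
    """
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic errors that follow c / sqrt(n) exactly (hypothetical values,
# chosen only to show what the fit returns for an ideal 1/sqrt(n) curve).
sample_sizes = [100, 400, 1600, 6400]
synthetic_errors = [2.0 / math.sqrt(n) for n in sample_sizes]

slope = fit_rate_exponent(sample_sizes, synthetic_errors)
print(f"estimated exponent: {slope:.3f}")
```

With measured test errors in place of the synthetic list, the fitted slope gives a rough estimate of the rate exponent; noise and small sample ranges can make that estimate unreliable.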

Abstract

Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.
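To make the gap between the two rates concrete, the following minimal sketch computes how many samples are needed to push an error bound of the form $c/n^{\alpha}$ below a target. The helper `samples_for_target` and the constant `c = 1.0` are assumptions for illustration; the abstract states only the rates, not the constants.

```python
import math


def samples_for_target(eps: float, exponent: float, c: float = 1.0) -> int:
    """Smallest integer n such that c / n**exponent <= eps.

    exponent = 0.5 corresponds to the 1/sqrt(n) rate,
    exponent = 1.0 to the parametric rate 1/n.
    The constant c is a placeholder, not a value from the paper.
    """
    if eps <= 0 or c <= 0 or exponent <= 0:
        raise ValueError("eps, c and exponent must be positive")
    return math.ceil((c / eps) ** (1.0 / exponent))


# Targets chosen as powers of two so the floating-point arithmetic is exact.
for eps in (0.5, 0.25, 0.125):
    n_sqrt = samples_for_target(eps, exponent=0.5)
    n_param = samples_for_target(eps, exponent=1.0)
    print(f"target {eps}: 1/sqrt(n) needs n={n_sqrt}, 1/n needs n={n_param}")
```

Under these assumptions, halving the target error doubles the required sample size at the rate $1/n$ but quadruples it at the rate $1/\sqrt{n}$.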

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

# Originality

This paper attempts to find a lower bound for the minimax risk, which is not common in the literature.

# Clarity

The paper is easy to read.

# Quality

The proof seems to be sound (I did not check everything).

Weaknesses

# Originality

The paper uses almost the same tools as in [1], including the $\mathcal{L}^1$ regularization (which becomes the $\mathcal{L}^1$ ball in the space of parameters). The covering number is replaced with the packing number.

# Clarity

The motivation of finding such a lower bound is unclear: why would it be useful to get a lower bound on the minimax risk? What kind of information does it provide on the dataset, the function to approximate and the NN architecture? Except it fits the fr

Reviewer 02 · Rating 8 · Confidence 3

Strengths

The minimax lower bound for deep ReLU networks and the lower bound for the packing number of deep ReLU neural networks are novel and interesting.

Weaknesses

*

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- The problem of understanding the sample complexity of learning neural networks is an important research topic.
- The research presents a minimax lower bound for deep neural networks, which seems to be new.

Weaknesses

- It is not very clear to me why one should compare the results with the $1/n$ rate. Based on my understanding, to achieve a $1/n$ rate one often needs more structural assumptions on the target function, such as the ground truth being sparse in the sparse linear regression setting. Given that there is no such structural assumption in the paper, I'm not sure why one would expect such a $1/n$ rate to happen.
- The considered function space is elementwise $\ell_1$ norm bounded, which is not a very commonly consider

Videos

How many samples are needed to train a deep neural network? · SlidesLive

Taxonomy

Topics · Neural Networks and Applications