Studying Generalization Through Data Averaging
Carlos A. Gomez-Uribe

TL;DR
This paper analyzes how data averaging influences machine learning model generalization, deriving theoretical expressions for the generalization gap and test performance, and empirically validating predictions on CIFAR-10 with ResNet.
Contribution
It introduces new theoretical formulas linking data averaging, parameter covariance, and generalization, and applies these insights to understand SGD behavior.
Findings
A modified generalization gap is non-negative for a large class of parameter distributions.
Predictions about the impact of SGD noise on generalization are validated empirically.
Test performance depends on data-averaged parameters and loss.
Abstract
The generalization of machine learning models has a complex dependence on the data, model and learning algorithm. We study train and test performance, as well as the generalization gap given by the mean of their difference over different data set samples, to understand their "typical" behavior. We derive an expression for the gap as a function of the covariance between the model parameter distribution and the train loss, and another expression for the average test performance, showing that test generalization depends only on the data-averaged parameter distribution and the data-averaged loss. We show that for a large class of model parameter distributions a modified generalization gap is always non-negative. By specializing further to parameter distributions produced by stochastic gradient descent (SGD), along with a few approximations and modeling considerations, we are able to predict some…
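The covariance form of the gap described in the abstract can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's experimental setup: the scalar parameter grid, the squared-error loss, the Gibbs-style distribution `p(theta | D)`, and all constants are hypothetical choices. The test loss is approximated by the train loss averaged over the sampled data sets, so the identity "gap = minus the summed covariance between `p(theta | D)` and the train loss" holds here by construction of the averages.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: scalar parameter on a grid, squared-error loss.
thetas = np.linspace(-3.0, 3.0, 121)       # parameter grid
n_datasets, n_points, beta = 200, 10, 2.0  # illustrative values only

# Each row is one sampled training set D of n_points scalar observations.
data = rng.normal(loc=0.5, scale=1.0, size=(n_datasets, n_points))

# Train loss L_D(theta) per data set and grid point: shape (n_datasets, n_thetas).
train_loss = ((data[:, :, None] - thetas[None, None, :]) ** 2).mean(axis=1)

# Assumed parameter distribution p(theta | D): a Gibbs distribution on the grid.
logits = -beta * train_loss
logits -= logits.max(axis=1, keepdims=True)  # numerical stability
p = np.exp(logits)
p /= p.sum(axis=1, keepdims=True)

# Data-averaged quantities (averages over the sampled data sets).
mean_loss = train_loss.mean(axis=0)  # stands in for the test loss L(theta)
mean_p = p.mean(axis=0)              # data-averaged parameter distribution

# Average train and test performance, and the gap computed directly.
avg_train = (p * train_loss).sum(axis=1).mean()
avg_test = (mean_p * mean_loss).sum()
gap_direct = avg_test - avg_train

# Covariance over data sets between p(theta | D) and L_D(theta), per grid point.
cov = ((p - mean_p) * (train_loss - mean_loss)).mean(axis=0)
gap_from_cov = -cov.sum()

print(gap_direct, gap_from_cov)
```

Note that `avg_test` uses only `mean_p` and `mean_loss`, which mirrors the abstract's statement that average test performance depends only on the data-averaged parameter distribution and the data-averaged loss.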
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
Methods: Test · Average Pooling · Convolution · Max Pooling · Batch Normalization · Kaiming Initialization · Residual Connection · Global Average Pooling · 1x1 Convolution
