A Surprising Linear Relationship Predicts Test Performance in Deep Networks
Qianli Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, Tomaso Poggio

TL;DR
This paper reveals a surprisingly simple linear relationship between training and test losses in deep networks once certain loss components are factored out, which helps explain why networks with identical training errors can generalize differently.
Contribution
The paper demonstrates that networks with the same architecture and the same training error can generalize very differently, and attributes this to intrinsic properties of the cross-entropy loss, which it analyzes through a new loss decomposition.
Findings
A linear relationship between training and test loss emerges after factoring out certain loss components.
Classical generalization bounds are surprisingly tight under this transformed loss.
The empirical relation between classification error and normalized cross-entropy loss is approximately monotonic.
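The role of loss scale in these findings can be illustrated with a small NumPy sketch. It is not the paper's procedure: the summary above does not spell out the normalization, so the code only shows the basic fact that multiplying the logits by a positive constant changes the cross-entropy loss while leaving the classification error unchanged. The synthetic logits, the `scale` values, and the helper names `cross_entropy` and `error_rate` are all assumptions made for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def error_rate(logits, labels):
    """Fraction of examples whose arg-max class differs from the label."""
    return float((logits.argmax(axis=1) != labels).mean())

# Synthetic stand-in for network outputs (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n_examples, n_classes = 1000, 10
labels = rng.integers(0, n_classes, size=n_examples)
logits = rng.normal(size=(n_examples, n_classes))
logits[np.arange(n_examples), labels] += 1.5  # bias toward the true class

# Rescale the same outputs: the error stays fixed, the loss does not.
results = {}
for scale in (0.1, 1.0, 10.0):
    scaled = scale * logits
    results[scale] = (cross_entropy(scaled, labels), error_rate(scaled, labels))

for scale, (loss, err) in results.items():
    print(f"scale={scale:5.1f}  loss={loss:.3f}  error={err:.3f}")
```

Because the arg-max is invariant to positive rescaling, all three rows report the same error while the losses differ. This is one reason raw cross-entropy values are hard to compare across networks whose outputs live at different scales.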
Abstract
Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors? Better understanding of this question of generalization may improve practical applications of deep networks. In this paper we show that with cross-entropy loss it is surprisingly simple to induce significantly different generalization performances for two networks that have the same architecture, the same meta parameters and the same training error: one can either pretrain the networks with different levels of "corrupted" data or simply initialize the networks with weights of different Gaussian standard deviations. A corollary of recent theoretical results on overfitting shows that these effects are due to an intrinsic problem of measuring test performance with a cross-entropy/exponential-type loss, which can be decomposed into two components both minimized by…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Machine Learning in Materials Science
Methods: Stochastic Gradient Descent
