A Surprising Linear Relationship Predicts Test Performance in Deep Networks

Qianli Liao; Brando Miranda; Andrzej Banburski; Jack Hidary; Tomaso Poggio

arXiv:1807.09659 · cs.LG · July 26, 2018 · 21 cites

TL;DR

This paper reports a surprisingly simple linear relationship between training and test losses in deep networks once certain loss components are factored out, which helps explain why networks with identical training error can generalize differently.

Contribution

It demonstrates how different generalization performances can arise from the same architecture and training error, due to intrinsic properties of the cross-entropy loss and a new loss decomposition.

Findings

01 A linear relationship between training and test loss emerges after factoring out certain loss components.

02 Classical generalization bounds are surprisingly tight under this transformed loss.

03 The empirical relation between classification error and normalized cross-entropy loss is approximately monotonic.
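The idea of a "normalized" cross-entropy loss can be illustrated with a small sketch. It assumes, for illustration only, a bias-free ReLU network whose layers are each divided by their Frobenius norm before the loss is computed; this page does not spell out the paper's exact normalization, so treat that choice as an assumption. The helper names (`forward`, `cross_entropy`, `normalize_layers`) and the random data are hypothetical. The sketch shows that rescaling the weights changes the raw cross-entropy but leaves the predicted classes, and the loss of the normalized network, unchanged.

```python
import numpy as np

def forward(weights, x):
    """Bias-free ReLU MLP: returns logits for a batch x of shape (n, d)."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)  # ReLU hidden layers
    return h @ weights[-1]          # linear output layer

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with a stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def normalize_layers(weights):
    """Assumed normalization: divide each layer by its Frobenius norm."""
    return [w / np.linalg.norm(w) for w in weights]

# Hypothetical random data and weights, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 10))
labels = rng.integers(0, 3, size=64)
weights = [rng.normal(size=(10, 16)), rng.normal(size=(16, 3))]
scaled = [5.0 * w for w in weights]  # same network, every layer scaled by 5

# Scaling a bias-free ReLU network multiplies its logits by a positive
# constant, so the predicted classes do not change ...
pred_a = forward(weights, x).argmax(axis=1)
pred_b = forward(scaled, x).argmax(axis=1)

# ... but the raw cross-entropy does change.
loss_a = cross_entropy(forward(weights, x), labels)
loss_b = cross_entropy(forward(scaled, x), labels)

# After layer-wise normalization both networks give the same loss.
norm_loss_a = cross_entropy(forward(normalize_layers(weights), x), labels)
norm_loss_b = cross_entropy(forward(normalize_layers(scaled), x), labels)

print("same predictions:", np.array_equal(pred_a, pred_b))
print("raw losses:", loss_a, loss_b)
print("normalized losses:", norm_loss_a, norm_loss_b)
```

In this toy setting the weight scale acts as a nuisance factor for the raw loss: two networks with identical classification behavior can report different cross-entropy values, while the normalized loss is insensitive to the rescaling.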

Abstract

Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors? Better understanding of this question of generalization may improve practical applications of deep networks. In this paper we show that with cross-entropy loss it is surprisingly simple to induce significantly different generalization performances for two networks that have the same architecture, the same meta parameters and the same training error: one can either pretrain the networks with different levels of "corrupted" data or simply initialize the networks with weights of different Gaussian standard deviations. A corollary of recent theoretical results on overfitting shows that these effects are due to an intrinsic problem of measuring test performance with a cross-entropy/exponential-type loss, which can be decomposed into two components both minimized by…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Machine Learning in Materials Science

Methods: Stochastic Gradient Descent