An analytic theory of generalization dynamics and transfer learning in deep linear networks

Andrew K. Lampinen; Surya Ganguli

arXiv:1809.10374·stat.ML·January 8, 2019·38 cites

Open Access

TL;DR

This paper develops an analytic theory for understanding how deep linear networks learn and transfer knowledge across tasks, explaining generalization behavior and revealing conditions for effective transfer learning.

Contribution

The paper provides analytic solutions for the training and test error of deep linear networks over the course of training, both within a single task and across tasks (transfer learning), as a function of task structure and signal-to-noise ratio (SNR).

Findings

1. Deep networks learn important task features first.
2. Generalization error early in training depends mainly on task structure.
3. Transfer learning effectiveness depends on task SNR and feature alignment.
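
The first finding can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it trains a two-layer linear network by gradient descent on a noise-free teacher matrix and records when each singular mode of the teacher reaches half of its target strength. The function name `mode_learning_times`, the teacher singular values, the layer sizes, the learning rate, and the half-strength criterion are all illustrative assumptions. Training uses the exact input-output correlation matrix (whitened inputs, no noise), so the SNR effects described in the abstract are not modeled here.

```python
import numpy as np

def mode_learning_times(singular_values=(4.0, 1.0), hidden=8, lr=0.01,
                        steps=5000, init_scale=1e-3, seed=0):
    """Return (half_time, strength) for a two-layer linear student.

    half_time[i] is the first gradient step at which teacher mode i is
    learned to at least half of its singular value (hypothetical criterion).
    strength[i] is the learned strength of mode i after the last step.
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(singular_values, dtype=float)
    k = len(s)

    # Teacher map with random orthonormal singular vectors: sigma = U diag(s) V^T.
    U, _ = np.linalg.qr(rng.standard_normal((k, k)))
    V, _ = np.linalg.qr(rng.standard_normal((k, k)))
    sigma = U @ np.diag(s) @ V.T

    # Student y = W2 @ W1 @ x with small random initial weights.
    W1 = init_scale * rng.standard_normal((hidden, k))
    W2 = init_scale * rng.standard_normal((k, hidden))

    half_time = np.full(k, -1)
    for t in range(steps):
        err = sigma - W2 @ W1      # residual of the end-to-end linear map
        g1 = W2.T @ err            # descent direction for W1
        g2 = err @ W1.T            # descent direction for W2
        W1 += lr * g1
        W2 += lr * g2

        # Project the learned map onto each teacher mode.
        strength = np.diag(U.T @ (W2 @ W1) @ V)
        newly = (half_time < 0) & (strength >= 0.5 * s)
        half_time[newly] = t
    return half_time, strength

half_time, strength = mode_learning_times()
print("steps to half strength per mode:", half_time)
print("final mode strengths:", np.round(strength, 3))
```

With these settings the mode with the larger singular value crosses its half-strength threshold in fewer steps than the weaker mode, which is the ordering the finding describes. The exact step counts depend on the assumed initialization scale and learning rate.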

Abstract

Much attention has been devoted recently to the generalization puzzle in deep learning: large, deep networks can generalize well, but existing theories bounding generalization error are exceedingly loose, and thus cannot explain this striking performance. Furthermore, a major hope is that knowledge may transfer across tasks, so that multi-task learning can improve generalization on individual tasks. However, we lack analytic theories that can quantitatively predict how the degree of knowledge transfer depends on the relationship between the tasks. We develop an analytic theory of the nonlinear dynamics of generalization in deep linear networks, both within and across tasks. In particular, our theory provides analytic solutions to the training and testing error of deep networks as a function of training time, number of examples, network size and initialization, and the task structure and…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Neural Networks and Applications

Methods: Early Stopping