What is the Effect of Importance Weighting in Deep Learning?

Jonathon Byrd; Zachary C. Lipton

arXiv:1812.03372 · cs.LG · June 17, 2019 · 113 citations

TL;DR

This paper investigates how importance weighting affects the training of deep neural networks. It finds that the effect is visible early in training but diminishes over successive epochs, and that it interacts with regularization techniques such as L2 regularization and batch normalization.

Contribution

It provides the first detailed analysis of importance weighting effects in over-parameterized deep networks, showing the temporal dynamics and influence of regularization methods.

Findings

01. Importance weighting affects early training stages.

02. Regularization techniques can restore some importance weighting effects.

03. Impact of importance weighting diminishes over training epochs.

Abstract

Importance-weighted risk minimization is a key ingredient in many machine learning algorithms for causal inference, domain adaptation, class imbalance, and off-policy reinforcement learning. While the effect of importance weighting is well-characterized for low-capacity misspecified models, little is known about how it impacts over-parameterized, deep neural networks. This work is inspired by recent theoretical results showing that on (linearly) separable data, deep linear networks optimized by SGD learn weight-agnostic solutions, prompting us to ask, for realistic deep networks, for which many practical datasets are separable, what is the effect of importance weighting? We present the surprising finding that while importance weighting impacts models early in training, its effect diminishes over successive epochs. Moreover, while L2 regularization and batch normalization (but not…
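As a rough illustration of what importance-weighted risk minimization means in the class-imbalance setting mentioned in the abstract, the sketch below scales each example's loss by a weight before averaging. It is not the authors' code: the function names `importance_weighted_loss` and `inverse_frequency_weights` are hypothetical, the model output is assumed to be a single logit per example for binary labels, and inverse class frequency is just one common choice of weights, assumed here for illustration.

```python
import numpy as np


def importance_weighted_loss(logits, labels, weights):
    """Mean of per-example binary cross-entropy losses, each scaled by a weight.

    logits:  shape (n,), real-valued model scores (assumed binary setting)
    labels:  shape (n,), values in {0, 1}
    weights: shape (n,), non-negative importance weights
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Numerically stable cross-entropy on logits:
    # max(z, 0) - z * y + log(1 + exp(-|z|))
    per_example = (
        np.maximum(logits, 0.0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(np.mean(weights * per_example))


def inverse_frequency_weights(labels):
    """Hypothetical weighting scheme: weight for class c is n / (2 * count_c).

    Assumes binary labels with both classes present, so the weights sum to n.
    """
    labels = np.asarray(labels, dtype=int)
    counts = np.bincount(labels, minlength=2)
    class_weights = len(labels) / (2.0 * counts)
    return class_weights[labels]


# Imbalanced toy batch: three negatives, one positive.
labels = [0, 0, 0, 1]
logits = [-1.0, -0.5, -2.0, -0.3]
weights = inverse_frequency_weights(labels)
unweighted = importance_weighted_loss(logits, labels, np.ones(len(labels)))
weighted = importance_weighted_loss(logits, labels, weights)
```

Setting every weight to 1 recovers the ordinary average loss, so the only thing the weights change is how much each example contributes to the objective and its gradient. The paper's question is whether that change still matters late in training for over-parameterized networks.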

Code & Models

Repositories

hajohajo/TrainingFrameworkTrackDNN
tf

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Machine Learning and Data Classification

Methods: Batch Normalization · Stochastic Gradient Descent