Gradient Starvation: A Learning Proclivity in Neural Networks

Mohammad Pezeshki; Sékou-Oumar Kaba; Yoshua Bengio; Aaron Courville; Doina Precup; Guillaume Lajoie

arXiv:2011.09468·cs.LG·November 25, 2021·43 cites

TL;DR

This paper uncovers a fundamental phenomenon called Gradient Starvation in neural networks, explaining how gradient descent can lead to incomplete feature learning and proposing a regularization method to mitigate this issue.

Contribution

It provides a theoretical framework for understanding Gradient Starvation and introduces a novel regularization technique to improve feature diversity and model robustness.

Findings

01 Gradient Starvation causes neural networks to focus on a subset of features.

02 Theoretical analysis links feature imbalance to data structure and learning dynamics.

03 Regularization improves accuracy and robustness in out-of-distribution scenarios.

Abstract

We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by…
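
The abstract mentions a regularizer that decouples feature learning dynamics but, in the truncated text here, does not give its form. As a minimal sketch only, the snippet below assumes a penalty on the squared network outputs (logits) added to the logistic (binary cross-entropy) loss. The function name, the `{-1, +1}` label convention, and the coefficient `lam` are illustrative assumptions, not taken from the text above.

```python
import numpy as np


def logistic_loss_with_logit_penalty(logits, labels, lam=0.0):
    """Mean logistic loss plus (lam / 2) * mean squared logit.

    logits: array of shape (n,), raw network outputs.
    labels: array of shape (n,), values in {-1, +1}.
    lam:    hypothetical penalty coefficient (an assumption for illustration).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # log(1 + exp(-y * z)), computed stably via logaddexp(0, -y * z)
    cross_entropy = np.logaddexp(0.0, -labels * logits).mean()
    # Penalize the outputs themselves rather than the weights
    penalty = 0.5 * lam * np.mean(logits ** 2)
    return cross_entropy + penalty
```

With `lam=0.0` this reduces to the plain logistic loss. A positive `lam` discourages the logits from growing without bound on features the model has already fit, which is one plausible way to leave gradient signal for the remaining features.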

Code & Models

Models

coastalcph/danish-legal-longformer-eurlex-sd (model, 16 downloads, 3 likes)

Videos

Gradient Starvation: A Learning Proclivity in Neural Networks (SlidesLive)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning