An Empirical Study of Example Forgetting during Deep Neural Network Learning
Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon

TL;DR
This paper investigates the phenomenon of example forgetting during neural network training on single tasks, revealing that some examples are repeatedly forgotten, others are never forgotten, and that training data can be reduced without loss of performance.
Contribution
It introduces the concept of forgetting events in neural networks, analyzes their dynamics across datasets and architectures, and shows data reduction is possible without sacrificing accuracy.
Findings
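Certain examples are forgotten with high frequency, while others are never forgotten.
The sets of forgettable and unforgettable examples in a data set generalize across neural architectures.
A significant fraction of examples can be omitted from the training set, based on forgetting dynamics, while maintaining generalization performance.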
Abstract
Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a 'forgetting event' to have occurred when an individual training example transitions from being classified correctly to incorrectly over the course of learning. Across several benchmark data sets, we find that: (i) certain examples are forgotten with high frequency, and some not at all; (ii) a data set's (un)forgettable examples generalize across neural architectures; and (iii) based on forgetting dynamics, a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.
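The definition of a forgetting event in the abstract can be illustrated with a minimal sketch. It assumes that, for each training example, a correct/incorrect flag was recorded at successive points during training; the function names, the array layout, and the choice to exclude never-learned examples from the "unforgettable" set are assumptions made for illustration, not taken from the paper's code.

```python
import numpy as np

def count_forgetting_events(correct):
    """Count forgetting events per example.

    `correct` is a hypothetical boolean array of shape
    (num_checks, num_examples): correct[t, i] is True when example i
    was classified correctly at the t-th check during training.
    """
    correct = np.asarray(correct, dtype=bool)
    # A forgetting event: correct at check t, incorrect at check t + 1.
    forgot = correct[:-1] & ~correct[1:]
    return forgot.sum(axis=0)

def unforgettable_mask(correct):
    """Mark examples that were learned at some point and never forgotten.

    Assumption of this sketch: an example that was never classified
    correctly does not count as unforgettable.
    """
    correct = np.asarray(correct, dtype=bool)
    learned = correct.any(axis=0)
    return learned & (count_forgetting_events(correct) == 0)

# Toy history: 4 checks, 3 examples.
history = np.array([
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
], dtype=bool)

counts = count_forgetting_events(history)
mask = unforgettable_mask(history)
```

In this toy history the first example is forgotten once, the second is never forgotten, and the third is never learned. Per-example counts like these could then be used to rank examples when deciding which ones to omit from the training set.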
Taxonomy
Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
