The Curious Case of Benign Memorization
Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

TL;DR
This paper investigates how deep neural networks can memorize random data in a benign way through data augmentation, enabling meaningful feature learning and separating noise from signal across layers.
Contribution
It reveals that data augmentation facilitates benign memorization by distributing learning across layers, and provides initial explanations for this phenomenon.
Findings
Neural networks can memorize random labels benignly with data augmentation.
Feature learning and memorization are distributed across different layers, with the last layers handling the label noise.
Augmentation diversity influences the memorization-generalization trade-off.
Abstract
Despite the empirical advances of deep learning across a variety of learning tasks, our theoretical understanding of its success is still very restricted. One of the key challenges is the overparametrized nature of modern models, enabling complete overfitting of the data even if the labels are randomized, i.e. networks can completely memorize all given patterns. While such a memorization capacity seems worrisome, in this work we show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way, i.e. they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing. We demonstrate that deep models have the surprising ability to separate noise from signal by distributing the task of memorization and feature learning to different layers. As a result, only the very…
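The nearest neighbour probing mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes embeddings have already been extracted from some layer of a trained network, uses plain Euclidean k-NN in NumPy, and assumes the probe is scored against the true (non-randomized) labels. The function name `knn_probe_accuracy` and the toy arrays are hypothetical.

```python
import numpy as np


def knn_probe_accuracy(train_emb, train_labels, test_emb, test_labels, k=1):
    """Accuracy of a Euclidean k-NN classifier on fixed embeddings.

    Assumption: labels are non-negative integers, and the embeddings come
    from a frozen layer of an already trained network (hypothetical setup).
    """
    train_emb = np.asarray(train_emb, dtype=float)
    test_emb = np.asarray(test_emb, dtype=float)
    train_labels = np.asarray(train_labels, dtype=int)
    test_labels = np.asarray(test_labels, dtype=int)

    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = (
        (test_emb ** 2).sum(axis=1, keepdims=True)
        - 2.0 * test_emb @ train_emb.T
        + (train_emb ** 2).sum(axis=1)
    )

    # Labels of the k closest training points for each test point.
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    nn_labels = train_labels[nn_idx]  # shape (n_test, k)

    # Majority vote: count neighbour labels per test point.
    n_test = test_emb.shape[0]
    n_classes = int(train_labels.max()) + 1
    votes = np.zeros((n_test, n_classes))
    np.add.at(votes, (np.arange(n_test)[:, None], nn_labels), 1)

    preds = votes.argmax(axis=1)
    return float((preds == test_labels).mean())


# Toy embeddings standing in for features of a trained network.
train_emb = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
train_labels = [0, 0, 1, 1]
test_emb = [[0.0, 0.5], [10.0, 10.5], [1.0, 1.0]]
test_labels = [0, 1, 0]

print(knn_probe_accuracy(train_emb, train_labels, test_emb, test_labels, k=3))
```

Running the same probe on embeddings taken from different layers is one way to compare how much label-relevant structure each layer retains; accuracy well above chance would correspond to the "non-trivial performance" the abstract describes.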
Taxonomy
Topics: Music and Audio Processing · Neural Networks and Applications · Human Pose and Action Recognition
