What do larger image classifiers memorise?
Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

TL;DR
This paper investigates how larger image classifiers memorize training data, revealing diverse memorization patterns across model sizes and showing that knowledge distillation reduces memorization and enhances generalization.
Contribution
It provides the first comprehensive empirical analysis of how model size affects memorization in image classification, challenging existing proxies for memorization measurement.
Findings
Most samples show decreased memorization with larger models.
Proxies for memorization scores fail to capture key trends.
Knowledge distillation inhibits memorization and improves generalization.
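For readers unfamiliar with knowledge distillation, the following is a minimal sketch of one common formulation, in which a student is trained on a mix of the teacher's softened predictions and the true labels. It is an illustration only and is not taken from the paper: the function names, the `alpha` mixing weight, and the `temperature` parameter are assumptions, and the paper's exact distillation setup may differ.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=1.0):
    """Hypothetical helper: mean of alpha * soft loss + (1 - alpha) * hard loss.

    soft loss: cross-entropy between the teacher's softened probabilities
               and the student's softened log-probabilities.
    hard loss: ordinary cross-entropy against the one-hot training labels.
    """
    student_logits = np.asarray(student_logits, dtype=float)
    teacher_logits = np.asarray(teacher_logits, dtype=float)
    labels = np.asarray(labels, dtype=int)

    teacher_probs = np.exp(log_softmax(teacher_logits / temperature))
    soft = -(teacher_probs * log_softmax(student_logits / temperature)).sum(axis=-1)
    hard = -log_softmax(student_logits)[np.arange(len(labels)), labels]
    return float(np.mean(alpha * soft + (1.0 - alpha) * hard))
```

With `alpha=0` this reduces to standard cross-entropy training; with `alpha=1` the student sees only the teacher's predicted distribution and never the raw training label, which is one intuition for why distillation could limit the fitting of atypical or mislabelled examples.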
Abstract
The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels. To carefully study this issue, Feldman proposed a metric to quantify the degree of memorisation of individual training examples, and empirically computed the corresponding memorisation profile of a ResNet on image classification benchmarks. While an exciting first glimpse into what real-world models memorise, this leaves open a fundamental question: do larger neural models memorise more? We present a comprehensive empirical analysis of this question on image classification benchmarks. We find that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes: most samples experience decreased memorisation under larger…
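The memorisation metric mentioned in the abstract can be illustrated with a small sketch. It assumes the usual reading of Feldman's score for a training example: the probability that a model predicts the example's label when the example is in the training set, minus the same probability when it is held out. The estimator below approximates both terms by averaging over several models trained on random subsets. The function name and the `correct` and `included` arrays are hypothetical inputs chosen for illustration, not part of the paper.

```python
import numpy as np

def estimate_memorisation(correct, included):
    """Estimate a per-example memorisation score from subsampled models.

    correct[k, i]  : 1 if model k predicts the label of example i correctly.
    included[k, i] : True if example i was in model k's training subset.
    Returns an array of scores in [-1, 1], one per example.
    """
    correct = np.asarray(correct, dtype=float)
    included = np.asarray(included, dtype=bool)

    n_in = included.sum(axis=0)       # models that trained on example i
    n_out = (~included).sum(axis=0)   # models that held example i out
    if (n_in == 0).any() or (n_out == 0).any():
        raise ValueError("each example needs at least one in and one out model")

    acc_in = (correct * included).sum(axis=0) / n_in
    acc_out = (correct * ~included).sum(axis=0) / n_out
    return acc_in - acc_out

# Toy data: four models, two examples.
correct = [[1, 1], [1, 0], [1, 1], [1, 0]]
included = [[True, True], [True, False], [False, True], [False, False]]
scores = estimate_memorisation(correct, included)
```

In this toy data the first example is predicted correctly whether or not it was trained on, so its score is 0; the second is only predicted correctly by models that saw it, so its score is 1. Comparing such scores across model sizes is the kind of per-example trajectory the abstract refers to.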
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification · Digital Imaging for Blood Diseases
Methods: Batch Normalization · 1x1 Convolution · Average Pooling · Bottleneck Residual Block · Convolution · Residual Connection · Max Pooling · Global Average Pooling · Residual Block
