Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

TL;DR
This paper shows that pruning the training set with a good ranking metric lets test error fall faster with dataset size than the usual power law, which reduces the compute and energy needed to reach a given accuracy.
Contribution
It introduces a theoretical framework for breaking power law scaling with data pruning, empirically validates this on multiple datasets, and benchmarks various pruning metrics including a new scalable self-supervised method.
Findings
Pruning data can improve error scaling beyond power laws.
Most existing pruning metrics do not scale well to ImageNet.
A new simple self-supervised pruning metric performs comparably to supervised metrics.
Abstract
Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scaling and potentially even reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we…
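The pruning metric described in the abstract can be read as a per-example score that induces a ranking, so that any target dataset size is reached by discarding examples from one end of that ranking. The following is a minimal sketch of that idea, not the authors' implementation. The function name `prune_by_score`, the convention that a higher score means a harder example, and the `keep` parameter are all assumptions made for illustration; the abstract as quoted here does not say which end of the ranking should be discarded.

```python
import numpy as np


def prune_by_score(scores, keep_fraction, keep="hard"):
    """Return the sorted indices of examples retained after pruning.

    Hypothetical helper for illustration only.
    scores: 1-D array of per-example pruning scores (assumed: higher = harder).
    keep_fraction: fraction of the dataset to retain, in (0, 1].
    keep: "hard" retains the highest-scoring examples, "easy" the lowest.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1 or scores.size == 0:
        raise ValueError("scores must be a non-empty 1-D array")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if keep not in ("hard", "easy"):
        raise ValueError("keep must be 'hard' or 'easy'")

    # Always retain at least one example.
    n_keep = max(1, int(round(keep_fraction * scores.size)))

    # Rank examples from lowest to highest score.
    order = np.argsort(scores, kind="stable")
    kept = order[-n_keep:] if keep == "hard" else order[:n_keep]

    # Return indices in dataset order so they can index the original arrays.
    return np.sort(kept)


if __name__ == "__main__":
    example_scores = [0.1, 0.9, 0.5, 0.3]
    print(prune_by_score(example_scores, 0.5, keep="hard"))
    print(prune_by_score(example_scores, 0.5, keep="easy"))
```

Because the ranking is computed once, sweeping `keep_fraction` gives a nested family of pruned datasets, which is what makes it possible to plot error against pruned dataset size.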
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
Methods: Pruning · Test
