Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes
Elvis Dohmatob

TL;DR
This paper explores the fundamental trade-offs between memorization and robustness in two-layer neural networks within high-dimensional linearized regimes, revealing tight bounds and phenomena like multiple descent in robustness.
Contribution
It establishes tight lower bounds on the Sobolev-seminorm for neural networks in various regimes, linking memorization to robustness and validating findings with empirical experiments.
Findings
The lower bounds on the Sobolev-seminorm depend on the number of training examples, the input dimension, and the network width.
The bounds are tight: they are attained by min-norm / least-squares interpolators.
The robustness of the interpolator exhibits a multiple-descent phenomenon.
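As a rough illustration of the quantities in these findings, the NumPy sketch below fits a min-norm least-squares interpolator on random features and estimates its Sobolev-seminorm empirically on the training inputs. The ReLU activation, Gaussian hidden weights, unit-sphere inputs, random ±1 labels, and the sizes n (examples), d (input dimension), and k (hidden neurons) are all assumptions chosen for illustration, not the paper's experimental setup.

```python
import numpy as np

def rf_features(X, W):
    """ReLU random features phi(x) = relu(W x) / sqrt(k) (assumed activation)."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

def fit_min_norm(Phi, y):
    """Min-norm least-squares output weights, computed with the pseudo-inverse."""
    return np.linalg.pinv(Phi) @ y

def sobolev_seminorm(X_eval, W, a):
    """Empirical estimate: sqrt of the mean squared L2-norm of input gradients."""
    k = W.shape[0]
    active = (X_eval @ W.T > 0).astype(float)   # ReLU derivative, shape (m, k)
    grads = (active * a) @ W / np.sqrt(k)        # grad_x f(x), shape (m, d)
    return np.sqrt(np.mean(np.sum(grads ** 2, axis=1)))

# Illustrative sizes (assumptions): k > n so that interpolation is possible.
rng = np.random.default_rng(0)
n, d, k = 50, 20, 200

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # inputs on the unit sphere
y = rng.choice([-1.0, 1.0], size=n)              # pure-noise labels to memorize
W = rng.standard_normal((k, d))                  # frozen random hidden weights

Phi_train = rf_features(X, W)
a = fit_min_norm(Phi_train, y)
s = sobolev_seminorm(X, W, a)

print(f"max training residual: {np.max(np.abs(Phi_train @ a - y)):.2e}")
print(f"empirical Sobolev-seminorm: {s:.3f}")
```

Re-running the sketch over a range of k for fixed n and d is one way to look at how the seminorm of the interpolator changes with width; the exact shape of that curve depends on the assumptions above.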
Abstract
This work studies the (non)robustness of two-layer neural networks in various high-dimensional linearized regimes. We establish fundamental trade-offs between memorization and robustness, as measured by the Sobolev-seminorm of the model w.r.t. the data distribution, i.e., the square root of the average squared $L_2$-norm of the gradients of the model w.r.t. its input. More precisely, if $n$ is the number of training examples, $d$ is the input dimension, and $k$ is the number of hidden neurons in a two-layer neural network, we prove for a large class of activation functions that, if the model memorizes even a fraction of the training set, then its Sobolev-seminorm is lower-bounded by (i) in case of infinite-width random features (RF) or neural tangent kernel (NTK) with ; (ii) in case of finite-width RF with proportionate scaling of and ; and (iii)…
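The robustness measure described in the abstract can be written out as a formula. The symbols below, $\mathfrak{S}(f)$ for the seminorm and $P$ for the data distribution, are notation chosen here for illustration and may differ from the paper's own; the gradient norm is taken to be the Euclidean one.

```latex
\[
  \mathfrak{S}(f) \;=\; \Big( \mathbb{E}_{x \sim P}\, \big\| \nabla_x f(x) \big\|_2^2 \Big)^{1/2}
\]
```

Under this reading, a large value means that the model's output changes quickly, on average, under small perturbations of a typical input.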
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Adversarial Robustness in Machine Learning
Methods: Neural Tangent Kernel
