TL;DR
This paper investigates when neural networks outperform kernel methods, highlighting the role of data structure and dimensionality, and introduces a unified model to explain empirical observations in classification tasks.
Contribution
The paper characterizes when neural networks outperform kernel (RKHS) methods, in particular when the data have low-dimensional structure, and introduces the spiked covariates model to explain these observations in a unified framework.
Findings
Neural networks can overcome the curse of dimensionality by learning the best low-dimensional representation of the data.
RKHS methods suffer from the curse of dimensionality when covariates are nearly isotropic in high dimensions.
Perturbations of the training distribution degrade RKHS methods more than neural networks.
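The contrast between isotropic and structured covariates can be explored with a small experiment. The following is a minimal sketch, not the paper's construction: it assumes Gaussian covariates whose first few "signal" coordinates have their variance scaled by a factor snr (snr = 1 gives isotropic covariates), a target that depends only on those coordinates, and kernel ridge regression with an RBF kernel. The function names, the tanh target, the bandwidth heuristic, and all default parameters are hypothetical choices made for illustration.

```python
import numpy as np


def sample_covariates(n, d, d_signal, snr, rng):
    """Gaussian covariates in R^d; the first d_signal coordinates have
    variance snr, the rest variance 1 (snr=1 is the isotropic case)."""
    x = rng.standard_normal((n, d))
    x[:, :d_signal] *= np.sqrt(snr)
    return x


def target(x, d_signal, snr):
    """Hypothetical target that depends only on the signal coordinates.
    Dividing by sqrt(snr) keeps the label distribution the same for every snr."""
    z = x[:, :d_signal] / np.sqrt(snr)
    return np.tanh(z.sum(axis=1) / np.sqrt(d_signal))


def rbf_kernel(a, b, bandwidth):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def krr_test_mse(snr, n=400, d=50, d_signal=2, ridge=1e-3, seed=0):
    """Fit kernel ridge regression on n samples and return the test MSE."""
    rng = np.random.default_rng(seed)
    x_tr = sample_covariates(n, d, d_signal, snr, rng)
    x_te = sample_covariates(n, d, d_signal, snr, rng)
    y_tr, y_te = target(x_tr, d_signal, snr), target(x_te, d_signal, snr)
    # Assumed heuristic: bandwidth equal to the typical covariate norm.
    bandwidth = np.sqrt(x_tr.var(axis=0).sum())
    k_tr = rbf_kernel(x_tr, x_tr, bandwidth)
    alpha = np.linalg.solve(k_tr + ridge * np.eye(n), y_tr)
    pred = rbf_kernel(x_te, x_tr, bandwidth) @ alpha
    return float(np.mean((pred - y_te) ** 2))


if __name__ == "__main__":
    for snr in (1.0, 25.0):
        print(f"snr={snr:5.1f}  KRR test MSE={krr_test_mse(snr):.4f}")
```

Comparing the printed errors for snr = 1 and a larger snr gives a quick, informal look at how covariate structure can affect a kernel method. It does not reproduce the paper's experiments.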
Abstract
For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NN) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS. This is true even in the wide network limit, for a different scaling of the initialization. How can we reconcile the above claims? For which tasks do NNs outperform RKHS? If covariates are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality…
