Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions
Mojtaba Sahraee-Ardakan, Melikasadat Emami, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher

TL;DR
This paper shows that, in the proportional high-dimensional regime, kernel methods (including neural tangent kernels) perform no better than linear models, and that linear models are optimal even when the data are generated by a highly nonlinear kernel model, indicating the need for more complex models.
Contribution
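The paper shows that, in the proportional high-dimensional regime with independent covariates, a large class of kernel methods, including the neural tangent kernels of fully connected networks, can perform only as well as linear models, and that linear models are optimal for certain nonlinear data-generating models.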
Findings
Kernel methods behave like linear models in high dimensions.
Linear models are optimal for certain nonlinear data.
More complex models are needed for high-dimensional data analysis.
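The first finding can be made concrete with a known random-matrix result (El Karoui, 2010), which is not stated on this page: for i.i.d. standard Gaussian inputs x_i in R^d and an inner-product kernel K_ij = f(x_i . x_j / d) with smooth f, in the proportional regime K is close in spectral norm to (f(0) + f''(0)/(2d)) * 1 1^T + f'(0) * X X^T / d + (f(1) - f(0) - f'(0)) * I, that is, a constant, a linear kernel, and a ridge-like identity term. The sketch below checks this numerically under illustrative assumptions: f(t) = exp(t), the sizes and seed shown, and a hypothetical "constant-only" baseline that drops the linear kernel term. It is a sanity check of the phenomenon, not the authors' proof.

```python
import numpy as np


def linearization_errors(n: int, d: int, seed: int = 0) -> tuple[float, float]:
    """Spectral-norm errors of two approximations to K_ij = exp(x_i . x_j / d).

    Illustrative assumptions: x_i ~ N(0, I_d) i.i.d. and f(t) = exp(t),
    so f(0) = f'(0) = f''(0) = 1 and f(1) = e.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    G = X @ X.T / d                     # linear (Gram) kernel, scaled by 1/d
    K = np.exp(G)                       # exact inner-product kernel matrix

    c0 = 1.0 + 1.0 / (2 * d)            # f(0) + f''(0) / (2d): level of the rank-one term
    ones = np.ones((n, n))
    eye = np.eye(n)

    # Linearized kernel: constant + linear kernel + (f(1) - f(0) - f'(0)) * I
    K_lin = c0 * ones + G + (np.e - 2.0) * eye
    # Hypothetical baseline without the linear kernel G (same diagonal level)
    K_const = c0 * ones + (np.e - 1.0) * eye

    err_lin = np.linalg.norm(K - K_lin, ord=2)      # largest singular value of the residual
    err_const = np.linalg.norm(K - K_const, ord=2)
    return float(err_lin), float(err_const)


if __name__ == "__main__":
    err_lin, err_const = linearization_errors(n=500, d=1000)
    print(f"linearized-kernel error: {err_lin:.3f}")
    print(f"constant-only error:     {err_const:.3f}")
```

Under these assumptions the linearized residual is several times smaller than the baseline's, whose error is dominated by the spread of the eigenvalues of X X^T / d; the nonlinearity of f survives only through the constant and identity terms.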
Abstract
Empirical observations of high-dimensional phenomena, such as double descent behaviour, have attracted considerable interest in understanding classical techniques such as kernel methods and their implications for explaining the generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e., proportional asymptotics). In this work we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can perform only as well as linear models in this regime. More surprisingly, when the data are generated by a kernel model in which the relationship between the input and the response can be highly nonlinear, we show that linear models are in fact optimal, i.e., linear models achieve…
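The claim that kernel methods can do no better than linear models in the proportional regime can be illustrated with a small simulation. This is a hedged sketch, not the authors' experiment: it fits kernel ridge regression with the inner-product kernel f(t) = exp(t) on i.i.d. Gaussian inputs and a hypothetical nonlinear response y = tanh(2 x . beta) + noise, then refits with a linearized kernel (constant + X X^T / d + (e - 2) * I, the standard inner-product-kernel linearization for this f) and reads the resulting predictor off as an explicit affine model w . x + b. The ridge parameter lam, the sizes, the seed, and the target are all illustrative assumptions.

```python
import numpy as np


def kernel_vs_affine_predictions(n=400, d=1600, n_test=200, lam=1.0, seed=0):
    """Kernel ridge predictions vs. the affine predictor from the linearized kernel.

    Illustrative assumptions: x ~ N(0, I_d), kernel exp(x . x' / d),
    hypothetical target y = tanh(2 x . beta) + 0.1 * noise with ||beta|| ~ 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X_test = rng.standard_normal((n_test, d))
    beta = rng.standard_normal(d) / np.sqrt(d)
    y = np.tanh(2.0 * X @ beta) + 0.1 * rng.standard_normal(n)

    # Exact kernel ridge regression: alpha = (K + lam I)^{-1} y, prediction = k(x)^T alpha
    K = np.exp(X @ X.T / d)
    K_test = np.exp(X_test @ X.T / d)
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    pred_kernel = K_test @ alpha

    # Linearized training kernel; test-train entries get no identity term
    c0 = 1.0 + 1.0 / (2 * d)
    K_lin = c0 * np.ones((n, n)) + X @ X.T / d + (np.e - 2.0) * np.eye(n)
    alpha_lin = np.linalg.solve(K_lin + lam * np.eye(n), y)

    # k_lin(x)^T alpha_lin = c0 * sum(alpha_lin) + x . (X^T alpha_lin / d): an affine model
    w = X.T @ alpha_lin / d
    b = c0 * alpha_lin.sum()
    pred_affine = X_test @ w + b
    return pred_kernel, pred_affine


if __name__ == "__main__":
    pk, pa = kernel_vs_affine_predictions()
    rel = np.linalg.norm(pk - pa) / np.linalg.norm(pk)
    print(f"relative gap between kernel and affine predictions: {rel:.3f}")
```

In this setup the nonlinear kernel's predictions stay close to those of a purely affine function of the input, which is the sense in which the kernel method "behaves like" a linear model here; the excess f(1) - f(0) - f'(0) acts like additional ridge regularization.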
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
