Can neural networks extrapolate? Discussion of a theorem by Pedro Domingos
Adrien Courtois, Jean-Michel Morel, Pablo Arias

TL;DR
This paper discusses a theorem by Pedro Domingos stating that models trained by gradient descent are approximately kernel machines, and argues that this kernel nature limits the extrapolation abilities of neural networks, especially as task complexity increases.
Contribution
The paper extends Domingos' theorem to the discrete case and to networks with vector-valued output, and analyzes its implications for the interpolation capabilities of neural networks.
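To make the discrete case concrete, here is a minimal sketch, not the paper's construction, of the one setting where the kernel rewriting is exact: a model that is linear in its parameters, trained by discrete gradient descent on a squared loss. The feature map `features`, the function `train_and_track`, the toy data, and the step size are all hypothetical choices made for illustration. For such a model the tangent kernel K(x, x_i) = features(x) · features(x_i) does not change during training, so each gradient step changes the prediction at x by a weighted sum of kernel values against the training points.

```python
import numpy as np

def features(x):
    # Hypothetical feature map; the model is f_w(x) = w . features(x).
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def train_and_track(x_train, y_train, x_test, lr=0.01, steps=200):
    """Run discrete gradient descent and, in parallel, accumulate the
    equivalent kernel-machine prediction at the test points."""
    phi_train = features(x_train)            # shape (n, d)
    phi_test = features(x_test)              # shape (m, d)
    w = np.zeros(phi_train.shape[1])         # initial parameters w_0

    # Tangent kernel between test and training points. It is constant
    # here only because the model is linear in w (an assumption).
    kernel = phi_test @ phi_train.T          # shape (m, n)
    kernel_pred = phi_test @ w               # initial model f_{w_0}(x)

    for _ in range(steps):
        # Derivative of the loss 0.5 * (f - y)^2 with respect to f.
        residual = phi_train @ w - y_train
        # Ordinary gradient descent step on the parameters.
        w = w - lr * phi_train.T @ residual
        # Same step written as a kernel update on the predictions.
        kernel_pred = kernel_pred - lr * kernel @ residual

    return phi_test @ w, kernel_pred

x_train = np.linspace(-1.0, 1.0, 8)
y_train = np.sin(3.0 * x_train)
x_test = np.array([-2.0, 0.3, 2.5])          # includes points outside [-1, 1]

direct, via_kernel = train_and_track(x_train, y_train, x_test)
print(np.allclose(direct, via_kernel))
```

The two predictions agree because, for this linear model, the parameter update and the kernel update are the same computation. For a nonlinear network the kernel changes at every step and the identity holds only to first order in the step size, which is the regime the discrete extension addresses.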
Findings
Kernel interpretation explains neural network predictions in simple cases.
As task complexity grows, the kernel nature of trained networks limits their ability to extrapolate.
The theorem's relevance is demonstrated on shape recovery from boundary data.
Abstract
Neural networks trained on large datasets by minimizing a loss have become the state-of-the-art approach for solving data science problems, particularly in computer vision, image processing and natural language processing. In spite of their striking results, our theoretical understanding of how neural networks operate is limited. In particular, what are the interpolation capabilities of trained neural networks? In this paper we discuss a theorem of Domingos stating that "every machine learned by continuous gradient descent is approximately a kernel machine". According to Domingos, this fact leads to the conclusion that all machines trained on data are mere kernel machines. We first extend Domingos' result to the discrete case and to networks with vector-valued output. We then study its relevance and significance on simple examples. We find that in simple cases, the "neural tangent…
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference
