How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka

TL;DR
This paper studies how neural networks trained by gradient descent, both feedforward networks (MLPs) and graph neural networks (GNNs), extrapolate outside the support of their training distribution. It gives theoretical conditions for successful extrapolation and highlights the importance of architecture and feature encoding.
Contribution
The paper offers a theoretical framework that connects neural tangent kernels to extrapolation behavior and explains when GNNs extrapolate successfully on complex tasks.
Findings
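ReLU MLPs quickly converge to linear functions along any direction from the origin, so they do not extrapolate most nonlinear functions.
MLPs can provably learn a linear target function when the training distribution is sufficiently "diverse".
GNNs' success in extrapolation depends on encoding task-specific nonlinearities in the architecture or input features.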
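The first finding can be illustrated with a small sketch. It is not the paper's experiment and does not reproduce its convergence-rate result for trained networks. It only shows the underlying structural fact that a finite ReLU MLP is piecewise linear, so along a ray t * v from the origin it becomes exactly affine once t passes the last point where a hidden unit switches on or off. The network here is random and untrained, and all names (`make_mlp`, `along_ray`, `last_kink`, `second_difference`) are hypothetical helpers introduced for illustration.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def make_mlp(d_in, width, seed=0):
    """Random two-layer ReLU MLP: f(x) = sum_j a_j * relu(w_j . x + b_j).

    Assumption: weights are drawn at random and never trained.
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(width)]
    b = [rng.gauss(0.0, 1.0) for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) / width for _ in range(width)]
    return W, b, a


def mlp(params, x):
    W, b, a = params
    total = 0.0
    for w_j, b_j, a_j in zip(W, b, a):
        pre = sum(w * xi for w, xi in zip(w_j, x)) + b_j
        total += a_j * relu(pre)
    return total


def along_ray(params, v, t):
    """Evaluate the network at the point t * v."""
    return mlp(params, [t * vi for vi in v])


def last_kink(params, v):
    """Largest t >= 0 at which some hidden unit changes state along t * v."""
    W, b, _ = params
    kinks = [0.0]
    for w_j, b_j in zip(W, b):
        slope = sum(w * vi for w, vi in zip(w_j, v))
        if slope != 0.0:
            # Pre-activation t * slope + b_j crosses zero at t = -b_j / slope.
            kinks.append(max(0.0, -b_j / slope))
    return max(kinks)


def second_difference(params, v, t, h=1.0):
    """Discrete curvature along the ray; zero wherever the function is affine."""
    return (along_ray(params, v, t + h)
            - 2.0 * along_ray(params, v, t)
            + along_ray(params, v, t - h))


if __name__ == "__main__":
    params = make_mlp(d_in=2, width=16, seed=0)
    v = [1.0, 0.5]
    t_far = last_kink(params, v) + 2.0
    # Past the last kink the curvature vanishes up to floating-point rounding.
    print("curvature far from origin:", second_difference(params, v, t_far))
```

A quadratic target such as t * t has a constant second difference of 2 * h * h at every t, so no network of this form can track it far from the origin. That is the sense in which linear behavior along rays limits nonlinear extrapolation.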
Abstract
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) -- structured networks with MLP modules -- have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But, they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second,…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
