Better NTK Conditioning: A Free Lunch from (ReLU) Nonlinear Activation in Wide Neural Networks
Chaoyue Liu, Han Bi, Like Hui, Xiao Liu

TL;DR
This paper reveals that ReLU nonlinear activation functions improve feature separation and NTK conditioning in wide neural networks, especially with increased depth, leading to better convergence properties.
Contribution
It uncovers a novel property of nonlinear activations, demonstrating their role in enhancing NTK conditioning and data separation in wide neural networks.
Findings
Nonlinear activations improve feature separation in model gradient space.
Nonlinear activations lead to better NTK conditioning, reducing the condition number.
Deeper networks amplify the effects of nonlinear activations on data separation and NTK conditioning.
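The findings above rely on a few standard quantities. The definitions below use our own notation, not necessarily the paper's. They assume a scalar-output model f(w; x) with parameters w and training inputs x_1, ..., x_n.

```latex
\begin{aligned}
K(x, x') &= \big\langle \nabla_{w} f(w; x),\ \nabla_{w} f(w; x') \big\rangle,
\qquad \mathbf{K} = \big[K(x_i, x_j)\big]_{i,j=1}^{n}, \\
\kappa(\mathbf{K}) &= \frac{\lambda_{\max}(\mathbf{K})}{\lambda_{\min}(\mathbf{K})},
\qquad
\cos\phi(x, x') = \frac{K(x, x')}{\sqrt{K(x, x)\,K(x', x')}}.
\end{aligned}
```

In these terms, "better feature separation" means a larger angle φ between the gradient features of similar inputs. "Better NTK conditioning" means a smaller condition number κ of the NTK Gram matrix.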
Abstract
Nonlinear activation functions are widely recognized for enhancing the expressivity of neural networks, which is the primary reason for their widespread implementation. In this work, we focus on ReLU activation and reveal a novel and intriguing property of nonlinear activations. By comparing enabling and disabling the nonlinear activations in the neural network, we demonstrate their specific effects on wide neural networks: (a) better feature separation, i.e., a larger angle separation for similar data in the feature space of model gradient, and (b) better NTK conditioning, i.e., a smaller condition number of neural tangent kernel (NTK). Furthermore, we show that the network depth (i.e., with more nonlinear activation operations) further amplifies these effects; in addition, in the infinite-width-then-depth limit, all data are equally separated with a fixed angle in the model gradient…
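The sketch below illustrates the on/off comparison described in the abstract. It is a minimal example, and several choices in it are ours rather than the paper's:

- a two-layer network f(x) = v·σ(Wx)/√m with standard Gaussian weights;
- a width of m = 20000;
- two unit inputs separated by a small angle θ.

The paper's experiments may use different architectures and data. The code computes the empirical NTK with ReLU enabled and disabled (identity activation), then reports the NTK condition number and the angle between the two gradient features. The helper names `empirical_ntk`, `condition_number`, and `gradient_angle` are hypothetical.

```python
import numpy as np

def empirical_ntk(X, W, v, relu=True):
    """Empirical NTK Gram matrix of f(x) = v . sigma(W x) / sqrt(m).

    X: (n, d) inputs; W: (m, d) first-layer weights; v: (m,) output weights.
    relu=False replaces sigma by the identity (nonlinearity "disabled").
    """
    m = W.shape[0]
    pre = X @ W.T                         # (n, m) pre-activations w_i . x
    if relu:
        act = np.maximum(pre, 0.0)        # sigma(w_i . x)
        dact = (pre > 0).astype(float)    # sigma'(w_i . x)
    else:
        act = pre
        dact = np.ones_like(pre)
    # df/dv_i = sigma(w_i . x) / sqrt(m), so this block is <act(x), act(x')> / m.
    K_v = act @ act.T / m
    # df/dw_i = v_i * sigma'(w_i . x) * x / sqrt(m); the inner product factorises
    # into sum_i v_i^2 sigma'(.)sigma'(.) times <x, x'>, divided by m.
    B = dact * v                          # (n, m), B[a, i] = v_i * sigma'(w_i . x_a)
    K_W = (B @ B.T) * (X @ X.T) / m
    return K_v + K_W

def condition_number(K):
    eig = np.linalg.eigvalsh(K)           # eigenvalues of symmetric K, ascending
    return eig[-1] / eig[0]

def gradient_angle(K):
    # Angle between the gradient features of inputs 0 and 1.
    c = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
    return np.arccos(np.clip(c, -1.0, 1.0))

rng = np.random.default_rng(0)
d, m, theta = 10, 20000, 0.1
# Two similar unit-norm inputs separated by angle theta.
x0 = np.zeros(d); x0[0] = 1.0
x1 = np.zeros(d); x1[0] = np.cos(theta); x1[1] = np.sin(theta)
X = np.stack([x0, x1])
W = rng.standard_normal((m, d))
v = rng.standard_normal(m)

K_lin = empirical_ntk(X, W, v, relu=False)
K_relu = empirical_ntk(X, W, v, relu=True)
print("condition number (linear):", condition_number(K_lin))
print("condition number (ReLU):  ", condition_number(K_relu))
print("gradient angle (linear):  ", gradient_angle(K_lin))
print("gradient angle (ReLU):    ", gradient_angle(K_relu))
```

With the activation disabled, the network is linear in x. Its NTK is a fixed quadratic form in the inputs, so the gradient angle stays close to θ. With ReLU, the two nearby inputs fall on opposite sides of roughly a θ/π fraction of the neurons' hyperplanes. This pushes their gradient features apart, so the ReLU run should show a larger angle and a smaller condition number. These are finite-width, single-seed estimates.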
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Machine Learning and ELM
Methods: Neural Tangent Kernel
