Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks
David Holzmüller, Max Schölpple

TL;DR
This paper characterizes the reproducing kernel Hilbert spaces (RKHSs) of neural tangent kernels (NTKs) and neural network Gaussian process (NNGP) kernels for many common activation functions, extending the theory beyond ReLU and analyzing how the smoothness of the activation affects these spaces.
Contribution
It characterizes the RKHS for activation functions whose only non-smoothness is at zero, such as SELU, ELU, and LeakyReLU, across different network depths, and additionally treats special cases such as missing biases, two-layer networks, and polynomial activations.
Findings
A broad class of activations that are not infinitely smooth produces equivalent RKHSs at different network depths.
The RKHS generated by polynomial activations depends on network depth.
Characterization of the smoothness of infinitely wide neural networks at initialization.
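One way to build intuition for why depth matters with polynomial activations (our own illustration, not the paper's argument): take φ(t) = t² and inputs normalized so that ⟨x, x⟩/d = 1. For a centered Gaussian pair (u, v) with variances a, b and covariance c, Isserlis' theorem gives E[u²v²] = ab + 2c², so each layer squares the previous off-diagonal kernel up to constants, and the NNGP kernel becomes a polynomial of degree 2^L in s = ⟨x, x'⟩/d. The sketch below tracks these coefficients with NumPy's `Polynomial`; the function name `square_activation_nngp`, the first-layer kernel `sw2 * s + sb2`, and the default variances `sw2`, `sb2` are assumptions chosen for illustration.

```python
from numpy.polynomial import Polynomial


def square_activation_nngp(depth, sw2=1.0, sb2=0.1):
    """Hypothetical helper: NNGP kernel of a fully connected net with
    phi(t) = t**2, returned as a polynomial in s = <x, x'>/d, assuming
    inputs with <x, x>/d = 1 so every diagonal entry shares one value."""
    s = Polynomial([0.0, 1.0])
    k = sw2 * s + sb2        # off-diagonal first-layer kernel (assumed convention)
    diag = sw2 * 1.0 + sb2   # diagonal first-layer kernel
    for _ in range(depth):
        # Isserlis: E[u^2 v^2] = a*b + 2*c^2 for a centered Gaussian pair (u, v).
        k = sw2 * (diag * diag + 2 * k**2) + sb2
        # Diagonal case a = b = c: E[u^4] = 3*a^2.
        diag = sw2 * 3 * diag**2 + sb2
    return k, diag
```

For example, the kernel returned for `depth=3` has degree 8, and evaluating it at s = 1 recovers the diagonal value, as it should.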
Abstract
In recent years, the neural tangent kernel (NTK) and the neural network Gaussian process kernel (NNGP) have given theoreticians tractable limiting cases of fully connected neural networks. However, the properties of these kernels are poorly understood for activation functions other than powers of the ReLU. Our main contribution is a characterization of the reproducing kernel Hilbert space (RKHS) of these kernels for activation functions whose only non-smoothness is at zero. This extends existing theory to numerous commonly used activation functions such as SELU, ELU, or LeakyReLU. Additionally, we analyze a broad set of special cases such as missing biases, two-layer networks, or polynomial activations. Our results show that, for a broad class of activations that are not infinitely smooth, the RKHSs generated at different network depths are equivalent and, up to equivalence, depend only on the degree of the non-smoothness. On the other hand, the RKHS…
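The NNGP and NTK of a fully connected network are defined by layer-wise recursions over Gaussian expectations of the activation and its derivative. The following is a minimal sketch, not the paper's code, assuming one common convention (NTK parametrization with weight variance `sw2`, bias variance `sb2`, and first-layer kernel `sw2 * <x, x'> / d + sb2`) and estimating the expectations by Monte Carlo. All helper names (`nngp_ntk`, `relu_nngp_ntk_closed_form`, `_gauss_pair_moments`) and default values are hypothetical. For ReLU, the closed-form arc-cosine expressions provide a reference to check against; for ELU or LeakyReLU, only the Monte Carlo route is used here.

```python
import numpy as np


# Activations and their (almost-everywhere) derivatives.
def relu(t): return np.maximum(t, 0.0)
def relu_d(t): return (t > 0).astype(float)
def elu(t, alpha=1.0): return np.where(t > 0, t, alpha * np.expm1(np.minimum(t, 0.0)))
def elu_d(t, alpha=1.0): return np.where(t > 0, 1.0, alpha * np.exp(np.minimum(t, 0.0)))
def leaky_relu(t, slope=0.01): return np.where(t > 0, t, slope * t)
def leaky_relu_d(t, slope=0.01): return np.where(t > 0, 1.0, slope)


def _gauss_pair_moments(phi, dphi, a, b, c, z):
    """Monte Carlo estimates of E[phi(u)phi(v)] and E[phi'(u)phi'(v)]
    for (u, v) ~ N(0, [[a, c], [c, b]]), reusing fixed standard normals z."""
    rho = np.clip(c / np.sqrt(a * b), -1.0, 1.0)
    u = np.sqrt(a) * z[0]
    v = np.sqrt(b) * (rho * z[0] + np.sqrt(1.0 - rho**2) * z[1])
    return np.mean(phi(u) * phi(v)), np.mean(dphi(u) * dphi(v))


def nngp_ntk(x, y, phi, dphi, depth, sw2=2.0, sb2=0.1, n_samples=400_000, seed=0):
    """Hypothetical helper: (NNGP, NTK) values at (x, y) for a net with
    `depth` hidden layers, under the assumed convention described above."""
    z = np.random.default_rng(seed).standard_normal((2, n_samples))
    d = len(x)
    # First layer: scaled linear kernel plus bias variance.
    sxx = sw2 * np.dot(x, x) / d + sb2
    syy = sw2 * np.dot(y, y) / d + sb2
    sxy = sw2 * np.dot(x, y) / d + sb2
    theta = sxy
    for _ in range(depth):
        exx = _gauss_pair_moments(phi, dphi, sxx, sxx, sxx, z)[0]
        eyy = _gauss_pair_moments(phi, dphi, syy, syy, syy, z)[0]
        exy, dxy = _gauss_pair_moments(phi, dphi, sxx, syy, sxy, z)
        sxx, syy = sw2 * exx + sb2, sw2 * eyy + sb2
        sxy = sw2 * exy + sb2
        # NTK recursion: Theta <- Sigma_new + Theta_old * Sigma_dot.
        theta = sxy + theta * sw2 * dxy
    return sxy, theta


def relu_nngp_ntk_closed_form(x, y, depth, sw2=2.0, sb2=0.1):
    """Same recursion for ReLU, using the arc-cosine kernel formulas."""
    d = len(x)
    sxx = sw2 * np.dot(x, x) / d + sb2
    syy = sw2 * np.dot(y, y) / d + sb2
    sxy = sw2 * np.dot(x, y) / d + sb2
    theta = sxy
    for _ in range(depth):
        cos_t = np.clip(sxy / np.sqrt(sxx * syy), -1.0, 1.0)
        t = np.arccos(cos_t)
        exy = np.sqrt(sxx * syy) / (2 * np.pi) * (np.sin(t) + (np.pi - t) * cos_t)
        dxy = (np.pi - t) / (2 * np.pi)
        # E[relu(u)^2] = a / 2 for u ~ N(0, a).
        sxx, syy = sw2 * sxx / 2 + sb2, sw2 * syy / 2 + sb2
        sxy = sw2 * exy + sb2
        theta = sxy + theta * sw2 * dxy
    return sxy, theta
```

Swapping `relu, relu_d` for `elu, elu_d` or `leaky_relu, leaky_relu_d` in `nngp_ntk` gives the corresponding kernel values; Monte Carlo was chosen here only because it works for any activation without closed forms.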
