A generalized neural tangent kernel for surrogate gradient learning
Luke Eilers, Raoul-Martin Memmesheimer, Sven Goedeke

TL;DR
This paper introduces a generalized neural tangent kernel, called the surrogate gradient NTK, enabling theoretical analysis of surrogate gradient learning in neural networks with non-differentiable activation functions, supported by numerical experiments.
Contribution
It extends the neural tangent kernel framework to surrogate gradient learning, providing a rigorous theoretical foundation for analyzing networks with non-differentiable activations.
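For orientation, the standard empirical NTK that this framework starts from can be written as below, together with the function-space dynamics it induces under gradient flow. This is the textbook definition for a scalar-output network, not the paper's generalized kernel; the symbols f, θ, Θ, the loss \mathcal{L}, and the training inputs x_1, …, x_n are notation assumed here for illustration.

```latex
\Theta_{\theta}(x, x') = \nabla_\theta f(x;\theta)^{\top} \, \nabla_\theta f(x';\theta),
\qquad
\frac{d}{dt} f(x;\theta_t)
  = -\sum_{i=1}^{n} \Theta_{\theta_t}(x, x_i)\,
    \frac{\partial \mathcal{L}}{\partial f(x_i;\theta_t)} .
```

Both expressions depend on the derivative of the activation function through \nabla_\theta f. If that derivative is zero almost everywhere, as it is for a step-like activation, the contribution of the hidden-layer weights to the kernel vanishes, which is why the non-differentiable case needs separate treatment.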
Findings
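The surrogate gradient NTK accurately characterizes surrogate gradient learning (SGL).
A naive extension of the NTK to activation functions with jumps fails: for such activations, gradient descent remains ill-posed at infinite width.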
Numerical experiments validate the surrogate gradient NTK's effectiveness.
Abstract
State-of-the-art neural network training methods depend on the gradient of the network function. Therefore, they cannot be applied to networks whose activation functions do not have useful derivatives, such as binary and discrete-time spiking neural networks. To overcome this problem, the activation function's derivative is commonly substituted with a surrogate derivative, giving rise to surrogate gradient learning (SGL). This method works well in practice but lacks theoretical foundation. The neural tangent kernel (NTK) has proven successful in the analysis of gradient descent. Here, we provide a generalization of the NTK, which we call the surrogate gradient NTK, that enables the analysis of SGL. First, we study a naive extension of the NTK to activation functions with jumps, demonstrating that gradient descent for such activation functions is also ill-posed in the infinite-width…
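The substitution described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the one-hidden-layer network, the tanh-based surrogate, and every function name and parameter (`surrogate_derivative`, `beta`, `train_hidden`, the toy data point) are assumptions made for this example.

```python
import math

def sign(z):
    """Activation with a jump at z = 0; its true derivative is 0 almost everywhere."""
    return 1.0 if z >= 0 else -1.0

def surrogate_derivative(z, beta=1.0):
    """Assumed surrogate: the derivative of tanh(beta * z), used in place of sign'(z)."""
    return beta * (1.0 - math.tanh(beta * z) ** 2)

def forward(w, v, x):
    """Scalar network f(x) = sum_j v[j] * sign(w[j] * x); the forward pass keeps the true sign."""
    return sum(vj * sign(wj * x) for wj, vj in zip(w, v))

def surrogate_grad_w(w, v, x, y, beta=1.0):
    """Surrogate gradient of 0.5 * (f(x) - y)^2 with respect to the hidden weights w.

    The exact gradient would be zero almost everywhere because sign'(z) = 0;
    only the backward pass is changed by swapping in the surrogate derivative.
    """
    err = forward(w, v, x) - y
    return [err * vj * surrogate_derivative(wj * x, beta) * x
            for wj, vj in zip(w, v)]

def train_hidden(w, v, x, y, lr=0.1, steps=10):
    """Plain gradient descent on the hidden weights, driven by the surrogate gradient."""
    for _ in range(steps):
        grad = surrogate_grad_w(w, v, x, y)
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

# Toy problem: one hidden unit that initially outputs +1 for a target of -1.
w = train_hidden([0.5], [1.0], x=1.0, y=-1.0)
print(forward(w, [1.0], 1.0))
```

In this toy setup the exact gradient with respect to the hidden weight is zero, so ordinary gradient descent would never move it. The surrogate gradient pushes the weight across zero within a few updates, after which the error and the update both vanish.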
Taxonomy
Topics: Neural Networks and Applications
Methods: Neural Tangent Kernel
