Properties of the After Kernel
Philip M. Long

TL;DR
This paper investigates the properties of the 'after kernel' in neural networks, showing how training modifies the kernel to become more global and invariant, and how these changes impact classification accuracy and invariance to transformations.
Contribution
It introduces and analyzes the 'after kernel' concept, demonstrating its evolution during training and its effects on invariance and classification performance.
Findings
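For some dataset-architecture pairs, a hard-margin SVM using the after kernel is much more accurate than one using the network's initial kernel.
The after kernel becomes more invariant to transformations of the input with training.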
Larger learning rates lead to more global and invariant after kernels.
Abstract
The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined using neural networks at initialization, whose embedding is the gradient of the output of the network with respect to its parameters. We study the "after kernel", which is defined using the same embedding, except after training, for neural networks with standard architectures, on binary classification problems extracted from MNIST and CIFAR-10, trained using SGD in a standard way. For some dataset-architecture pairs, after a few epochs of neural network training, a hard-margin SVM using the network's after kernel is much more accurate than when the network's initial kernel is used. For networks with an architecture similar to VGG, the after kernel is more "global", in the sense that it is less invariant to transformations of input images that disrupt the global structure of the image while leaving the local…
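The definition in the abstract (a kernel whose embedding is the gradient of the network output with respect to its parameters, evaluated either at initialization or after training) can be sketched in a few lines. This is a minimal illustration and not the paper's experimental setup: the two-layer ReLU network, the toy data, the logistic loss, the step size, and all function names (`init_params`, `grad_embedding`, `gradient_kernel`, `sgd_train`) are assumptions made for the example.

```python
import numpy as np

def init_params(rng, d_in, d_hidden):
    """Hypothetical two-layer ReLU network f(x) = v . relu(W x)."""
    W = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
    v = rng.normal(size=d_hidden) / np.sqrt(d_hidden)
    return W, v

def grad_embedding(params, x):
    """Gradient of the scalar output f(x) w.r.t. all parameters, flattened."""
    W, v = params
    pre = W @ x
    # df/dW[j, :] = v[j] * 1[pre[j] > 0] * x ;  df/dv = relu(W x)
    dW = np.outer(v * (pre > 0), x)
    dv = np.maximum(pre, 0.0)
    return np.concatenate([dW.ravel(), dv])

def gradient_kernel(params, X):
    """Gram matrix K[i, j] = <grad f(x_i), grad f(x_j)>."""
    Phi = np.stack([grad_embedding(params, x) for x in X])
    return Phi @ Phi.T

def sgd_train(params, X, y, lr=0.1, epochs=20):
    """Plain SGD on the logistic loss log(1 + exp(-y f(x))) (assumed loss)."""
    W, v = params
    for _ in range(epochs):
        for x, label in zip(X, y):
            f = v @ np.maximum(W @ x, 0.0)
            dloss_df = -label / (1.0 + np.exp(label * f))
            g = grad_embedding((W, v), x)
            gW, gv = g[:W.size].reshape(W.shape), g[W.size:]
            W = W - lr * dloss_df * gW
            v = v - lr * dloss_df * gv
    return W, v

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                  # toy inputs, not MNIST/CIFAR-10
y = np.where(X[:, 0] > 0, 1.0, -1.0)         # toy binary labels in {-1, +1}

params_init = init_params(rng, d_in=8, d_hidden=16)
K_init = gradient_kernel(params_init, X)     # kernel at initialization

params_trained = sgd_train(params_init, X, y)
K_after = gradient_kernel(params_trained, X) # "after kernel": same embedding, trained parameters
```

Either Gram matrix could then be passed to an SVM implementation that accepts a precomputed kernel, which is how the two kernels would be compared on held-out data.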
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications
Methods: Neural Tangent Kernel · Max Pooling · Dense Connections · Convolution · Support Vector Machine · Softmax · Stochastic Gradient Descent · Dropout
