Scalable Neural Network Kernels
Arijit Sehanobish, Krzysztof Choromanski, Yunfan Zhao, Avinava Dubey, Valerii Likhosherstov

TL;DR
This paper introduces scalable neural network kernels (SNNKs) that replace traditional layers for improved efficiency and expressiveness, enabling network compression and potential bypassing of backpropagation with theoretical and empirical validation.
Contribution
The paper proposes SNNKs as a novel layer replacement that enhances expressiveness and computational efficiency, along with a bundling process for neural network compression and explicit parameter formulas.
Findings
Up to 5x reduction in trainable parameters
Competitive accuracy with compressed models
Theoretical analysis of SNNKs and URFs
Abstract
We introduce the concept of scalable neural network kernels (SNNKs), the replacements of regular feedforward layers (FFLs), capable of approximating the latter, but with favorable computational properties. SNNKs effectively disentangle the inputs from the parameters of the neural network in the FFL, only to connect them in the final computation via the dot-product kernel. They are also strictly more expressive, as they allow modeling of complicated relationships beyond the functions of the dot-products of parameter-input vectors. We also introduce the neural network bundling process that applies SNNKs to compactify deep neural network architectures, resulting in additional compression gains. In its extreme version, it leads to the fully bundled network whose optimal parameters can be expressed via explicit formulae for several loss functions (e.g. mean squared error), opening a possibility…
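The disentangling idea in the abstract can be illustrated with a small sketch. It assumes an exponential activation and uses the standard positive random feature identity exp(x·w) = E[exp(g·x − |x|²/2) · exp(g·w − |w|²/2)] for Gaussian g. This is a stand-in chosen for simplicity, not necessarily the universal random feature construction used in the paper. The names `random_projections`, `feature_map`, and `dot` are hypothetical helpers introduced only for this example.

```python
import math
import random


def random_projections(dim, num_features, seed=0):
    """Draw num_features Gaussian vectors g ~ N(0, I) of the given dimension."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def feature_map(v, projections):
    """Map v to [exp(g.v - |v|^2 / 2) / sqrt(m)] over all m projections g.

    The same map is applied to the input and to the parameter vector,
    each independently of the other.
    """
    half_sq_norm = dot(v, v) / 2.0
    scale = 1.0 / math.sqrt(len(projections))
    return [scale * math.exp(dot(g, v) - half_sq_norm) for g in projections]


# Illustrative input and parameter vectors (assumed values, not from the paper).
x = [0.3, -0.2, 0.1]
w = [0.1, 0.4, -0.3]

G = random_projections(dim=3, num_features=20000)

phi_x = feature_map(x, G)  # depends on the input only
psi_w = feature_map(w, G)  # depends on the parameters only

approx = dot(phi_x, psi_w)   # the two sides meet only in this dot product
exact = math.exp(dot(x, w))  # the exp-activated neuron being approximated

print("random-feature estimate:", approx)
print("exact exp(x.w):         ", exact)
```

Because `psi_w` never sees `x`, the parameter-side features can be precomputed once and reused across inputs; the estimate tightens as `num_features` grows.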
Peer Reviews
Decision: ICLR 2024 poster
The paper introduces the concept of Scalable Neural Network Kernels (SNNKs), a fresh take on neural network architecture. This novel approach to approximating regular feedforward layers (FFLs) with computational benefits showcases a high degree of originality. The "neural network bundling process" and the notion of a fully bundled network present innovative methods for condensing deep neural network architectures. The "universal random features" mechanism, which aids in the formulation of variou
The paper could benefit from a more direct comparison of SNNKs with other existing solutions or methods aimed at network compression or efficiency. Highlighting the unique advantages of SNNKs over these methods would further solidify its significance. The paper could delve deeper into the robustness of the SNNK approach. Are there scenarios where the approximation might break down? Understanding the edge cases and potential pitfalls would be crucial for practitioners looking to adopt this metho
The paper introduces a new computational model, the scalable neural network kernels (SNNK), providing a novel approach to efficient neural network design, particularly for replacing feedforward layers in MLPs. The design of SNNKs ensures that inputs and parameters are disentangled, leading to efficient final computations via a dot-product kernel, which can greatly reduce computational overhead. The bundling process highlighted in the paper leads to the compactification of the neural network st
The authors should provide some explanation or intuition for why their model does not work well in some of the experiments they performed. How deep a feedforward network can be approximated using the proposed method should be analyzed in further detail. Can scalable neural network kernels be applied in any scenario, or are there specific scenarios in which SNNKs won't work well? The authors should discuss such datasets/models. If there is none, then authors should als
Here are some of the main strengths of this paper:
- It makes an insightful connection between scalable kernel methods and neural network layers, introducing a novel perspective on feedforward layers.
- The concept of SNNKs is very clearly presented along with detailed theoretical analysis and constructions.
- The Fourier transform based universal random feature mechanism to instantiate SNNKs is interesting and useful.
- SNNKs provably increase expressive power over standard layers, as shown
Some potential weaknesses or limitations of this paper:
- The focus is on feedforward fully-connected layers, not convolutional or recurrent layers commonly used in modern networks.
- Experiments are limited to standard datasets and models; more complex domains like bioinformatics are not evaluated.
- There is no investigation into how SNNKs affect representation learning or generalization. The emphasis is on compression.
- Optimization and learning dynamics with SNNKs are not analyzed, apa
Taxonomy
Topics: Advanced Neural Network Applications · Geophysical Methods and Applications · Machine Learning and ELM
Methods: Adapter
