KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference
Sohaib Errabii, Olivier Sentieys, Marcello Traiola

TL;DR
This paper explores low-bit quantization of Kolmogorov-Arnold Networks (KANs), demonstrating significant computational and hardware efficiency gains with minimal accuracy loss, especially when using 2-3 bit quantization.
Contribution
It introduces low-bit quantization techniques for KANs, enabling efficient inference with negligible accuracy loss and substantial hardware resource savings.
Findings
Quantizing B-splines to 2-3 bits maintains accuracy.
Low-bit quantized B-spline tables yield a 50x reduction in BitOps.
Hardware implementations show significant resource and speed improvements.
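As a rough illustration of what a low-bit quantized B-spline table could look like, the sketch below tabulates a cubic B-spline on uniform integer knots and stores each entry as a small integer code. Everything here is an assumption made for illustration, not the paper's scheme: the cubic degree, the uniform knots, the midpoint sampling, the unsigned uniform quantizer, and the helper names `cubic_bspline`, `build_quantized_table`, and `lookup`.

```python
def cubic_bspline(t):
    """Cardinal cubic B-spline with knots 0,1,2,3,4 (zero outside [0, 4))."""
    if t < 0 or t >= 4:
        return 0.0
    if t < 1:
        return t**3 / 6
    if t < 2:
        return (-3 * t**3 + 12 * t**2 - 12 * t + 4) / 6
    if t < 3:
        return (3 * t**3 - 24 * t**2 + 60 * t - 44) / 6
    return (4 - t) ** 3 / 6


def build_quantized_table(n_entries, bits):
    """Sample the basis at bin midpoints and store unsigned `bits`-bit codes.

    Assumption: an unsigned uniform quantizer whose full scale is 2/3,
    the peak value of the cubic B-spline on uniform knots.
    """
    levels = (1 << bits) - 1
    scale = (2.0 / 3.0) / levels
    codes = []
    for i in range(n_entries):
        t = 4.0 * (i + 0.5) / n_entries      # midpoint of bin i
        codes.append(round(cubic_bspline(t) / scale))
    return codes, scale


def lookup(t, codes, scale):
    """Approximate the basis value at t from the quantized table."""
    if t < 0 or t >= 4:
        return 0.0
    idx = int(t / 4.0 * len(codes))
    return codes[idx] * scale


codes, scale = build_quantized_table(n_entries=16, bits=3)
approx = lookup(2.0, codes, scale)
```

With this layout each table entry takes `bits` bits instead of a full-precision float, and evaluating the basis at inference time reduces to an index computation and one table read. The quantization error at each sampled point is bounded by half a quantization step (`scale / 2`).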
Abstract
Kolmogorov-Arnold Networks (KANs) have gained attention for their potential to outperform Multi-Layer Perceptrons (MLPs) in terms of parameter efficiency and interpretability. Unlike traditional MLPs, KANs use learnable non-linear activation functions, typically spline functions, expressed as linear combinations of basis splines (B-splines). B-spline coefficients serve as the model's learnable parameters. However, evaluating these spline functions increases computational complexity during inference. Conventional quantization reduces this complexity by lowering the numerical precision of parameters and activations. However, the impact of quantization on KANs, and especially its effectiveness in reducing computational complexity, is largely unexplored, particularly for quantization levels below 8 bits. The study investigates the impact of low-bit quantization on KANs and its impact on…
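To make the abstract's description concrete, a KAN activation of this kind can be written as a weighted sum of basis functions, phi(x) = sum_i c_i * B_i(x), where the c_i are the learnable coefficients. The sketch below evaluates such a spline with the standard Cox-de Boor recursion. The knot vector, the degree, and the function names `bspline_basis` and `spline` are hypothetical choices for illustration, not taken from the paper.

```python
def bspline_basis(i, k, knots, x):
    """Value at x of the i-th B-spline of degree k (Cox-de Boor recursion)."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0
    right = 0.0
    if left_den != 0:      # skip terms over repeated knots
        left = (x - knots[i]) / left_den * bspline_basis(i, k - 1, knots, x)
    if right_den != 0:
        right = ((knots[i + k + 1] - x) / right_den
                 * bspline_basis(i + 1, k - 1, knots, x))
    return left + right


def spline(x, coeffs, knots, k=3):
    """Spline activation: linear combination of B-splines with weights coeffs."""
    assert len(coeffs) == len(knots) - k - 1
    return sum(c * bspline_basis(i, k, knots, x) for i, c in enumerate(coeffs))


# Assumed setup: 11 uniform knots and degree 3 give 7 basis functions,
# with the full basis available on the interval [0, 4).
knots = list(range(-3, 8))
coeffs = [0.2, -0.5, 1.0, 0.3, -0.1, 0.7, 0.0]
y = spline(1.5, coeffs, knots)
```

The recursion makes the inference cost visible: every activation evaluates several basis functions per input, each built from lower-degree ones, which is the overhead that precomputed and quantized tables aim to cut.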
Taxonomy
Topics: Numerical Methods and Algorithms · Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques
