SHARe-KAN: Post-Training Vector Quantization for Cache-Resident KAN Inference
Jeff Smith

TL;DR
SHARe-KAN is a post-training compression method for Kolmogorov-Arnold Network (KAN) models that cuts prediction-head storage by 9.3X on PASCAL VOC at a 2.0-point in-domain accuracy cost, enabling efficient edge deployment without retraining.
Contribution
It introduces SHARe-KAN, a post-training vector quantization method that compresses KAN spline coefficients through a Gain-Shape-Bias decomposition with a layer-shared codebook. It is paired with LUTHAM, an ExecuTorch runtime that keeps the codebook in on-chip L2 memory.
Findings
9.3X storage compression on PASCAL VOC detection (6.32 MB vs. 58.67 MB prediction head) at a 2.0-point in-domain accuracy cost
Retains 88.9% of Dense KAN mAP when transferred to COCO without retraining
13.9X reduction in storage at scale for multi-expert KAN deployment
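The 9.3X figure follows directly from the prediction-head sizes reported in the abstract. This check uses only those two numbers and computes their ratio.

```python
# Prediction-head sizes reported in the abstract (PASCAL VOC, ResNet-50 backbone)
dense_kan_mb = 58.67      # Dense KAN baseline
share_kan_int8_mb = 6.32  # SHARe-KAN Int8

# Storage compression ratio = dense size / compressed size
ratio = dense_kan_mb / share_kan_int8_mb
print(f"{ratio:.1f}X")  # prints "9.3X"
```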
Abstract
Pre-trained Vision Kolmogorov-Arnold Networks (KANs) store a dense B-spline grid on every edge, inflating prediction-head parameter counts by more than 140X relative to a comparable MLP and pushing inference into a memory-bound regime on edge accelerators. Standard magnitude pruning fails on these pre-trained models: zero-shot sparsity collapses accuracy, and restoring it requires an iterative fine-tuning loop that is impractical in deployment settings. We present SHARe-KAN, a post-training compiler that compresses spline coefficients via a Gain-Shape-Bias decomposition with a layer-shared codebook, paired with LUTHAM, an ExecuTorch runtime that maps the codebook into on-chip L2. On PASCAL VOC detection with a ResNet-50 backbone, SHARe-KAN Int8 reaches 9.3X storage compression over the Dense KAN baseline (6.32 MB vs. 58.67 MB prediction head) at a 2.0-point in-domain accuracy cost…
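The abstract does not spell out the exact form of the Gain-Shape-Bias decomposition. The sketch below is one plausible reading, written only to show the general idea, and it is not the paper's algorithm. It assumes the following:

- Each edge's spline coefficient vector is split into a scalar bias (its mean), a scalar gain, and a unit-norm shape.
- The shape is replaced by the nearest codeword from a codebook shared by every edge in the layer.
- The codebook is fit with spherical k-means.
- Gain and bias are stored in float16.

The function names (`quantize_layer`, `dequantize_layer`, `storage_bytes`) and the byte accounting are hypothetical.

```python
import numpy as np

def fit_codebook(shapes, k, iters=25, seed=0):
    """Spherical k-means (assumed): unit-norm codewords, assignment by max cosine."""
    rng = np.random.default_rng(seed)
    codebook = shapes[rng.choice(len(shapes), size=k, replace=False)].copy()
    for _ in range(iters):
        idx = np.argmax(shapes @ codebook.T, axis=1)
        for j in range(k):
            members = shapes[idx == j]
            if len(members):
                c = members.sum(axis=0)
                n = np.linalg.norm(c)
                if n > 0:
                    codebook[j] = c / n  # keep codeword on the unit sphere
    return codebook

def quantize_layer(coeffs, k=64):
    """Hypothetical Gain-Shape-Bias VQ of one layer; coeffs has shape (n_edges, d)."""
    bias = coeffs.mean(axis=1, keepdims=True)            # per-edge offset
    centered = coeffs - bias
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    shapes = centered / np.maximum(norms, 1e-12)         # per-edge unit direction
    codebook = fit_codebook(shapes, k)                   # shared by all edges in the layer
    idx = np.argmax(shapes @ codebook.T, axis=1)
    # Least-squares gain for the chosen codeword (projection onto it)
    gain = np.sum(centered * codebook[idx], axis=1, keepdims=True)
    idx = idx.astype(np.uint8 if k <= 256 else np.uint16)
    return codebook, idx, gain.astype(np.float16), bias.astype(np.float16)

def dequantize_layer(codebook, idx, gain, bias):
    """Reconstruct coefficients as bias + gain * shared codeword."""
    return bias + gain * codebook[idx]

def storage_bytes(n_edges, d, k, coeff_bytes=4, scalar_bytes=2):
    """Assumed accounting: float32 dense/codebook, 1-2 byte index, float16 gain and bias."""
    dense = n_edges * d * coeff_bytes
    index_bytes = 1 if k <= 256 else 2
    compressed = k * d * coeff_bytes + n_edges * (index_bytes + 2 * scalar_bytes)
    return dense, compressed

# Toy layer: 2048 edges, 8 spline coefficients each (random data, not from the paper)
rng = np.random.default_rng(1)
coeffs = rng.standard_normal((2048, 8)).astype(np.float32)
codebook, idx, gain, bias = quantize_layer(coeffs, k=64)
recon = dequantize_layer(codebook, idx, gain, bias)
rel_err = np.linalg.norm(recon - coeffs) / np.linalg.norm(coeffs)
dense, compressed = storage_bytes(2048, 8, 64)
print(f"relative error {rel_err:.3f}, storage {dense} -> {compressed} bytes")
```

Using the projection as the gain makes each edge's reconstruction error no larger than its bias-only error. Sharing one codebook per layer is what lets the per-edge cost shrink to an index plus two scalars. The real compression ratio and accuracy depend on details this sketch does not model.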
