MetaCluster: Enabling Deep Compression of Kolmogorov-Arnold Network

Matthew Raffel; Adwaith Renjith; Lizhong Chen

arXiv:2510.19105·cs.LG·February 10, 2026

Open Access · 3 Reviews

TL;DR

MetaCluster compresses Kolmogorov-Arnold Networks (KANs) by clustering per-edge basis coefficient vectors into a shared codebook, reducing storage by up to 80x without accuracy loss.

Contribution

The paper introduces a meta-learner-guided clustering approach that makes KANs highly compressible while preserving their expressivity and accuracy.

Findings

01. Achieves up to 80x parameter reduction on image datasets.

02. Maintains accuracy despite aggressive compression.

03. Effective on high-dimensional equation modeling tasks.

Abstract

Kolmogorov-Arnold Networks (KANs) replace scalar weights with per-edge vectors of basis coefficients, thereby increasing expressivity and accuracy while also resulting in a multiplicative increase in parameters and memory. We propose MetaCluster, a framework that makes KANs highly compressible without sacrificing accuracy. Specifically, a lightweight meta-learner, trained jointly with the KAN, maps low-dimensional embeddings to coefficient vectors, thereby shaping them to lie on a low-dimensional manifold that is amenable to clustering. We then run K-means in coefficient space and replace per-edge vectors with shared centroids. Afterwards, the meta-learner can be discarded, and a brief fine-tuning of the centroid codebook recovers any residual accuracy loss. The resulting model stores only a small codebook and per-edge indices, exploiting the vector nature of KAN parameters to amortize…
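The clustering and codebook step described in the abstract can be illustrated with a minimal sketch. It covers only the K-means and weight-sharing stage; the meta-learner and the codebook fine-tuning are omitted. The function names (`kmeans`, `compress_kan_layer`, `decompress`, `storage_ratio`), the `(out_dim, in_dim, n_basis)` tensor layout, and the 32-bit float storage assumption are illustrative choices, not taken from the paper.

```python
import numpy as np


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-means on the rows of `vectors`; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every vector to every centroid.
        d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, d2.argmin(axis=1)


def compress_kan_layer(coeffs, k):
    """Cluster per-edge coefficient vectors of shape (out_dim, in_dim, n_basis).

    Returns a (k, n_basis) codebook and an (out_dim, in_dim) index map.
    The tensor layout is an assumption made for this sketch.
    """
    out_dim, in_dim, n_basis = coeffs.shape
    codebook, idx = kmeans(coeffs.reshape(-1, n_basis), k)
    return codebook, idx.reshape(out_dim, in_dim)


def decompress(codebook, idx):
    """Rebuild a dense coefficient tensor by looking up each edge's centroid."""
    return codebook[idx]


def storage_ratio(out_dim, in_dim, n_basis, k, float_bits=32):
    """Dense storage divided by codebook-plus-indices storage, both in bits.

    Assumes float_bits per coefficient and ceil(log2(k)) bits per edge index.
    """
    edges = out_dim * in_dim
    index_bits = max(1, (k - 1).bit_length())
    dense = edges * n_basis * float_bits
    shared = k * n_basis * float_bits + edges * index_bits
    return dense / shared
```

Because each edge stores one small index instead of a full coefficient vector, the saving grows with the number of basis coefficients per edge, which is the amortization the abstract refers to. Calling `storage_ratio` with a layer's dimensions and a candidate `k` gives a quick estimate before running any clustering.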

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The proposed weight-sharing method greatly reduces the number of trainable parameters in KANs.

Weaknesses

The proposed method relies crucially on K-means clustering to provide reasonably good centroids. However, K-means implicitly assumes roughly spherical clusters, which may not hold in practice. Could the authors replace K-means with another clustering method (e.g., a Gaussian mixture model) to show that the proposed method works with different clustering algorithms?

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- The paper clearly identifies KAN’s memory inefficiency and proposes a targeted solution.
- This is the first successful application of weight sharing specifically designed for KANs.
- The paper is well organized and easy to follow.

Weaknesses

- Complete absence of vector quantization literature. The proposed method is fundamentally vector quantization (VQ): mapping high-dimensional vectors to discrete codebook entries via clustering. However, the paper never mentions "vector quantization" and ignores relevant research.
- Lack of comparison with established vector quantization methods. The paper employs standard K-means with Euclidean distance but provides no comparison against advanced VQ techniques. For example, Product Quantization
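For readers unfamiliar with the Product Quantization technique the reviewer names, a minimal sketch follows. It shows the general idea only: each vector is split into equal sub-vectors and a separate K-means codebook is learned per sub-space. This is not part of the paper's method, and the names `pq_encode`, `pq_decode`, `n_sub`, and the helper `_kmeans` are hypothetical.

```python
import numpy as np


def _kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    c = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        a = ((x[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(a == j):  # an empty cluster keeps its previous centroid
                c[j] = x[a == j].mean(0)
    a = ((x[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
    return c, a


def pq_encode(vectors, n_sub, k):
    """Product-quantize `vectors` of shape (n, d), with d divisible by n_sub.

    Returns codebooks of shape (n_sub, k, d // n_sub) and integer codes
    of shape (n, n_sub), one code per sub-vector.
    """
    n, d = vectors.shape
    if d % n_sub != 0:
        raise ValueError("d must be divisible by n_sub")
    sub_d = d // n_sub
    codebooks = np.empty((n_sub, k, sub_d), dtype=vectors.dtype)
    codes = np.empty((n, n_sub), dtype=np.int64)
    for s in range(n_sub):
        # Cluster the s-th slice of every vector independently.
        chunk = vectors[:, s * sub_d:(s + 1) * sub_d]
        codebooks[s], codes[:, s] = _kmeans(chunk, k, seed=s)
    return codebooks, codes


def pq_decode(codebooks, codes):
    """Reconstruct vectors by concatenating the selected sub-centroids."""
    parts = [codebooks[s][codes[:, s]] for s in range(codebooks.shape[0])]
    return np.concatenate(parts, axis=1)
```

With `n_sub` sub-spaces of `k` centroids each, the scheme can represent up to `k ** n_sub` distinct reconstructions while storing only `n_sub * k` sub-centroids, at the cost of `n_sub` indices per vector instead of one.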

Reviewer 03 · Rating 6 · Confidence 2

Strengths

The paper is well written, and its motivation is clear. The main strength lies in the impressive experimental results, as demonstrated in Tables 1 and 2. Additionally, the authors provide thorough ablation studies to validate their design choices, as shown in Section 4.3.

Weaknesses

- The authors do not benchmark against non-KAN compression baselines. Given the extensive literature on model compression, it would be valuable to compare MetaCluster with common techniques (e.g., pruning, quantization, or weight sharing) applied to MLPs or CNNs. This would help clarify whether MetaCluster is state-of-the-art relative to general compression methods. If those methods are not easily extendable to KANs, a discussion explaining why would strengthen the paper.
- The evaluation is co

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science