Optimized Architectures for Kolmogorov-Arnold Networks
James Bagrow, Josh Bongard

TL;DR
This paper presents a method to create compact, interpretable Kolmogorov-Arnold networks by combining overprovisioned architectures with sparsification, depth selection, and deep supervision, optimized end-to-end.
Contribution
It introduces differentiable mechanisms, trained under a principled minimum description length (MDL) objective, that jointly optimize the structure, activations, and depth of Kolmogorov-Arnold networks (KANs) end-to-end.
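As a rough illustration of how such a joint objective could be assembled, the following is a minimal NumPy sketch and not the authors' implementation. Everything in it is an assumption made for illustration: polynomial edge functions stand in for learnable activations, sigmoid edge gates stand in for sparsification, a softmax over per-depth readout heads stands in for depth selection with deep supervision, and a weighted count of active edges is a crude proxy for a description-length term. The names `kan_layer`, `objective`, and `lam` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kan_layer(x, coef, gate_logits):
    """Toy KAN layer: out_j = sum_i gate_ij * phi_ij(x_i).

    x: (batch, d_in). coef: (d_in, d_out, K) polynomial coefficients,
    an assumed stand-in for learnable edge activations.
    gate_logits: (d_in, d_out), soft edge gates used for sparsification.
    """
    K = coef.shape[-1]
    powers = np.stack([x ** (k + 1) for k in range(K)], axis=-1)  # (batch, d_in, K)
    edge = np.einsum("bik,iok->bio", powers, coef)                # per-edge outputs
    gates = sigmoid(gate_logits)
    return np.einsum("bio,io->bo", edge, gates), gates

def objective(x, y, layers, heads, depth_logits, lam=1e-2):
    """Depth-weighted data loss plus a complexity penalty (assumed MDL proxy)."""
    depth_w = softmax(depth_logits)      # soft choice over candidate depths
    h, edges_so_far = x, 0.0
    data_loss, complexity, per_depth = 0.0, 0.0, []
    for (coef, gate_logits), head, w in zip(layers, heads, depth_w):
        h, gates = kan_layer(h, coef, gate_logits)
        h = np.tanh(h)                   # keep magnitudes bounded in this toy
        edges_so_far += gates.sum()      # soft count of edges used up to this depth
        mse = np.mean((h @ head - y) ** 2)  # deep supervision: a loss at every depth
        per_depth.append(mse)
        data_loss += w * mse
        complexity += w * edges_so_far
    return data_loss + lam * complexity, np.array(per_depth), depth_w

# Overprovisioned toy network: three candidate depths, width 4.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(32, 2))
y = (np.sin(x[:, 0]) + x[:, 1] ** 2)[:, None]
dims = [2, 4, 4, 4]
layers = [(0.1 * rng.normal(size=(dims[i], dims[i + 1], 3)),
           rng.normal(size=(dims[i], dims[i + 1]))) for i in range(3)]
heads = [0.1 * rng.normal(size=(4, 1)) for _ in range(3)]
depth_logits = np.zeros(3)

total, per_depth, depth_w = objective(x, y, layers, heads, depth_logits)
```

In a setup like this, every quantity is differentiable in the coefficients, gate logits, and depth logits, so a gradient-based optimizer could trade accuracy against size in a single objective. The sketch only evaluates the loss; training would require an autodiff framework.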
Findings
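Sparsification alone is insufficient; combining it with depth selection achieves competitive or superior accuracy.
The method discovers substantially smaller, more interpretable models.
These results hold across function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks.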
Abstract
Efforts to improve Kolmogorov-Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined with sparsification, deep supervision, and depth selection, to learn compact, interpretable KANs without sacrificing accuracy. Crucially, we focus on differentiable mechanisms under a principled minimum description length objective, jointly optimizing activations, structure, and depth end-to-end. Experiments across function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks demonstrate that sparsification alone is insufficient, but the combination with depth selection achieves competitive or superior accuracy while discovering substantially smaller models. The result is a…
