Optimized Architectures for Kolmogorov-Arnold Networks

James Bagrow; Josh Bongard

arXiv:2512.12448 · cs.LG · April 22, 2026


TL;DR

This paper presents a method for learning compact, interpretable Kolmogorov-Arnold networks (KANs) by combining overprovisioned architectures with sparsification, depth selection, and deep supervision, all optimized end-to-end.

Contribution

The paper introduces a differentiable approach, grounded in a minimum description length objective, that jointly optimizes the structure, activations, and depth of KANs.
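To make the idea of a differentiable, description-length-style objective concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the softmax depth gate, the L1 penalty on edge gates, and the weights `lam` and `mu` are all illustrative assumptions. It only shows how a data-fit term, a structure cost, and a depth cost could be summed into one quantity that gradient-based training might minimize.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def gated_prediction(per_depth_outputs, depth_logits):
    """Mix the readouts taken after each layer (deep supervision)
    using soft depth-selection weights. Hypothetical helper."""
    w = softmax(depth_logits)                          # shape (D,)
    outs = np.asarray(per_depth_outputs, dtype=float)  # shape (D, N)
    return w @ outs, w

def mdl_style_loss(y, per_depth_outputs, depth_logits, edge_gates,
                   lam=0.01, mu=0.01):
    """Two-part cost: data misfit plus a crude model-size proxy.
    ASSUMPTION: an L1 norm on edge gates and the expected depth stand in
    for the model description length; the paper's terms may differ."""
    y_hat, w = gated_prediction(per_depth_outputs, depth_logits)
    data_cost = np.mean((np.asarray(y, dtype=float) - y_hat) ** 2)
    structure_cost = lam * np.sum(np.abs(edge_gates))       # sparsification pressure
    depth_cost = mu * np.sum(w * np.arange(1, len(w) + 1))  # expected depth
    return data_cost + structure_cost + depth_cost
```

In this sketch, pushing `depth_logits` toward a single depth makes the mixture collapse onto that layer's readout, while shrinking entries of `edge_gates` toward zero lowers the structure cost. A real implementation would compute these terms in an autodiff framework so that all parameters can be trained jointly.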

Findings

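01. Depth selection combined with sparsification improves accuracy; sparsification alone is insufficient.

02. The method discovers substantially smaller, more interpretable models.

03. Experiments on function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks show competitive or superior accuracy.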

Abstract

Efforts to improve Kolmogorov-Arnold networks (KANs) with architectural enhancements have been stymied by the complexity those enhancements bring, undermining the interpretability that makes KANs attractive in the first place. Here we study overprovisioned architectures combined with sparsification, deep supervision, and depth selection, to learn compact, interpretable KANs without sacrificing accuracy. Crucially, we focus on differentiable mechanisms under a principled minimum description length objective, jointly optimizing activations, structure, and depth end-to-end. Experiments across function approximation benchmarks, dynamical systems forecasting, and real-world prediction tasks demonstrate that sparsification alone is insufficient, but the combination with depth selection achieves competitive or superior accuracy while discovering substantially smaller models. The result is a…
