Multilevel Training for Kolmogorov Arnold Networks
Ben S. Southworth, Jonas A. Actor, Graham Harper, Eric C. Cyr

TL;DR
This paper introduces a multilevel training method for Kolmogorov-Arnold networks (KANs) that leverages their structure to significantly accelerate training, especially for physics-informed neural networks.
Contribution
It establishes an equivalence between KANs with spline bases and multichannel MLPs, and develops a multilevel training algorithm using hierarchical spline refinement.
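One way to picture hierarchical spline refinement is the simplest case: a piecewise-linear spline whose coefficients are its values at the knots. The sketch below is an illustration under that assumption, not the paper's transfer operator. The function name `refine_linear_spline` and the sample coefficients are hypothetical. It inserts a midpoint in every knot interval and checks that the refined spline represents the same function, which is the property that lets a coarse-level model initialize a finer one.

```python
import numpy as np

def refine_linear_spline(knots, coeffs):
    """Insert a midpoint in every knot interval of a piecewise-linear spline.

    Coefficients are nodal values, so each new coefficient is the average
    of its two coarse neighbours and the represented function is unchanged.
    """
    mid_knots = 0.5 * (knots[:-1] + knots[1:])
    mid_coeffs = 0.5 * (coeffs[:-1] + coeffs[1:])
    fine_knots = np.empty(2 * len(knots) - 1)
    fine_coeffs = np.empty_like(fine_knots)
    # Interleave the coarse entries (even slots) with the midpoints (odd slots).
    fine_knots[0::2], fine_knots[1::2] = knots, mid_knots
    fine_coeffs[0::2], fine_coeffs[1::2] = coeffs, mid_coeffs
    return fine_knots, fine_coeffs

# Hypothetical coarse-level spline on 5 uniform knots.
coarse_knots = np.linspace(0.0, 1.0, 5)
coarse_coeffs = np.sin(2 * np.pi * coarse_knots)

fine_knots, fine_coeffs = refine_linear_spline(coarse_knots, coarse_coeffs)

# Evaluate both splines on a dense grid and compare.
x = np.linspace(0.0, 1.0, 201)
coarse_vals = np.interp(x, coarse_knots, coarse_coeffs)
fine_vals = np.interp(x, fine_knots, fine_coeffs)
refine_err = np.max(np.abs(coarse_vals - fine_vals))
```

Because the refined spline reproduces the coarse one, training can continue on the finer level without losing what was learned on the coarser level. Higher-degree splines would need a knot-insertion rule in place of the simple averaging used here.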
Findings
Achieves orders of magnitude improvement in accuracy over traditional training methods.
Demonstrates effectiveness on physics-informed neural networks.
Provides theoretical insights into the geometry of gradient optimization for KANs.
Abstract
Algorithmic speedup of training common neural architectures is made difficult by the lack of structure guaranteed by the function compositions inherent to such networks. In contrast to multilayer perceptrons (MLPs), Kolmogorov-Arnold networks (KANs) provide more structure by expanding learned activations in a specified basis. This paper exploits this structure to develop practical algorithms and theoretical insights, yielding training speedup via multilevel training for KANs. To do so, we first establish an equivalence between KANs with spline basis functions and multichannel MLPs with power ReLU activations through a linear change of basis. We then analyze how this change of basis affects the geometry of gradient-based optimization with respect to spline knots. The KAN change of basis motivates a multilevel training approach, where we train a sequence of KANs naturally defined through…
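The linear change of basis between splines and power ReLU functions can be checked numerically in a simple setting. The sketch below uses the classical identity that a degree-k B-spline on uniform knots is an alternating binomial combination of truncated powers ReLU(x - t_j)^k. It is an illustration under that uniform-knot assumption, not the paper's construction, and the function names are hypothetical. A Cox-de Boor recursion serves as the reference.

```python
import numpy as np
from math import comb, factorial

def power_relu(x, k):
    """Power ReLU activation: max(x, 0)**k, used here with k >= 1."""
    return np.maximum(x, 0.0) ** k

def bspline_via_power_relu(x, t0, h, k):
    """Degree-k B-spline on uniform knots t0 + j*h, j = 0..k+1,
    written as a linear combination of shifted power ReLUs."""
    total = np.zeros_like(x, dtype=float)
    for j in range(k + 2):
        total += (-1) ** j * comb(k + 1, j) * power_relu(x - (t0 + j * h), k)
    return total / (factorial(k) * h ** k)

def bspline_cox_de_boor(x, knots, i, k):
    """Reference B-spline via the Cox-de Boor recursion (distinct knots)."""
    if k == 0:
        return ((knots[i] <= x) & (x < knots[i + 1])).astype(float)
    left = (x - knots[i]) / (knots[i + k] - knots[i])
    right = (knots[i + k + 1] - x) / (knots[i + k + 1] - knots[i + 1])
    return (left * bspline_cox_de_boor(x, knots, i, k - 1)
            + right * bspline_cox_de_boor(x, knots, i + 1, k - 1))

# Cubic B-spline on knots -1, -0.5, 0, 0.5, 1 (illustrative values).
k, t0, h = 3, -1.0, 0.5
knots = t0 + h * np.arange(k + 2)
x = np.linspace(-2.0, 2.0, 801)

relu_form = bspline_via_power_relu(x, t0, h, k)
reference = bspline_cox_de_boor(x, knots, 0, k)
max_err = np.max(np.abs(relu_form - reference))
```

The two evaluations agree to floating-point accuracy, so in this setting the spline basis and the power ReLU basis span the same functions and differ only by a fixed linear map on the coefficients.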
Taxonomy
Topics: Model Reduction and Neural Networks · Machine Learning in Materials Science · Stochastic Gradient Optimization Techniques
