On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks
Yihang Gao, Vincent Y. F. Tan

TL;DR
This paper provides the first theoretical convergence guarantees for gradient descent and stochastic gradient descent when training Kolmogorov--Arnold Networks, explaining their empirical success in various scientific and machine learning tasks.
Contribution
It establishes global convergence results for GD and SGD on two-layer KANs, including physics-informed variants, using neural tangent kernel analysis.
Findings
GD achieves global linear convergence for large hidden dimensions
SGD converges in expectation for regression tasks
Convergence guarantees extend to physics-informed KANs
Abstract
Kolmogorov--Arnold Networks (KANs), a recently proposed neural network architecture, have gained significant attention in the deep learning community due to their potential as a viable alternative to multi-layer perceptrons (MLPs) and their broad applicability to various scientific tasks. Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) can achieve near-zero training loss in various machine learning tasks (e.g., regression, classification, and time series forecasting) and scientific tasks (e.g., solving partial differential equations). In this paper, we provide a theoretical explanation for this empirical success by conducting a rigorous convergence analysis of gradient descent (GD) and SGD for two-layer KANs in solving both regression and physics-informed tasks. For regression problems, we establish using the neural tangent…
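To make the regression setting concrete, the following is a minimal sketch of full-batch GD on a toy two-layer KAN. It is an illustration only, not the paper's construction. The Gaussian basis functions, the 1/sqrt(m) output scaling, the synthetic target, and all names and hyperparameters (`rbf`, `loss_and_grads`, `train`, `lr`, `m`, `K`) are assumptions chosen for brevity; the paper's basis, initialization, and step-size conditions may differ.

```python
import numpy as np

def rbf(z, centers, width):
    """Gaussian basis values (an assumed basis); output shape is z.shape + (K,)."""
    u = (z[..., None] - centers) / width
    return np.exp(-u ** 2)

def rbf_grad(z, centers, width):
    """Derivative of each Gaussian basis function with respect to z."""
    u = (z[..., None] - centers) / width
    return -2.0 * u / width * np.exp(-u ** 2)

def loss_and_grads(A, C, X, y, centers, width):
    """Half mean-squared error and its gradients for a toy two-layer KAN.

    A: (m, d, K) inner coefficients, C: (m, K) outer coefficients.
    Model: f(x) = m^{-1/2} * sum_q Phi_q(sum_p phi_{q,p}(x_p)).
    """
    n, m = X.shape[0], C.shape[0]
    B_in = rbf(X, centers, width)                  # (n, d, K)
    Z = np.einsum("ndk,mdk->nm", B_in, A)          # (n, m) hidden pre-activations
    B_out = rbf(Z, centers, width)                 # (n, m, K)
    f = np.einsum("nmk,mk->n", B_out, C) / np.sqrt(m)
    r = f - y                                      # residuals, shape (n,)
    loss = 0.5 * np.mean(r ** 2)
    grad_C = np.einsum("n,nmk->mk", r, B_out) / (n * np.sqrt(m))
    dPhi = np.einsum("nmk,mk->nm", rbf_grad(Z, centers, width), C)  # Phi_q'(Z)
    grad_Z = r[:, None] * dPhi / (n * np.sqrt(m))  # (n, m)
    grad_A = np.einsum("nm,ndk->mdk", grad_Z, B_in)
    return loss, grad_A, grad_C

def train(steps=300, lr=0.02, m=64, d=2, K=5, n=20, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    y = np.sin(np.pi * X[:, 0]) * X[:, 1]          # synthetic regression target (assumed)
    centers, width = np.linspace(-1.0, 1.0, K), 0.5
    A = 0.5 * rng.standard_normal((m, d, K))
    C = rng.standard_normal((m, K))
    losses = []
    for _ in range(steps):
        loss, gA, gC = loss_and_grads(A, C, X, y, centers, width)
        losses.append(loss)
        A -= lr * gA                               # full-batch gradient descent step
        C -= lr * gC
    return losses

losses = train()
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Replacing the full batch with a random subset of rows of `X` and `y` at each step would turn this loop into SGD. The hidden width `m` plays the role of the hidden dimension in the convergence statements, so increasing it is the natural knob to experiment with.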
Taxonomy
Topics: Spectral Theory in Mathematical Physics · Graph theory and applications · Topological and Geometric Data Analysis
Methods: Softmax · Attention Is All You Need · Stochastic Gradient Descent
