On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks

Yihang Gao; Vincent Y. F. Tan

arXiv:2410.08041 · cs.LG · October 11, 2024

TL;DR

This paper provides the first theoretical convergence guarantees for gradient descent (GD) and stochastic gradient descent (SGD) when training two-layer Kolmogorov--Arnold Networks (KANs), offering an explanation for the near-zero training loss observed empirically on machine learning and scientific tasks.

Contribution

It establishes global convergence results for GD and SGD on two-layer KANs, including physics-informed variants, using neural tangent kernel analysis.
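To make "global linear convergence" concrete: neural tangent kernel analyses of GD typically end in a bound of the generic shape below. This is an illustrative form only. The symbols \theta_t (parameters after t steps), \eta (step size), and \lambda_0 (a positive lower bound on the smallest eigenvalue of the kernel Gram matrix), along with the factor 1/2, are notational assumptions introduced here; the exact constants and width conditions are those stated in the paper and are not reproduced.

```latex
% Generic form of a linear (geometric) convergence bound for GD.
% \eta, \lambda_0 and the factor 1/2 are illustrative assumptions.
\mathcal{L}(\theta_t) \;\le\; \left(1 - \frac{\eta \lambda_0}{2}\right)^{t} \mathcal{L}(\theta_0),
\qquad t = 0, 1, 2, \dots
```

Read this way, "linear" means the training loss shrinks by at least a fixed factor per iteration, and "global" means the bound drives the loss to zero rather than to a local minimum.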

Findings

01. GD achieves global linear convergence for large hidden dimensions
02. SGD converges in expectation for regression tasks
03. Convergence guarantees extend to physics-informed KANs
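The regression findings above can be explored numerically with a small experiment. The following is a minimal sketch, not the paper's setup: it assumes Gaussian radial basis functions in place of B-splines, a 1/sqrt(m) output scaling, and hypothetical helper names (`rbf`, `rbf_grad`, `forward`, `loss_and_grads`, `train`), with all hyperparameters chosen only for illustration. It trains a two-layer KAN with full-batch GD and with mini-batch SGD and records the full-data training loss.

```python
import numpy as np

def rbf(z, centers, width):
    """Gaussian basis values b_k(z); output shape is z.shape + (K,)."""
    return np.exp(-(((z[..., None] - centers) / width) ** 2))

def rbf_grad(z, centers, width):
    """Derivative of each basis function with respect to z."""
    u = (z[..., None] - centers) / width
    return -2.0 * u / width * np.exp(-(u ** 2))

def forward(X, A, C, centers, width):
    """Two-layer KAN: sums of learnable 1-D functions on every edge."""
    B1 = rbf(X, centers, width)                    # (n, d, K)
    H = np.einsum("ndk,qdk->nq", B1, A)            # hidden units, (n, m)
    B2 = rbf(H, centers, width)                    # (n, m, K)
    # 1/sqrt(m) output scaling is an assumption of this sketch.
    out = np.einsum("nqk,qk->n", B2, C) / np.sqrt(C.shape[0])
    return out, (B1, H, B2)

def loss_and_grads(X, y, A, C, centers, width):
    """Half mean-squared error and its gradients w.r.t. A and C."""
    n, m = X.shape[0], C.shape[0]
    out, (B1, H, B2) = forward(X, A, C, centers, width)
    r = out - y
    loss = 0.5 * np.mean(r ** 2)
    g = r / n                                      # dLoss / d out
    gC = np.einsum("n,nqk->qk", g, B2) / np.sqrt(m)
    dpsi = np.einsum("nqk,qk->nq", rbf_grad(H, centers, width), C)
    dH = g[:, None] * dpsi / np.sqrt(m)            # dLoss / d H
    gA = np.einsum("nq,ndk->qdk", dH, B1)
    return loss, gA, gC

def train(X, y, m=64, K=8, lr=0.01, steps=400, batch_size=None, seed=0):
    """Full-batch GD if batch_size is None, otherwise mini-batch SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = np.linspace(-2.0, 2.0, K)
    width = centers[1] - centers[0]
    A = 0.5 * rng.standard_normal((m, d, K))       # first-layer coefficients
    C = rng.standard_normal((m, K))                # second-layer coefficients
    history = []
    for _ in range(steps):
        full_loss, gA, gC = loss_and_grads(X, y, A, C, centers, width)
        history.append(full_loss)                  # always log full-data loss
        if batch_size is not None:
            idx = rng.choice(n, size=batch_size, replace=False)
            _, gA, gC = loss_and_grads(X[idx], y[idx], A, C, centers, width)
        A -= lr * gA
        C -= lr * gC
    history.append(loss_and_grads(X, y, A, C, centers, width)[0])
    return history

# Toy regression target, chosen arbitrarily for the demonstration.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1.0, 1.0, size=(64, 2))
y = np.sin(np.pi * X[:, 0]) * X[:, 1]

gd_losses = train(X, y)
sgd_losses = train(X, y, batch_size=16)
print(f"GD:  {gd_losses[0]:.4f} -> {gd_losses[-1]:.4f}")
print(f"SGD: {sgd_losses[0]:.4f} -> {sgd_losses[-1]:.4f}")
```

Plotting `gd_losses` on a log scale is a quick way to check whether the decay looks geometric for a given hidden width `m`; the SGD curve is noisier, which is consistent with a guarantee that holds in expectation rather than per run.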

Abstract

Kolmogorov--Arnold Networks (KANs), a recently proposed neural network architecture, have gained significant attention in the deep learning community due to their potential as a viable alternative to multi-layer perceptrons (MLPs) and their broad applicability to various scientific tasks. Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss in various machine learning tasks (e.g., regression, classification, and time series forecasting) and scientific tasks (e.g., solving partial differential equations). In this paper, we provide a theoretical explanation for this empirical success by conducting a rigorous convergence analysis of gradient descent (GD) and SGD for two-layer KANs in solving both regression and physics-informed tasks. For regression problems, we establish using the neural tangent…

Taxonomy

Topics: Spectral Theory in Mathematical Physics · Graph theory and applications · Topological and Geometric Data Analysis

Methods: Softmax · Attention Is All You Need · Stochastic Gradient Descent