Kolmogorov-Arnold Fourier Networks
Jusheng Zhang, Yijia Fan, Kaitong Cai, Keze Wang

TL;DR
The paper introduces the Kolmogorov-Arnold-Fourier Network (KAF), a novel model that combines Fourier features and a hybrid activation to improve high-dimensional spectral approximation with efficiency and interpretability.
Contribution
It proposes a new KAF model that reduces parameters, enhances spectral representation, and maintains interpretability through innovative integration of Fourier features and adaptive activation functions.
Findings
KAF outperforms existing models in vision, NLP, and audio tasks.
It achieves better spectral approximation with fewer parameters.
KAF demonstrates strong theoretical and practical advantages.
Abstract
Although Kolmogorov-Arnold based interpretable networks (KAN) have strong theoretical expressiveness, they face significant parameter explosion and high-frequency feature capture challenges in high-dimensional tasks. To address this issue, we propose the Kolmogorov-Arnold-Fourier Network (KAF), which effectively integrates trainable Random Fourier Features (RFF) and a novel hybrid GELU-Fourier activation mechanism to balance parameter efficiency and spectral representation capabilities. Our key technical contributions include: (1) merging KAN's dual-matrix structure through matrix association properties to substantially reduce parameters; (2) introducing learnable RFF initialization strategies to eliminate spectral distortion in high-dimensional approximation tasks; (3) implementing an adaptive hybrid activation function that progressively enhances frequency representation during the…
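The hybrid GELU-Fourier activation named in the abstract can be pictured with a small NumPy sketch. The abstract does not give the exact formula, so the form used here (a scaled GELU branch plus a scaled Fourier-feature branch) is an assumption. The names `hybrid_activation`, `fourier_features`, `freqs`, `mix`, `a`, and `b`, and the initial values chosen for them, are hypothetical. It illustrates the idea and is not the authors' implementation.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def fourier_features(x, freqs):
    """Map x of shape (batch, d) to (batch, 2m) features [cos(xW), sin(xW)] / sqrt(m)."""
    proj = x @ freqs                       # (batch, m)
    m = freqs.shape[1]
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(m)

def hybrid_activation(x, a, b, freqs, mix):
    """Assumed form: a * GELU(x) + b * (Fourier features projected back to d dims)."""
    return a * gelu(x) + b * (fourier_features(x, freqs) @ mix)

rng = np.random.default_rng(0)
d, m = 4, 8                                # input width and number of frequencies
x = rng.normal(size=(3, d))
freqs = rng.normal(size=(d, m))            # trainable in a real model; Gaussian init is an assumption
mix = rng.normal(size=(2 * m, d)) / np.sqrt(2 * m)
a = np.ones(d)                             # GELU branch starts fully on (assumed)
b = np.full(d, 0.01)                       # Fourier branch starts small (assumed)

y = hybrid_activation(x, a, b, freqs, mix)
print(y.shape)
```

With `b` set to zero the function reduces to plain GELU. Under this assumed form, that is one way to read the "progressive" behaviour the abstract describes: training can raise `b` to bring in the frequency terms gradually.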
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
KANs typically grow in parameter count as $O(d_{in} d_{out} (G + K + 3))$, which scales poorly. The proposed hybrid GELU–Fourier activation (with learnable coefficients) adds a smooth transition mechanism from standard activation-based networks to Fourier-driven symbolic representations. This activation concept is novel and could be applied beyond KANs.
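To make the scaling concern concrete, the sketch below evaluates the leading term of that bound for one layer and compares it with a dense layer of the same width. The review does not define $G$ and $K$. Reading them as grid size and spline order, as the KAN literature usually does, is an assumption, as is treating the big-O expression as an exact count with constant 1. The function names and the layer sizes are hypothetical.

```python
def kan_layer_params(d_in, d_out, grid_size, spline_order):
    """Leading term of the quoted bound: d_in * d_out * (G + K + 3)."""
    return d_in * d_out * (grid_size + spline_order + 3)

def dense_layer_params(d_in, d_out):
    """Weights plus biases of an ordinary fully connected layer."""
    return d_in * d_out + d_out

d_in = d_out = 256      # hypothetical layer width
G, K = 5, 3             # hypothetical grid size and spline order

kan = kan_layer_params(d_in, d_out, G, K)
mlp = dense_layer_params(d_in, d_out)
print(kan, mlp, round(kan / mlp, 1))   # prints: 720896 65792 11.0
```

Under these assumptions the KAN layer carries roughly eleven times as many parameters as the dense layer, and the ratio grows with $G + K$.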
The paper argues that Random Fourier Features mitigate KAN’s limited ability to represent high-frequency components. However, RFFs derived from the Gaussian kernel correspond to elements of a very smooth RKHS, whose spectral density decays rapidly. It remains unclear how this construction enhances high-frequency expressiveness. The authors should clarify whether the frequency distribution in their trainable RFFs deviates from the Gaussian kernel case, and if so, how this affects the underlying f
1. By integrating Fourier transforms and trainable RFF, KAF significantly reduces the number of parameters while preserving expressiveness, as proven in Appendix B.1.
2. The experiments span diverse domains: vision accuracy above 91% on CIFAR-10 (Table 1); lower perplexity on NLP tasks (Table 2); superior RMSE on function approximation (Fig. 5) and on PDEs such as Poisson/Heat (Fig. 6, Table 5 for noise robustness). These experiments show a significant improvement over KAN.
3. The mathematical expre
1. The experimental results on CIFAR-10 and ImageNet-1K should be improved. According to the referenced papers, ResNet-18 reaches 93.02% accuracy on CIFAR-10 [1], ViT-Tiny reaches 79.1% top-1 accuracy on ImageNet-1K [2], and GPT-2 reaches a perplexity of 19.89 on WikiText-103 [3]. However, in Table 1 and Table 2, KAF shows better accuracy than MLP when performing as a feature mixer. I guess such a conflict is caused by insufficient hyperparameter tuning (e.g., learning rate schedule, batch size, etc
1. Clear Motivation and Novel Angle. The paper correctly identifies KAN’s bottlenecks (parameter inefficiency and poor high-frequency capture) and links them to B-spline smoothness and dense parameterization. The proposed RFF-based architecture is a natural and elegant evolution: Fourier expansions can model oscillatory components efficiently while maintaining GPU-friendly structure.
2. Hybrid GELU–Fourier Design
3. Solid Experimental Breadth. The experimental section is comprehensive: Image
1. The innovation is insufficient; the contributions read more like engineering tricks.
2. The literature review on KANs is lacking; for example, it omits the KAN 2.0 paper ("KAN 2.0: Kolmogorov-Arnold Networks Meet Science") and the paper on KANs' spectral bias ("On the Expressiveness and Spectral Bias of KANs"), among other advances on KANs.
Taxonomy
Topics: Quantum Mechanics and Applications · Advanced Thermodynamics and Statistical Mechanics · Statistical Mechanics and Entropy
Methods: Kernel Activation Function
