KAE: Kolmogorov-Arnold Auto-Encoder for Representation Learning
Fangchen Yu, Ruilizhen Hu, Yidong Lin, Yuqi Ma, Zhenghao Huang, Wenye Li

TL;DR
KAE introduces a novel auto-encoder architecture using Kolmogorov-Arnold Networks to improve data representation, reconstruction, and task performance by leveraging learnable polynomial activation functions.
Contribution
This paper presents the first integration of KAN with autoencoders, enhancing representation learning with flexible polynomial activations for retrieval, classification, and denoising tasks.
Findings
Improves latent representation quality
Reduces reconstruction errors
Achieves superior task performance
Abstract
The Kolmogorov-Arnold Network (KAN) has recently gained attention as an alternative to traditional multi-layer perceptrons (MLPs), offering improved accuracy and interpretability by employing learnable activation functions on edges. In this paper, we introduce the Kolmogorov-Arnold Auto-Encoder (KAE), which integrates KAN with autoencoders (AEs) to enhance representation learning for retrieval, classification, and denoising tasks. Leveraging the flexible polynomial functions in KAN layers, KAE captures complex data patterns and non-linear relationships. Experiments on benchmark datasets demonstrate that KAE improves latent representation quality, reduces reconstruction errors, and achieves superior performance in downstream tasks such as retrieval, classification, and denoising, compared to standard autoencoders and other KAN variants. These results suggest KAE's potential as a useful…
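To make the idea of learnable polynomial activations on edges concrete, here is a minimal NumPy sketch of a forward pass through a one-layer encoder and one-layer decoder. It assumes each edge from input i to output j applies a polynomial of fixed order with its own learnable coefficients, and that each output node sums its incoming edges. The class name `PolyKANLayer`, the coefficient layout, the initialisation, and the absence of any extra squashing non-linearity are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class PolyKANLayer:
    """Hypothetical KAN-style layer with polynomial edge activations.

    Edge (i -> j) computes phi_ij(x_i) = sum_k coeffs[k, i, j] * x_i**k,
    and output node j sums phi_ij(x_i) over all inputs i.
    """

    def __init__(self, in_dim, out_dim, order=2, seed=0):
        rng = np.random.default_rng(seed)
        self.order = order
        # coeffs[k, i, j]: coefficient of x_i**k on the edge i -> j (assumed layout).
        self.coeffs = rng.normal(0.0, 0.1, size=(order + 1, in_dim, out_dim))

    def __call__(self, x):
        # x has shape (batch, in_dim); powers has shape (order + 1, batch, in_dim).
        powers = np.stack([x ** k for k in range(self.order + 1)])
        # Sum over the polynomial order k and the input index i.
        return np.einsum("kbi,kio->bo", powers, self.coeffs)


def autoencode(x, encoder, decoder):
    """Encode x to a latent code z, then decode z back to input space."""
    z = encoder(x)
    x_hat = decoder(z)
    return z, x_hat


# Toy usage: 8-dimensional inputs compressed to a 3-dimensional latent code.
x = np.random.default_rng(1).uniform(size=(4, 8))
encoder = PolyKANLayer(8, 3, order=2, seed=0)
decoder = PolyKANLayer(3, 8, order=2, seed=1)
z, x_hat = autoencode(x, encoder, decoder)
reconstruction_mse = float(np.mean((x - x_hat) ** 2))
```

With order 1 this layer reduces to an ordinary affine map, so the higher-order terms are what distinguish it from a linear autoencoder layer. The weights here are untrained, so `reconstruction_mse` only illustrates the quantity a training loop would minimise.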
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- Well-written and clearly communicated
- Potentially interesting alternative to standard autoencoders
- Thorough ablation on the activation functions
Overall a good paper and interesting results and probably good development but the experiments really lack the evidence for the bold claims made in the abstract and introduction. The paper should be revised on that side and at the same time put evidence forward for made claims, for example, repeatedly stating the superior learning of general func
- Claims like demonstrating "superiority of the Kolmogorov-Arnold Auto-Encoder (KAE) through extensive experimental validation" are not supported by the empirical evidence because the experiments are conducted exclusively with shallow auto-encoder baselines that are quite unrealistic and not used in practice.
- Too strong claims on the power of KANs themselves like stating that MLPs "still rely on fixed activation functions, which limits their flexibility in representing complex functions". This
- Clarity: This paper is clearly written with a well-organized structure. The presentation is concise. The theorem in the paper demonstrates that using KAN can achieve universal function approximation. The structure of the experiment section is also good, with KAE showing good performance in similarity search, image classification, and image denoising.
- Significance: KAN as a replacement for MLP has attracted attention this year. It is useful for other researchers to understand this new technique more.
- Originality: One of the major issues of this paper is the lack of originality. The KAE model simply replaces MLP with KAN in an auto-encoder, with a simple change of kernel from B-spline to polynomial. The auto-encoder network is widely used and simple to implement, and while KAN is novel, it is not the contribution of this paper. It is expected that to publish in ICLR we need to either propose a novel idea that would challenge the existing belief or we should present the new system that beat the
1. The proposed method outperforms baseline models with minimal architectures on a selection of simple datasets.
2. The paper is overall easy to read and understand.
1. The experiments in the paper are insufficient to substantiate its claims (e.g., "These findings position KAE as a ‘practical’ tool for high-dimensional data analysis" and "KAE effectively captures complex data patterns"). The architecture used is too minimal to be considered empirically significant, especially given the simplicity of the datasets used. Although the authors provide experiments on four datasets, quality is more important than quantity.
2. Using learned activation functions in
**Scientific quality:** The experiments are simple and extensive. I like how, in the similarity experiment, the representations are compared directly via nearest neighbours instead of through fine-tuning, so their direct relevance is measured. It would be interesting to see whether the gap holds under fine-tuning, but this is a very minor point. Moreover, a decent comparison suite across KAN variants is included. Although this isn’t mentioned in the paper, it looks like the majority of them outperform the a
**Scientific quality:** It’s unclear to what extent the improvements are due to model selection, especially as the kind of activation function used was specifically selected for these tasks. Moreover, how this selection was done isn’t clear: was it based on the validation accuracy? The paper doesn’t elaborate and, as it currently stands, seems to indicate that it was based on the test accuracy, as that’s the only metric given. It’s unclear if the MLP and KAE have identical computational resou
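The nearest-neighbour comparison that the reviews mention can be illustrated with a small sketch. It assumes a label-free neighbourhood-overlap score: for each sample, the fraction of its k nearest neighbours in the original space that are also among its k nearest neighbours in the latent space. The function names and this exact metric definition are assumptions for illustration and may differ from the paper's retrieval protocol.

```python
import numpy as np


def knn_indices(x, k):
    """Indices of the k nearest neighbours (Euclidean) of every row, excluding itself."""
    sq_norms = np.sum(x ** 2, axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbour
    return np.argsort(dists, axis=1)[:, :k]


def neighbour_overlap(original, latent, k=5):
    """Mean fraction of original-space neighbours that are preserved in latent space."""
    nn_original = knn_indices(original, k)
    nn_latent = knn_indices(latent, k)
    hits = [len(set(a) & set(b)) for a, b in zip(nn_original, nn_latent)]
    return float(np.mean(hits)) / k


# Toy usage: a random 2-D projection stands in for an encoder's latent codes.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 16))
codes = data @ rng.normal(size=(16, 2))
score = neighbour_overlap(data, codes, k=5)
```

A score of 1.0 means the latent space preserves every local neighbourhood, and scores near k / (n - 1) are what unrelated codes would give. Because no classifier is trained on top of the codes, the score reflects the representation itself, which is the property the reviewer highlights.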
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need
