Fully Kolmogorov-Arnold Deep Model in Medical Image Segmentation
Xingyu Qiu, Xinghua Ma, Dong Liang, Gongning Luo, Wei Wang, Kuanquan Wang, Shuo Li

TL;DR
This paper introduces the first fully Kolmogorov-Arnold based deep model for medical image segmentation, overcoming training and memory challenges of deep KANs, and demonstrating superior accuracy and efficiency.
Contribution
It presents a novel fully KA-based deep model, including new layers and techniques that replace traditional architectures, enabling deeper KAN exploration with reduced memory and improved performance.
Findings
Achieves higher segmentation accuracy than traditional models.
Reduces parameter count by 10 times compared to deep KANs.
Cuts memory usage by over 20 times, enabling deeper architectures.
Abstract
Deeply stacked KANs are practically impossible due to high training difficulties and substantial memory requirements. Consequently, existing studies can only incorporate few KAN layers, hindering the comprehensive exploration of KANs. This study overcomes these limitations and introduces the first fully KA-based deep model, demonstrating that KA-based layers can entirely replace traditional architectures in deep learning and achieve superior learning capacity. Specifically, (1) the proposed Share-activation KAN (SaKAN) reformulates Sprecher's variant of Kolmogorov-Arnold representation theorem, which achieves better optimization due to its simplified parameterization and denser training samples, to ease training difficulty, (2) this paper indicates that spline gradients contribute negligibly to training while consuming huge GPU memory, thus proposes the Grad-Free Spline to significantly…
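For orientation, the classical Kolmogorov-Arnold representation and Sprecher's variant are commonly stated as below. This is the textbook form for a continuous function f on [0,1]^n, not the paper's SaKAN reformulation, which this summary does not spell out. The symbols (Φ, φ, λ_p, η) follow the usual convention and are assumptions here, not the paper's notation.

```latex
% Kolmogorov-Arnold representation: (2n+1) outer functions \Phi_q
% and n(2n+1) inner functions \phi_{q,p}
f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% Sprecher's variant: one outer function \Phi, one inner function \phi,
% scalar weights \lambda_p, and a shift \eta
f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi\!\left( \sum_{p=1}^{n} \lambda_p \, \phi(x_p + \eta q) + q \right)
```

In the second form the inner functions differ only by scalars and shifts, which is consistent with the abstract's mention of "simplified parameterization".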
Peer Reviews
Decision · Submitted to ICLR 2026
This paper addresses several issues that KANs face. Two approaches are proposed: [1] SaKAN reformulates Sprecher’s variant of the KA theorem into a computationally efficient deep-learning form; [2] Grad-Free Spline offers a memory-efficient strategy supported by theoretical analysis. The paper reports a 10x reduction in parameter count and a more than 20x reduction in memory consumption.
Limited Novelty: The overall architectural design appears similar to U-KAN, with the primary difference being the introduction of the proposed SaKAN module. While SaKAN and the Grad-Free Spline contribute to training stability and memory efficiency, the paper would benefit from a clearer discussion of how these innovations fundamentally advance beyond prior U-KAN architectures. Need for 3D Experiments: Since one of the key claims is improved parameter efficiency and training stability, it would
1. Clear and well-motivated problem statement - The paper correctly identifies the key bottlenecks preventing deep stacking of KANs (training instability and GPU memory explosion) and tackles them directly. 2. Practical engineering contributions - Introduces two concrete, implementable techniques, Shared-activation KAN (SaKAN) and Grad-Free Spline, that make deep KAN architectures trainable on commodity GPUs without excessive resource cost. The proposed methods are simple to adopt and can be gene
1. Novelty and Attribution - SaKAN’s shared-activation design is a straightforward adaptation of Sprecher’s 1965 theorem; the paper does not sufficiently contrast it with KAN 2.0, LoKi, or GKAN, which already explore shared or parameter-efficient KA forms. Also, Grad-Free Spline is effectively “stop-gradient on spline basis”, which many practitioners might already do as a memory optimization. The “Theorem 1” proof is heuristic. 2. Empirical scope - Evaluations are limited to 2D biomedical datase
• Novel fully KA-based deep architecture that replaces all FC and convolutional layers with KA-based layers, surpassing prior partial or shallow KAN implementations. • Significant parameter reduction: SaKAN maintains performance with approximately 57% fewer parameters than vanilla KAN. • Substantial memory efficiency achieved through Grad-Free Spline.
• Scope-claim mismatch: The paper frames contributions as universally applicable to deep learning, yet experiments are limited exclusively to medical image segmentation, leading to overgeneralization in conclusions. • FLOPs reporting inconsistency: KAonv configurations show an approximately 70× discrepancy (approximately 25M versus 1752M) without methodological explanation or derivation. • Limited baseline coverage and tuning fairness: No comparisons with recent KAN variants such as FastKAN a
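One review above characterizes Grad-Free Spline as effectively a "stop-gradient on spline basis". The sketch below illustrates that reading only, on a 1-D piecewise-linear spline with hand-written gradients. It is not the paper's implementation: the hat basis, the function names, and the `grad_free` flag are all hypothetical, chosen to keep the example dependency-free.

```python
# Illustrative sketch only. All names are hypothetical, not from the paper.

def hat_basis(x, knots):
    """Degree-1 B-spline (hat) basis values at x, assuming uniform knots."""
    h = knots[1] - knots[0]
    return [max(0.0, 1.0 - abs(x - k) / h) for k in knots]


def hat_basis_dx(x, knots):
    """Derivative of each hat function w.r.t. x (0 outside its support)."""
    h = knots[1] - knots[0]
    out = []
    for k in knots:
        d = x - k
        if -h < d < 0:
            out.append(1.0 / h)    # rising side of the hat
        elif 0 < d < h:
            out.append(-1.0 / h)   # falling side of the hat
        else:
            out.append(0.0)
    return out


def spline_forward(x, coeffs, knots):
    """y = sum_k coeffs[k] * B_k(x)."""
    return sum(c * b for c, b in zip(coeffs, hat_basis(x, knots)))


def spline_backward(x, coeffs, knots, grad_out, grad_free):
    """Return (grad w.r.t. coeffs, grad w.r.t. x) for upstream grad_out."""
    basis = hat_basis(x, knots)
    # Coefficients always receive a gradient: dy/dc_k = B_k(x).
    grad_coeffs = [grad_out * b for b in basis]
    if grad_free:
        # Assumed "stop-gradient" reading: the basis is treated as a
        # constant w.r.t. x, so its derivative is never computed.
        grad_x = 0.0
    else:
        dbasis = hat_basis_dx(x, knots)
        grad_x = grad_out * sum(c * d for c, d in zip(coeffs, dbasis))
    return grad_coeffs, grad_x


knots = [0.0, 1.0, 2.0]
coeffs = [1.0, 3.0, 2.0]
full = spline_backward(0.5, coeffs, knots, 1.0, grad_free=False)
free = spline_backward(0.5, coeffs, knots, 1.0, grad_free=True)
print(full, free)
```

In both modes the coefficient gradients are identical; only the input gradient through the basis is dropped. In an autodiff framework, skipping that path would mean the intermediates needed for the basis derivative need not be kept, which is one plausible source of the memory savings the reviews mention. How the paper routes gradients to earlier layers is not stated in this summary.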
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Big Data and Digital Economy
