Exploring Kolmogorov-Arnold Network Expansions in Vision Transformers for Mitigating Catastrophic Forgetting in Continual Learning
Zahid Ullah, Jihie Kim

TL;DR
This paper introduces Kolmogorov-Arnold Networks into Vision Transformers to reduce catastrophic forgetting in continual learning, showing improved knowledge retention and adaptability on benchmark datasets.
Contribution
It presents a novel integration of KANs into ViTs, leveraging spline-based activations for local plasticity to mitigate forgetting in continual learning scenarios.
Findings
KAN-based ViTs outperform traditional MLP-based ViTs in retaining knowledge
Significant reduction in catastrophic forgetting observed on MNIST and CIFAR100
Enhanced ability to learn new tasks without losing previous knowledge
Abstract
Continual learning (CL), the ability of a model to learn new tasks without forgetting previously acquired knowledge, remains a critical challenge in artificial intelligence, particularly for vision transformers (ViTs) utilizing Multilayer Perceptrons (MLPs) for global representation learning. Catastrophic forgetting, where new information overwrites prior knowledge, is especially problematic in these models. This research proposes replacing MLPs in ViTs with Kolmogorov-Arnold Networks (KANs) to address this issue. KANs leverage local plasticity through spline-based activations, ensuring that only a subset of parameters is updated per sample, thereby preserving previously learned knowledge. The study investigates the efficacy of KAN-based ViTs in CL scenarios across benchmark datasets (MNIST, CIFAR100), focusing on their ability to retain accuracy on earlier tasks while adapting to new…
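The local-plasticity claim in the abstract can be illustrated with a toy one-dimensional learnable activation. The sketch below is not the authors' implementation: it assumes a single spline activation built from degree-1 B-splines (hat functions) on a uniform grid, whereas KAN layers typically use higher-degree splines and many such activations. The names `hat_basis`, `phi`, and `sgd_step`, the grid, and the learning rate are all hypothetical choices made for illustration. The point it shows is that a gradient step on one sample only changes the few coefficients whose basis functions overlap that sample.

```python
import numpy as np

# Assumption: a uniform grid of 9 knots on [-1, 1] (spacing 0.25).
knots = np.linspace(-1.0, 1.0, 9)


def hat_basis(x, knots):
    """Degree-1 B-spline (hat) basis values of scalar x at each knot."""
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x - knots) / h, 0.0, None)


def phi(x, coeffs):
    """Learnable activation: weighted sum of hat functions."""
    return float(hat_basis(x, knots) @ coeffs)


def sgd_step(x, target, coeffs, lr=0.5):
    """One gradient step on the squared error 0.5 * (phi(x) - target)**2."""
    basis = hat_basis(x, knots)
    error = basis @ coeffs - target
    grad = error * basis  # zero wherever the basis function is zero at x
    return coeffs - lr * grad, grad


coeffs = np.zeros_like(knots)

# "Task A": a sample near the left end of the input range.
coeffs, grad_a = sgd_step(-0.8, 1.0, coeffs)
before = phi(-0.8, coeffs)

# "Task B": a sample in a different region of the input range.
coeffs, grad_b = sgd_step(0.6, -1.0, coeffs)
after = phi(-0.8, coeffs)

touched_a = np.flatnonzero(grad_a)  # coefficient indices updated by task A
touched_b = np.flatnonzero(grad_b)  # coefficient indices updated by task B

print("task A touched coefficients:", touched_a.tolist())
print("task B touched coefficients:", touched_b.tolist())
print("output at task A sample unchanged by task B:", before == after)
```

In this toy setting each sample touches only two of the nine coefficients, and the two samples touch disjoint sets, so the step for the second sample leaves the response at the first sample intact. A dense MLP layer offers no such guarantee, since every weight generally receives a nonzero gradient from every sample. Whether this locality carries over to full KAN-based ViT blocks is what the paper evaluates empirically.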
