Exploring Kolmogorov-Arnold Network Expansions in Vision Transformers for Mitigating Catastrophic Forgetting in Continual Learning

Zahid Ullah; Jihie Kim

arXiv:2507.04020 · cs.CV · July 8, 2025

TL;DR

This paper introduces Kolmogorov-Arnold Networks into Vision Transformers to reduce catastrophic forgetting in continual learning, showing improved knowledge retention and adaptability on benchmark datasets.

Contribution

It presents a novel integration of KANs into ViTs, leveraging spline-based activations for local plasticity to mitigate forgetting in continual learning scenarios.

Findings

1. KAN-based ViTs outperform traditional MLP-based ViTs in retaining knowledge.
2. Significant reduction in catastrophic forgetting observed on MNIST and CIFAR100.
3. Enhanced ability to learn new tasks without losing previous knowledge.

Abstract

Continual learning (CL), the ability of a model to learn new tasks without forgetting previously acquired knowledge, remains a critical challenge in artificial intelligence, particularly for vision transformers (ViTs) utilizing Multilayer Perceptrons (MLPs) for global representation learning. Catastrophic forgetting, where new information overwrites prior knowledge, is especially problematic in these models. This research proposes replacing MLPs in ViTs with Kolmogorov-Arnold Networks (KANs) to address this issue. KANs leverage local plasticity through spline-based activations, ensuring that only a subset of parameters is updated per sample, thereby preserving previously learned knowledge. The study investigates the efficacy of KAN-based ViTs in CL scenarios across benchmark datasets (MNIST, CIFAR100), focusing on their ability to retain accuracy on earlier tasks while adapting to new…
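The local-plasticity claim in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single learnable 1-D activation built from degree-1 B-spline ("hat") basis functions on a uniform grid, and the names `GRID`, `hat_basis`, `phi`, `sgd_step`, and `train`, as well as the toy "task A" and "task B" input ranges, are hypothetical. Under these assumptions, each sample produces a nonzero gradient for at most two coefficients, so training on inputs from one region leaves the coefficients fitted on a distant region untouched.

```python
import numpy as np

# Hypothetical setup: one learnable 1-D activation on a uniform grid of knots.
GRID = np.linspace(-1.0, 1.0, 9)  # knots at -1.0, -0.75, ..., 1.0

def hat_basis(x, grid=GRID):
    """Degree-1 B-spline ("hat") basis values at a scalar input x."""
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x - grid) / h, 0.0, None)

def phi(x, coef):
    """Spline activation: weighted sum of the basis functions at x."""
    return hat_basis(x) @ coef

def sgd_step(x, target, coef, lr=0.5):
    """One squared-error SGD step on a single sample; returns (new_coef, grad)."""
    basis = hat_basis(x)
    err = basis @ coef - target
    grad = err * basis  # nonzero only where the basis is nonzero
    return coef - lr * grad, grad

def train(xs, target_fn, coef):
    """Run one pass of per-sample SGD; the input array is never mutated."""
    for x in xs:
        coef, _ = sgd_step(x, target_fn(x), coef)
    return coef

rng = np.random.default_rng(0)
xs_a = rng.uniform(-1.0, -0.5, size=200)  # toy "task A" inputs
xs_b = rng.uniform(0.5, 1.0, size=200)    # toy "task B" inputs

coef_a = train(xs_a, np.sin, np.zeros_like(GRID))  # learn task A first
coef_ab = train(xs_b, np.cos, coef_a)              # then learn task B

# A single sample touches at most two coefficients.
_, grad = sgd_step(0.6, 1.0, coef_a)
print("coefficients touched by one sample:", np.count_nonzero(grad))

# Training on task B leaves the coefficients used by task A unchanged.
print("task A coefficients unchanged:", np.array_equal(coef_a[:6], coef_ab[:6]))
```

A dense MLP layer has no such guarantee, since every weight generally receives a gradient from every sample. The sketch covers only one scalar activation; in a full KAN layer inside a ViT block, inputs from different tasks may overlap in activation space, so the isolation shown here is an idealized case.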
