CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds

Vaishnavi Nagabhushana; Kartikay Agrawal; Ayon Borthakur

arXiv:2603.15184 · cs.LG · March 17, 2026

Open Access · 4 Reviews

TL;DR

CATFormer introduces a novel spiking neural network framework with dynamic thresholds and gating mechanisms, effectively mitigating catastrophic forgetting in continual learning scenarios across diverse datasets.

Contribution

It proposes the DTLIF neuron model with context-adaptive thresholds and a G-DHS inference mechanism, advancing continual learning in SNNs without rehearsal methods.
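
To make the idea of a context-adaptive threshold concrete, here is a minimal sketch of a leaky integrate-and-fire neuron whose firing threshold is looked up per task. It is an illustration under stated assumptions, not the paper's DTLIF definition: the class name `TaskThresholdLIF`, the leak factor, the hard reset, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaskThresholdLIF:
    """Discrete-time LIF neuron with one firing threshold per task.

    Hypothetical illustration: the leak factor and the hard-reset rule
    are assumptions, not taken from the paper.
    """
    thresholds: dict          # task id -> firing threshold (assumed learnable)
    leak: float = 0.5         # fraction of membrane potential kept each step
    v: float = field(default=0.0, init=False)

    def reset(self):
        self.v = 0.0

    def step(self, current, task_id):
        # Leaky integration of the input current.
        self.v = self.leak * self.v + current
        # Fire when the potential reaches the active task's threshold.
        if self.v >= self.thresholds[task_id]:
            self.v = 0.0      # hard reset after a spike
            return 1
        return 0

    def run(self, currents, task_id):
        # Start from rest, then emit one spike decision per time step.
        self.reset()
        return [self.step(c, task_id) for c in currents]


neuron = TaskThresholdLIF(thresholds={0: 1.0, 1: 2.0})
inputs = [0.75] * 6
spikes_task0 = neuron.run(inputs, task_id=0)
spikes_task1 = neuron.run(inputs, task_id=1)
print(spikes_task0)  # [0, 1, 0, 1, 0, 1]
print(spikes_task1)  # [0, 0, 0, 0, 0, 0]
```

With identical inputs and identical (frozen) weights, changing only the threshold changes the spike train, which is the property a per-task threshold relies on.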

Findings

01. Outperforms existing rehearsal-free CIL algorithms on multiple datasets.

02. Effective on both static and neuromorphic datasets.

03. Enhances energy efficiency and true class-incremental learning.

Abstract

Although deep neural networks perform extremely well in controlled environments, they fail in real-world scenarios where data isn't available all at once and the model must adapt to a new data distribution that may or may not follow the initial distribution. Previously acquired knowledge is lost during subsequent updates based on new data, a phenomenon commonly known as catastrophic forgetting. In contrast, the brain can learn without such catastrophic forgetting, irrespective of the number of tasks it encounters. Existing spiking neural networks (SNNs) for class-incremental learning (CIL) suffer a sharp performance drop as tasks accumulate. Here we introduce CATFormer (Context Adaptive Threshold Transformer), a scalable framework that overcomes this limitation. We observe that the key to preventing forgetting in SNNs lies not only in synaptic plasticity but also in modulating neuronal…

Peer Reviews

Decision · ICLR 2026 Conference Desk Rejected Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The proposed task-specific dynamic threshold method for continual learning is an interesting and novel contribution as far as I'm aware. The authors support this with strong class-incremental learning performance. No data replay is required. Task labels are also not required at inference, as a task-prediction step takes care of it. The knowledge transfer ability or 'reverse forgetting' is impressive to see.

Weaknesses

All tasks are split-class/CIL style tasks. I would be interested to see some alternatives, such as a task with a new input distribution each time but the same class labels (permuted MNIST). There are also some presentation issues; I think axis labels should be added.
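
The permuted-input protocol the reviewer mentions can be sketched as follows: each task applies its own fixed pixel permutation to every image while the class labels stay the same. This is a generic illustration of that benchmark style, not an experiment from the paper; the function names, the seed, and the toy 8-pixel "image" are assumptions.

```python
import random


def make_permutations(num_tasks, num_pixels, seed=0):
    """Return one fixed pixel permutation per task; task 0 keeps the original order."""
    rng = random.Random(seed)          # seeded so the task sequence is reproducible
    perms = [list(range(num_pixels))]  # identity permutation for the first task
    for _ in range(num_tasks - 1):
        perm = list(range(num_pixels))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def apply_permutation(image, perm):
    """Reorder a flattened image; the label attached to it is left unchanged."""
    return [image[i] for i in perm]


perms = make_permutations(num_tasks=3, num_pixels=8)
image = [10, 20, 30, 40, 50, 60, 70, 80]   # toy flattened image
views = [apply_permutation(image, p) for p in perms]
```

Every view contains the same pixel values in a different order, so the input distribution shifts from task to task while the label space does not grow.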

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. The topic—combining continual learning with spiking transformers—is interesting and timely. 2. The experimental scope covers both conventional and neuromorphic datasets, with relatively clear implementation details.

Weaknesses

1. The core contribution, dynamic thresholds, has been extensively explored in prior work. The paper does not convincingly show what is new here beyond applying adaptive thresholds to a transformer backbone. The use of a frozen encoder with learnable thresholds is a straightforward modification of known SNN paradigms. 2. Although the paper repeatedly claims “task-specific thresholds,” it is not clear how thresholds are conditioned on tasks during inference. There is no mechanism for automatic ta

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The paper brings class-incremental learning to spiking vision transformers by freezing the backbone after the base task and using per-task learnable thresholds in DT-LIF units with a lightweight routing head. This combination is new in the SNN literature and targets a gap called out in surveys. The empirical scope covers both static and neuromorphic datasets with long task sequences, which is rare for SNN-CL. The core idea is easy to follow.
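
The pipeline this reviewer describes (frozen backbone, per-task thresholds, lightweight routing head) can be sketched roughly as below. This is a simplified stand-in rather than the paper's G-DHS mechanism: the linear gate, the single-step thresholding, the equal-sized tasks, and every name and number in the example are assumptions made for illustration.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route_and_classify(features, gate_weights, task_thresholds, task_heads):
    """Pick a task with a linear gate, then classify with that task's parameters."""
    # 1. Gate: one linear score per task, computed on frozen-backbone features.
    task_id = argmax([dot(w, features) for w in gate_weights])
    # 2. The selected task's threshold turns features into binary spikes
    #    (a single-step stand-in for a spiking neuron layer).
    theta = task_thresholds[task_id]
    spikes = [1 if f >= theta else 0 for f in features]
    # 3. The task-specific head maps spikes to a class within that task.
    head = task_heads[task_id]
    local_class = argmax([dot(w, spikes) for w in head])
    # Global class index, assuming every task has the same number of classes.
    return task_id, task_id * len(head) + local_class


# Hypothetical parameters for two tasks with two classes each.
gate_weights = [[1, 1, 0, 0], [0, 0, 1, 1]]
task_thresholds = [0.5, 0.15]
task_heads = [
    [[0, 1, 1, 0], [1, 0, 0, 1]],   # task 0 head
    [[1, 0, 0, 0], [0, 0, 1, 1]],   # task 1 head
]

result_a = route_and_classify([0.9, 0.2, 0.1, 0.8], gate_weights, task_thresholds, task_heads)
result_b = route_and_classify([0.1, 0.2, 0.7, 0.3], gate_weights, task_thresholds, task_heads)
print(result_a)  # (0, 1)
print(result_b)  # (1, 3)
```

In a setup like this, a wrong gate decision selects the wrong thresholds and the wrong head, so final accuracy depends on routing accuracy as well as on the per-task parameters.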

Weaknesses

1. The method is task-conditional at inference due to a learned gate, but the paper’s framing and comparisons read as if it were task-agnostic single-head CIL. This should be stated clearly and evaluated against matched rehearsal-free transformer baselines from the ANN literature that also use routing or adapters; without these, it is hard to attribute gains to threshold learning rather than to routing. 2. The “reverse forgetting” result lacks controls that separate router calibration from withi

Reviewer 04 · Rating 4 · Confidence 4

Strengths

1. This paper is well-written and easy to follow. 2. The proposed task-specific (context-adaptive) dynamic neuronal thresholds seem to be an interesting design adapted to the SNN-based transformer, simple but effective. 3. The proposed method achieves a strong performance lead over traditional continual learning baselines on relatively simple datasets.

Weaknesses

1. The compared methods are mainly very traditional continual learning methods (EWC, SI, iCaRL, DER, etc.). Is it possible to include more recent methods? 2. The experiments are mainly performed on CIFAR-100 and Tiny-ImageNet, which have small image sizes. Does the proposed method apply to larger-scale images, such as 224×224 images from ImageNet (subsets)? 3. I’m not sure how the gated dynamic head selection works. Since the tasks are continually introduced, does this mechanism also suffer catastrop

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Domain Adaptation and Few-Shot Learning