Self-Normalized Resets for Plasticity in Continual Learning

Vivek F. Farias; Adam D. Jozefiak

arXiv:2410.20098·cs.LG·September 30, 2025

TL;DR

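This paper introduces Self-Normalized Resets (SNR), an adaptive method that mitigates plasticity loss in continual learning by resetting a neuron's weights when evidence suggests its firing rate has effectively dropped to zero. Across the continual learning problems tested, it outperforms competing algorithms and is robust to its single hyperparameter.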

Contribution

The paper proposes SNR, a novel reset mechanism for neurons in continual learning, supported by a hypothesis test and theoretical analysis showing its effectiveness and robustness.

Findings

01. SNR outperforms competing algorithms in various continual learning tasks.

02. SNR is robust to its hyperparameter, the rejection percentile threshold.

03. Theoretical analysis shows idealized SNR can learn ReLUs even from adversarial initializations.

Abstract

Plasticity Loss is an increasingly important phenomenon that refers to the empirical observation that as a neural network is continually trained on a sequence of changing tasks, its ability to adapt to a new task diminishes over time. We introduce Self-Normalized Resets (SNR), a simple adaptive algorithm that mitigates plasticity loss by resetting a neuron's weights when evidence suggests its firing rate has effectively dropped to zero. Across a battery of continual learning problems and network architectures, we demonstrate that SNR consistently attains superior performance compared to its competitor algorithms. We also demonstrate that SNR is robust to its sole hyperparameter, its rejection percentile threshold, while competitor algorithms show significant sensitivity. SNR's threshold-based reset mechanism is motivated by a simple hypothesis test that we derive. Seen through the lens…
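The reset rule described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a neuron "fires" when its post-ReLU output is positive, that each neuron keeps a history of its own inter-firing times, and that a neuron is flagged once its current inactive streak exceeds the (1 - eta) quantile of that history. The names `ResetTracker`, `eta`, and `min_samples` are hypothetical, and the actual re-initialization of weights is left to the caller.

```python
import numpy as np

class ResetTracker:
    """Hypothetical helper: tracks per-neuron inter-firing times for one layer."""

    def __init__(self, n_neurons, eta=0.01, min_samples=10):
        self.eta = eta                  # rejection percentile threshold (assumed name)
        self.min_samples = min_samples  # assumption: require some history before testing
        self.age = np.zeros(n_neurons, dtype=int)   # steps since each neuron last fired
        self.gaps = [[] for _ in range(n_neurons)]  # observed inter-firing times

    def update(self, activations):
        """Process one step of post-ReLU outputs (1-D array).

        Returns the indices of neurons flagged for reset at this step.
        """
        fired = np.asarray(activations) > 0
        self.age += 1
        to_reset = []
        for i in range(len(self.age)):
            if fired[i]:
                # Record the completed gap and restart the inactivity counter.
                self.gaps[i].append(int(self.age[i]))
                self.age[i] = 0
            elif len(self.gaps[i]) >= self.min_samples:
                # Per-neuron threshold: a quantile of this neuron's own history.
                threshold = np.quantile(self.gaps[i], 1.0 - self.eta)
                if self.age[i] > threshold:
                    to_reset.append(i)
        for i in to_reset:
            # The caller would re-initialize the neuron's weights here.
            self.age[i] = 0
            self.gaps[i].clear()
        return to_reset
```

In this sketch a neuron that normally fires every step is flagged after a short silence, while a neuron that normally fires once every five steps is not flagged during its usual four-step gaps, even though both share the same `eta`.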

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 5·Confidence 3

Strengths

The method is simple and easy to implement. Empirically, on the problems it has been applied to, it outperforms the existing baselines, which are similar reset methods that use more complex reset conditions (essentially, complex utility functions that drive the selection).

Weaknesses

I'm somewhat concerned about the significance of the results. I think the method outperforms the baseline, but most of the more detailed results are on permuted MNIST or permuted Shakespeare. I know previous work relied on similar small-scale problems, and it feels unfair to punish this work when the others got away with it. But at some point I'm worried that we are reading too much into these numbers. The authors have run on other tasks (Random Label CIFAR, Continual ImageNet) but we can on

Reviewer 02·Rating 3·Confidence 4

Strengths

The paper’s new mechanism to select which neurons to reset seems interesting. The theory and intuition behind the method need to be explained more clearly, but the results do show a slight improvement on some settings for some datasets.

Weaknesses

- This paper feels incomplete. The entire theory section in Section 4 is hard to follow and does not feel very motivated. It’s unclear why Section 4.2 is put in the paper. The paper then immediately ends after an equation, with no explanation of how it relates to the rest of the paper, and it is missing a conclusion section.
- The results in Table 1 are conducted over 5 seeds, but they are missing any error bars.
- Table 2 shows results on a task introduced in this paper, Permuted Shakespeare, b

Reviewer 03·Rating 8·Confidence 4

Strengths

__Originality:__ The idea of resetting neurons to avoid loss of plasticity is not particularly novel or surprising (as evidenced by the cited related work), but the proposed method appears novel and most importantly, satisfies multiple desiderata (simple, effective, robust) well beyond previous methods.

__Quality:__ Although on relatively small tasks, the experiments are thorough and convincing. The comparisons are made on a variety of datasets and consideration is made for hyperparameter tunin

Weaknesses

This paragraph is confusing: "This reveals an interesting comparison with SNR. The schemes above will re-initialize a neuron after inactivity over a period of time that is uniform across all neurons. In the context of the hypothesis testing setup above, this will result in sub-optimal error rates across neurons. On the other hand, SNR will *reset a neuron after it is inactive for a period that is effectively normalized to the nominal firing rate of that neuron*, while still only specifying a sin
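The contrast in the quoted passage can be made concrete with a small worked calculation. The model here is an illustrative assumption, not taken from the review: suppose a healthy neuron fires independently at each step with probability $p$, and a reset is triggered once a silence of $k$ steps would occur with probability at most $\eta$ under that healthy model.

```latex
\[
\Pr(\text{no firing in } k \text{ consecutive steps}) = (1-p)^{k},
\qquad
(1-p)^{k} \le \eta \iff k \ge \frac{\ln \eta}{\ln(1-p)} .
\]
\[
\eta = 0.01:\quad
p = 0.5 \;\Rightarrow\; k \ge \frac{\ln 0.01}{\ln 0.5} \approx 6.6 \;\; (k = 7),
\qquad
p = 0.01 \;\Rightarrow\; k \ge \frac{\ln 0.01}{\ln 0.99} \approx 458.2 \;\; (k = 459).
\]
```

Under this assumed model, a single window of 7 steps applied to every neuron would wrongly flag the healthy $p = 0.01$ neuron with probability $0.99^{7} \approx 0.93$, whereas a per-neuron window that scales with roughly $1/p$ keeps the false-alarm rate near $\eta$ for both.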

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning
