Noradrenergic-inspired gain modulation attenuates the stability gap in joint training
Alejandro Rodriguez-Garcia, Anindya Ghosh, Srikanth Ramaswamy

TL;DR
This paper introduces a neuromodulation-inspired gain modulation method that reduces the stability gap in joint training for continual learning, keeping performance more stable across task transitions.
Contribution
It proposes a novel dynamic gain scaling mechanism inspired by noradrenergic bursts, acting as a two-timescale optimization to improve stability without sacrificing accuracy.
Findings
Attenuates stability gaps in joint training across multiple benchmarks
Maintains competitive accuracy while improving robustness at task transitions
Effective in domain- and class-incremental learning scenarios
Abstract
Recent work in continual learning has highlighted the stability gap -- a temporary performance drop on previously learned tasks when new ones are introduced. This phenomenon reflects a mismatch between rapid adaptation and strong retention at task boundaries, underscoring the need for optimization mechanisms that balance plasticity and stability over abrupt distribution changes. While optimizers such as momentum-SGD and Adam introduce implicit multi-timescale behavior, they still exhibit pronounced stability gaps. Importantly, these gaps persist even under ideal joint training, making it crucial to study them in this setting to isolate their causes from other sources of forgetting. Motivated by how noradrenergic (neuromodulatory) bursts transiently increase neuronal gain under uncertainty, we introduce a dynamic gain scaling mechanism as a two-timescale optimization technique that…
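To make the idea concrete, here is a minimal sketch of an entropy-driven gain update wrapped around a plain SGD step. It is not the authors' NGM-SGD: the exact update rule is not given on this page, so the exponential moving average, the names `update_gain` and `train_step`, the `decay` and `scale` parameters, and the choice to apply the gain to the logits of a linear softmax classifier are all assumptions made for illustration. The sketch follows only the high-level description that the gain rises with prediction entropy (uncertainty) and is updated once per iteration alongside the weight update.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_entropy(probs, eps=1e-12):
    # Average Shannon entropy (in nats) of a batch of predictive distributions.
    return float(-(probs * np.log(probs + eps)).sum(axis=-1).mean())

def update_gain(gain, probs, decay=0.9, scale=1.0):
    # Hypothetical rule: the gain tracks an exponential moving average of
    # 1 + scale * (entropy normalized to [0, 1]). High uncertainty pushes the
    # gain up; confident predictions let it relax back toward 1.
    n_classes = probs.shape[-1]
    h = mean_entropy(probs) / np.log(n_classes)
    target = 1.0 + scale * h
    return decay * gain + (1.0 - decay) * target

def train_step(W, gain, x, y_onehot, lr=0.1):
    # Forward pass with gain-scaled logits (assumed placement of the gain).
    probs = softmax(gain * (x @ W))
    # Cross-entropy gradient w.r.t. W; the gain appears through the chain rule.
    grad = gain * x.T @ (probs - y_onehot) / len(x)
    W = W - lr * grad                 # slow variable: standard SGD weight update
    gain = update_gain(gain, probs)   # fast variable: entropy-driven gain update
    return W, gain

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 10))
    y = np.eye(3)[rng.integers(0, 3, size=32)]
    W, gain = np.zeros((10, 3)), 1.0
    for step in range(5):
        W, gain = train_step(W, gain, x, y)
        print(f"step {step}: gain = {gain:.3f}")
```

In this sketch the weights play the role of the slow variable and the gain the fast one, which is one way to read the "two-timescale" framing; how the paper actually parameterizes and applies the gain may differ.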
Peer Reviews
Decision·Submitted to ICLR 2026
- The idea of using gain modulation as a flexible way to handle distribution shifts, and showing that it can help with the stability gap, is good, and it is empirically validated in the supervised continual learning experiments presented.
- Neuronal gain as a proxy for task complexity is interesting. The results make sense, as the neuronal gain is essentially a moving average of the entropy of the outputs.
- Overall, while the method does result in an optimizer that mitigates the stability gap, it does seem to do that at the expense of overall performance.
- The proposed method has significantly more hyperparameter configurations evaluated compared to the baselines (15x more). This could very easily be the reason for any performance gains of NGM-SGD.
- I am not sure leaning so heavily into the biological framing is useful/correct. One of the contributions is listed as “We link our algorithmic gai
- Empirical evidence shows that NGM-SGD reduces test loss at task boundaries.
- Empirical evidence shows that NGM-SGD reduces the stability gap.
- See the first bullet point in the questions: why is it necessary to compare NGM-SGD only to other optimizers (SGD, Adam, MSGD)? Could there not exist some continual learning method that outperforms MSGD in the metrics illustrated in Table 1? Given this lack of a comparison to existing continual learning methods, why do the results support the efficacy of NGM-SGD?
- Overall, the empirical results are mixed, see Table 1. For instance, the baseline optimizers attain comparable if not often better
**Solid theoretical foundation.** The mathematical framework connecting gain modulation to fast-slow weight decomposition is clear and intuitive. The analysis showing how gain boosts flatten the loss landscape provides mechanistic insight.
**Clear, implementable algorithm.** The application of NGM-SGD seems simple, with standard SGD weight update plus a gain update driven by prediction entropy each iteration. The lack of architectural changes, replay buffers, or extra losses makes it practical,
**Novelty with respect to biological grounding.** The authors claim that no prior work has adopted a bio-inspired approach to mitigate the stability gap or connected it back to adaptive biological learning. However, the complementary learning systems (CLS) literature has long modeled fast/slow learning via mimicking the hippocampus–neocortex interactions of the brain, and many continual learning methods explicitly borrow this paradigm through dual-memory architectures, replay-based consolidation
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · EEG and Brain-Computer Interfaces
