Enabling Robust In-Context Memory and Rapid Task Adaptation in Transformers with Hebbian and Gradient-Based Plasticity

Siddharth Chaudhary

arXiv:2510.21908·cs.NE·November 6, 2025

TL;DR

This paper augments decoder-only Transformers with fast-weight modules updated by biologically inspired plasticity rules, either a neuromodulated Hebbian rule or a gradient-based rule, so that the model can adapt within a sequence. It evaluates them on copying, regression, and few-shot classification tasks and identifies where static weights already suffice.

Contribution

It introduces and evaluates biologically inspired plasticity modules in Transformers, demonstrating their effectiveness for fast in-sequence adaptation and task generalization.

Findings

01. Hebbian plasticity achieves lower loss and stronger few-shot generalization.

02. Gradient-based plasticity performs best on long-horizon credit assignment.

03. When associations are short and linearly separable, static weights suffice, so plasticity adds little in that regime.

Abstract

Large language models display in-context learning as an emergent effect of scale, but they rely on static weights during inference. In contrast, biological systems continually adapt via synaptic plasticity. We investigate whether explicit, biologically inspired plasticity can endow Transformers with faster in-sequence adaptation. To this end, we augment decoder-only Transformers with fast-weight modules updated either by (i) a neuromodulated Hebbian rule or (ii) the gradient-based plasticity mechanism of Duan et al. (2023). Across copying, regression, and few-shot classification tasks (CIFAR-FS, Omniglot), Hebbian plasticity consistently achieves lower loss and stronger few-shot generalization, while gradient-based updates perform best on long-horizon credit assignment. When associations are short and linearly separable, static weights suffice, defining a clear boundary condition for…
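
To make the idea of a fast-weight module concrete, here is a minimal sketch of a generic neuromodulated Hebbian update in plain Python. It is not the paper's implementation: the function names (`hebbian_step`, `forward`), the scalar `modulation` gate, the learning rate `eta`, and the `decay` factor are all assumptions chosen for illustration. The sketch assumes the fast weights are added to the static (slow) weights and updated as `decay * W_fast + eta * modulation * outer(post, pre)`.

```python
# Illustrative sketch only: a generic neuromodulated Hebbian fast-weight
# layer. All names and hyperparameters here are hypothetical, not taken
# from the paper.

def outer(u, v):
    """Outer product of two vectors as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def hebbian_step(W_fast, pre, post, modulation, eta=0.1, decay=0.9):
    """Return new fast weights after one Hebbian update.

    pre and post are the pre- and post-synaptic activity vectors.
    modulation is a scalar gate: 0 disables writing, larger values
    strengthen the association being stored.
    """
    delta = outer(post, pre)
    return [
        [decay * w + eta * modulation * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W_fast, delta)
    ]


def forward(W_slow, W_fast, x):
    """Output of the layer: static weights plus fast weights applied to x."""
    slow = matvec(W_slow, x)
    fast = matvec(W_fast, x)
    return [s + f for s, f in zip(slow, fast)]


# Store one key -> value association within a "sequence", then query it.
W_slow = [[0.0, 0.0], [0.0, 0.0]]   # static weights (zero for clarity)
W_fast = [[0.0, 0.0], [0.0, 0.0]]   # fast weights start empty
key, value = [1.0, 0.0], [0.0, 1.0]

W_fast = hebbian_step(W_fast, pre=key, post=value, modulation=1.0)
recalled = forward(W_slow, W_fast, key)  # points in the direction of value
```

In this toy setting, querying with `key` after a single update returns a vector aligned with `value`, scaled by `eta`. Setting `modulation` to zero leaves the fast weights unchanged apart from decay, which is the sense in which the gate controls when the layer writes to its in-context memory.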
