TL;DR
This paper introduces TRC$^{2}$, a novel architecture for language models that incorporates cortical columns, thalamic pathways, and hippocampal mechanisms to enable effective continual learning and reduce forgetting.
Contribution
The paper presents TRC$^{2}$, a biologically inspired decoder-only architecture that combines stacked cortical columns with a thalamic modulatory pathway and a hippocampal memory pathway to improve continual learning in language models without external stabilization procedures.
Findings
TRC$^{2}$ reduces cumulative forgetting across multiple language tasks.
The architecture improves task-boundary modeling quality.
Ablation studies confirm the importance of thalamic and hippocampal components.
Abstract
Large language models deployed in the wild must adapt to evolving data, user behavior, and task mixtures without erasing previously acquired capabilities. In practice, this remains difficult: sequential updates induce catastrophic forgetting, while many stabilization methods rely on external procedures that are costly, brittle, or difficult to scale. We present TRC$^{2}$ (Thalamically Routed Cortical Columns), a decoder-only architecture that makes continual learning a property of the backbone itself. TRC$^{2}$ combines stacked cortical columns with a thalamic modulatory pathway for selective inter-column communication and a hippocampal pathway for event-selective retrieval, delayed surprise-based writing, and replay-driven consolidation. This design localizes fast plasticity while preserving a slower stable computation pathway. We further introduce a causal memory-update scheme and an…
