TL;DR
Mela introduces test-time memory consolidation through a neuroscience-inspired hierarchical memory module, improving language models' performance and their ability to handle longer contexts.
Contribution
The paper proposes the Hierarchical Memory Module (HMM) and integrates it into Transformers to create Mela, enabling online memory consolidation at test time with multi-granularity representations.
Findings
Mela outperforms Transformer baselines across all model sizes.
Mela maintains performance on contexts longer than its training length.
Ablation studies validate each component's effectiveness.
Abstract
Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage established neuroscientific theories of memory consolidation and cross-frequency coupling to propose the Hierarchical Memory Module (HMM), a neural memory architecture composed of two functionally distinct sub-modules that operate at different update frequencies. Inspired by the transformation hypothesis, the low-frequency sub-module produces high-level representations that capture abstract, gist-level knowledge, while the high-frequency sub-module produces fine-grained representations that preserve richer episodic detail. The final memory output is dynamically reconstructed as a context-dependent…
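As a rough illustration of the "different update frequencies" idea in the abstract, the sketch below keeps two state vectors: a fast one written on every input and a slow one that consolidates the fast state only every few steps. This is a toy under stated assumptions, not the paper's HMM: the class name `TwoRateMemory`, the exponential-moving-average update rule, and the parameters `period`, `fast_rate`, and `slow_rate` are all hypothetical choices made for illustration. The readout that combines the two states is left out, since the abstract excerpt above does not fully specify it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TwoRateMemory:
    """Toy two-timescale memory (hypothetical; not the paper's HMM)."""

    dim: int
    period: int = 4          # assumed: slow state updates once every `period` writes
    fast_rate: float = 0.5   # assumed moving-average rate for the fast state
    slow_rate: float = 0.25  # assumed moving-average rate for the slow state
    fast: List[float] = field(default_factory=list)
    slow: List[float] = field(default_factory=list)
    step: int = 0

    def __post_init__(self) -> None:
        # Both states start at zero.
        self.fast = [0.0] * self.dim
        self.slow = [0.0] * self.dim

    def write(self, x: List[float]) -> None:
        # High-frequency path: blend every new input into the fast state.
        self.fast = [
            (1 - self.fast_rate) * f + self.fast_rate * xi
            for f, xi in zip(self.fast, x)
        ]
        self.step += 1
        # Low-frequency path: fold the fast state into the slow state
        # only once every `period` writes.
        if self.step % self.period == 0:
            self.slow = [
                (1 - self.slow_rate) * s + self.slow_rate * f
                for s, f in zip(self.slow, self.fast)
            ]


def run_demo() -> TwoRateMemory:
    # Feed eight inputs; the first coordinate is constant, the second drifts.
    mem = TwoRateMemory(dim=2, period=4)
    for t in range(8):
        mem.write([1.0, float(t)])
    return mem
```

In this toy, the fast state tracks recent inputs closely while the slow state changes only at the consolidation steps, which is the qualitative contrast between fine-grained and gist-level representations that the abstract describes. How Mela actually parameterizes and trains its sub-modules is not shown here.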
