Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models
Ying Xie

TL;DR
This paper introduces SleepGate, a biologically inspired framework that augments large language models with a periodic sleep cycle over the key-value cache, consolidating entries and selectively forgetting outdated ones to reduce proactive interference and improve retrieval accuracy.
Contribution
SleepGate is the first to incorporate sleep-inspired memory consolidation mechanisms into transformer-based LLMs, enabling effective management of cache interference during inference.
Findings
Achieves 99.5% retrieval accuracy at PI depth 5
Reduces interference horizon from O(n) to O(log n)
Outperforms baseline methods significantly in experiments
Abstract
Large language models (LLMs) suffer from proactive interference (PI): outdated information in the context window disrupts retrieval of current values. This interference degrades retrieval accuracy log-linearly as stale associations accumulate, a bottleneck that persists regardless of context length and resists prompt-engineering mitigations. Biological brains resolve an analogous challenge through sleep-dependent memory consolidation: synaptic downscaling, selective replay, and targeted forgetting. We propose SleepGate, a biologically inspired framework that augments transformer-based LLMs with a learned sleep cycle over the key-value (KV) cache. SleepGate introduces three mechanisms: (1) a conflict-aware temporal tagger detecting when new entries supersede old ones; (2) a lightweight forgetting gate trained to selectively evict or compress stale cache entries; and (3) a consolidation…
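To make the first two mechanisms concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the cache holds plain string key/value pairs rather than attention tensors, the tagger is an exact key match, and the learned forgetting gate is replaced by a hard-coded rule that evicts every superseded entry. All names (`CacheEntry`, `tag_conflicts`, `sleep_cycle`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str            # toy stand-in for an attention key
    value: str          # toy stand-in for the associated value
    step: int           # position at which the entry was written
    stale: bool = False


def tag_conflicts(cache):
    """Toy conflict-aware tagger: flag entries whose key was written again later."""
    latest = {}
    for entry in cache:
        if entry.key not in latest or entry.step > latest[entry.key]:
            latest[entry.key] = entry.step
    for entry in cache:
        entry.stale = entry.step < latest[entry.key]
    return cache


def sleep_cycle(cache):
    """Toy forgetting gate: evict every entry tagged stale.

    Assumption: a fixed rule stands in for the learned gate described
    in the abstract, and compression of stale entries is not modeled.
    """
    tag_conflicts(cache)
    return [entry for entry in cache if not entry.stale]


cache = [
    CacheEntry("color", "red", 0),
    CacheEntry("color", "blue", 1),
    CacheEntry("size", "small", 2),
    CacheEntry("color", "green", 3),
]
kept = sleep_cycle(cache)
print([(e.key, e.value) for e in kept])
```

After the cycle only the most recent value per key remains, so a later lookup for `color` no longer competes with the two superseded associations.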
Taxonomy
Topics: Sleep and Wakefulness Research · EEG and Brain-Computer Interfaces · Multimodal Machine Learning Applications
