Human-inspired Episodic Memory for Infinite Context LLMs
Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee, Fenia Christopoulou, Gerasimos Lampouras, Haitham Bou-Ammar, Jun Wang

TL;DR
This paper introduces EM-LLM, a human-inspired episodic memory system for large language models that enables handling of infinite contexts efficiently, improving coherence, accuracy, and retrieval over long sequences without fine-tuning.
Contribution
EM-LLM integrates human-like episodic memory mechanisms into LLMs, allowing for infinite context processing and efficient retrieval without additional training.
Findings
Outperforms state-of-the-art retrieval models like InfLLM and RAG across benchmarks.
Enables retrieval over 10 million tokens, surpassing full-context models in many tasks.
Shows strong correlation between event segmentation and human perception, indicating biological plausibility.
Abstract
Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs with no fine-tuning, enabling them to handle practically infinite context lengths while maintaining computational efficiency. EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement in an online fashion. When needed, these events are retrieved through a two-stage memory process, combining similarity-based and temporally contiguous…
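The surprise-based segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes that surprise is the negative log-probability the model assigns to each token, and that a new event starts when surprise exceeds the mean plus `gamma` standard deviations over a short trailing window. The function names, the `window` and `gamma` parameters, and the example probabilities are hypothetical, chosen for illustration only. The graph-theoretic boundary refinement and the retrieval stage are not covered here.

```python
import math
from statistics import mean, pstdev


def surprise(token_probs):
    """Per-token surprise: negative log-probability under the model."""
    return [-math.log(p) for p in token_probs]


def segment_by_surprise(token_probs, window=4, gamma=1.0):
    """Return the indices at which a new event is assumed to start.

    Assumption for this sketch: a token opens a new event when its
    surprise is strictly greater than mean + gamma * std of the
    surprises in the preceding `window` tokens.
    """
    s = surprise(token_probs)
    boundaries = [0]  # the first token always opens an event
    for t in range(1, len(s)):
        recent = s[max(0, t - window):t]
        threshold = mean(recent) + gamma * pstdev(recent)
        if s[t] > threshold:
            boundaries.append(t)
    return boundaries


def to_events(tokens, boundaries):
    """Split a token list into contiguous events at the given boundaries."""
    ends = boundaries[1:] + [len(tokens)]
    return [tokens[b:e] for b, e in zip(boundaries, ends)]


# Made-up next-token probabilities: two tokens are highly unexpected.
probs = [0.9, 0.9, 0.9, 0.05, 0.9, 0.9, 0.9, 0.02, 0.9]
tokens = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"]
bounds = segment_by_surprise(probs)
events = to_events(tokens, bounds)
```

With these example values the two low-probability tokens become event boundaries, so the sequence is split into three events of unequal length rather than fixed-size chunks.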
Peer Reviews
Decision: ICLR 2025 Poster
Memory is a key function in biological brains that helps organize experiences and use them to guide future behaviours. The lack of memory function in the LLM is a major aspect that needs to be addressed. The authors have proposed a framework inspired by the biological memory system for LLMs, which is desirable and may stimulate further investigations in this direction. Overall, the paper was clearly written, showing comprehensive experimental results.
"Human-like Episodic Memory" is an overstatement. First, episodic memory combines multimodal information into a coherent recollection of experiences, with key aspects including when, where, what, how and with whom an event happened. Storing a sequence of tokens does not necessarily reflect such a memory. Second, parsing of events in human memory depends heavily on the content, e.g., change of location or context. The Bayesian surprise measure may be a useful proxy to approximate such separation,
Originality: EM-LLM’s incorporation of episodic memory features, particularly surprise-based segmentation, similarity-based, and contiguity-based retrieval, is a novel approach within the domain of LLMs. Using LLM surprise to model event cognition and to approximate episodic memory is original for cognitive modeling. This work not only extends the long-context capability of LLMs but also bridges machine learning and cognitive science, presenting a unique computational framework for studying memo
- In Table 1, the performances of the S, SM, and SM+C methods are difficult to compare directly since different base LLMs are used in each row. I know the comparison is (partially) in Fig. 4-7 but they are not mentioned in the main text. It would be helpful if explicitly mentioned/referenced in the main text.
- Although InfLLM is a primary benchmark comparison, it is absent from Figure 1. Including InfLLM in Figure 1 would provide a more complete comparison.
- It’s unclear from the current paper h
- The paper proposes the idea of equipping LLMs with cognitive science principles (episodic memory and event cognition), which aligns well with current challenges in long-context processing.
- The use of Bayesian surprise for dynamic event segmentation in LLMs is novel, moving beyond simple fixed-length segmentation used in prior work like InfLLM.
- The method achieves 100% accuracy on Passkey.Retrieval task with sequences up to 5M tokens, demonstrating practical scalability.
- The paper provides
1. The paper claims surprise-based segmentation is superior to fixed-length approaches (like InfLLM), but lacks theoretical justification for why this should be effective. While Figure 4 shows some empirical correlation with human segmentation, there's no analysis of why this leads to better LLM performance. The authors should provide a theoretical analysis showing why this metric captures semantically meaningful boundaries better than alternatives.
2. The boundary refinement process (Algorithm
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies
Methods: Attention Is All You Need · Adam · Attention Dropout · Dropout · Weight Decay · Dense Connections · Byte Pair Encoding · BART · Layer Normalization · Residual Connection
