EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts
Subhajit Chaudhury, Payel Das, Sarathkrishna Swaminathan, Georgios Kollias, Elliot Nelson, Khushbu Pahwa, Tejaswini Pedapati, Igor Melnyk, Matthew Riemer

TL;DR
EpMAN introduces an episodic memory attention mechanism that enables large language models to efficiently process and recall long contexts, improving performance on long-context tasks.
Contribution
The paper presents EpMAN, a novel episodic memory module that enhances LLMs' ability to attend to and utilize long-range context effectively.
Findings
Improved performance on long-context recall benchmarks
More robust across context lengths from 16k to 256k tokens
Outperforms baseline and retrieval-augmented models
Abstract
Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce EpMAN, a method for processing long contexts in an episodic memory module while holistically attending to semantically relevant context chunks. The output of episodic attention is then used to reweigh the decoder's self-attention to the stored KV cache of the context during training and generation. When an LLM decoder is trained using EpMAN, its performance on multiple challenging single-hop long-context recall and question-answering benchmarks is found to be stronger and more robust across the range from 16k to 256k tokens than that of baseline decoders trained with self-attention and of popular retrieval-augmented generation frameworks.
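The reweighting idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give formulas, so the cosine-similarity scoring of chunks, the softmax over chunks, the per-token multiplication, and the final renormalization are all assumptions, and the names `episodic_weights` and `reweighted_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def episodic_weights(query_emb, chunk_embs):
    # Assumption: chunk relevance is cosine similarity between a query
    # embedding and one embedding per context chunk, passed through softmax.
    q = query_emb / np.linalg.norm(query_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    return softmax(c @ q)

def reweighted_attention(q, K, V, chunk_ids, chunk_w):
    # Standard scaled dot-product attention of one query over cached keys.
    d = q.shape[-1]
    attn = softmax(K @ q / np.sqrt(d))
    # Assumption: each token's attention weight is scaled by the episodic
    # weight of the chunk it belongs to, then renormalized to sum to 1.
    attn = attn * chunk_w[chunk_ids]
    attn = attn / attn.sum()
    return attn @ V, attn

# Toy usage: 4 chunks of 8 tokens each, hidden size 16 (illustrative values).
rng = np.random.default_rng(0)
n_chunks, chunk_len, d = 4, 8, 16
chunk_embs = rng.normal(size=(n_chunks, d))
query_emb = chunk_embs[2] + 0.01 * rng.normal(size=d)  # query resembles chunk 2
chunk_w = episodic_weights(query_emb, chunk_embs)

K = rng.normal(size=(n_chunks * chunk_len, d))  # stand-in for the KV cache
V = rng.normal(size=(n_chunks * chunk_len, d))
chunk_ids = np.repeat(np.arange(n_chunks), chunk_len)
out, attn = reweighted_attention(rng.normal(size=d), K, V, chunk_ids, chunk_w)
```

In this toy setup the chunk most similar to the query receives the largest episodic weight, so tokens in that chunk keep more of their attention mass than tokens in less relevant chunks.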
