LaMemo: Language Modeling with Look-Ahead Memory
Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang

TL;DR
LaMemo introduces a bi-directional look-ahead memory mechanism for language models that lets the memory interact dynamically with the current context, improving long-term dependency modeling on long texts.
Contribution
The paper proposes LaMemo, a novel memory mechanism that combines look-ahead attention with segment recurrence, allowing the memory to interact dynamically with the current context and improving long-term dependency modeling.
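The recurrence half of this combination can be illustrated with a minimal single-head NumPy sketch, in which the current segment attends over cached states from the previous segment plus its own causal prefix. The function names are hypothetical, and the sketch assumes no learned projections and no relative positional encoding; it is not the authors' implementation. The cached memory is only read here and never updated from the current segment, which is the uni-directional reuse that LaMemo sets out to improve.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def recurrent_attention(h, mem):
    """Hypothetical single-head attention of a segment over [memory; segment].

    h:   (n, d) hidden states of the current segment
    mem: (m, d) cached hidden states of the previous segment (read-only here)
    Queries, keys and values are the raw states (assumption: no projections).
    """
    n, d = h.shape
    m = mem.shape[0]
    kv = np.concatenate([mem, h], axis=0)            # (m + n, d)
    scores = h @ kv.T / np.sqrt(d)                   # (n, m + n)
    # Causal mask: token i sees every memory slot and current tokens 0..i.
    mask = np.tril(np.ones((n, m + n), dtype=bool), k=m)
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ kv                      # (n, d)


rng = np.random.default_rng(0)
mem = rng.normal(size=(6, 8))   # memory length m = 6, width d = 8
h = rng.normal(size=(4, 8))     # segment length n = 4
out = recurrent_attention(h, mem)
print(out.shape)  # (4, 8)
```

Because of the causal mask, the first token's output is unaffected by any later token in the segment.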
Findings
Outperforms existing memory-augmented models on language modeling benchmarks.
Incorporates bi-directional attention efficiently, with additional computation linear in the memory length.
Captures long-term dependencies in long texts more effectively.
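The linear-overhead claim can be checked with simple arithmetic. The sketch below counts query-key scores per segment, assuming (our reading, not a detail stated on this page) that look-ahead means each of the m memory states attends to all n tokens of the new segment, while the recurrent pass has each current token attend to the whole memory plus its causal prefix. The function name is hypothetical.

```python
def attention_score_counts(n, m):
    """Hypothetical count of query-key scores computed for one segment.

    n: current segment length, m: memory length.
    Returns (recurrence, look_ahead).
    """
    # Each of the n current tokens scores all m memory slots plus tokens 0..i.
    recurrence = n * m + n * (n + 1) // 2
    # Assumed look-ahead: each of the m memory states scores all n new tokens.
    look_ahead = m * n
    return recurrence, look_ahead


for m in (256, 512, 1024):
    rec, extra = attention_score_counts(n=512, m=m)
    print(m, rec, extra)
```

For a fixed segment length, doubling m doubles the extra look-ahead term, so under this accounting the added cost grows linearly with the memory length.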
Abstract
Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode contexts uni-directionally. As a result, the memory cannot dynamically interact with the current context, which provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens and interpolating with the old memory states to maintain long-term information in the history. LaMemo embraces bi-directional attention and segment recurrence with an additional computation overhead only…
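The look-ahead update described in the abstract can be sketched as incremental attention: each memory state attends only to the newly arrived right-side tokens, and the result is interpolated with the old memory output. In this sketch the interpolation weights come from the softmax normalizers of the two parts, which makes the incremental result equal to attention over all right-side tokens seen so far. That interpolation rule, the fixed per-slot queries, and the function names are assumptions for illustration; the paper's exact interpolation and parameterization may differ.

```python
import numpy as np


def attend_with_lse(q, kv):
    """Attention of queries q (m, d) over keys/values kv (n, d).

    Returns the output (m, d) and each query's log-sum-exp of scores (m,).
    Keys and values share the same vectors (assumption: no projections).
    """
    d = q.shape[-1]
    scores = q @ kv.T / np.sqrt(d)                   # (m, n)
    lse = np.logaddexp.reduce(scores, axis=-1)       # log of softmax normalizer
    weights = np.exp(scores - lse[:, None])          # each row sums to 1
    return weights @ kv, lse


def look_ahead_update(q, old_out, old_lse, new_tokens):
    """Hypothetical incremental look-ahead step for m memory slots.

    Attends only to new_tokens (cost m * n) and interpolates with old_out,
    weighting each part by its share of the total softmax mass.
    """
    new_out, new_lse = attend_with_lse(q, new_tokens)
    total_lse = np.logaddexp(old_lse, new_lse)
    w_old = np.exp(old_lse - total_lse)[:, None]    # w_old + w_new == 1
    w_new = np.exp(new_lse - total_lse)[:, None]
    return w_old * old_out + w_new * new_out, total_lse


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))          # queries for m = 3 memory slots
seg1 = rng.normal(size=(5, 8))       # right-side tokens seen earlier
seg2 = rng.normal(size=(4, 8))       # newly arrived segment
out1, lse1 = attend_with_lse(q, seg1)
out2, lse2 = look_ahead_update(q, out1, lse1, seg2)
full, _ = attend_with_lse(q, np.concatenate([seg1, seg2]))
print(np.allclose(out2, full))  # True
```

Storing the log-normalizer alongside each memory slot is what lets the update touch only the n new tokens, so each step scores m·n pairs rather than re-attending over the whole history.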
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
