Suffix Retrieval-Augmented Language Modeling
Zecheng Wang, Yik-Cheung Tam

TL;DR
This paper introduces SUREALM, a novel language model that incorporates suffix retrieval to simulate bi-directional context in autoregressive sequence generation, improving perplexity on dialogue data.
Contribution
SUREALM is the first to integrate suffix retrieval for bi-directional context simulation in causal language modeling, enhancing sequence prediction.
Findings
Reduced perplexity on DSTC9 corpus
Effective imitation of bi-directional context
Outperformed baseline models in experiments
Abstract
Causal language modeling (LM) uses word history to predict the next word. BERT, on the other hand, makes use of bi-directional word information in a sentence to predict words at masked positions. While BERT is effective in sequence encoding, it is non-causal by nature and is not designed for sequence generation. In this paper, we propose a novel language model, SUffix REtrieval-Augmented LM (SUREALM), that simulates a bi-directional contextual effect in an autoregressive manner. SUREALM employs an embedding retriever to search for training sentences in a data store that share similar word history during sequence generation. In particular, the suffix portions of the retrieved sentences mimic the "future" context. We evaluated our proposed model on the DSTC9 spoken dialogue corpus and showed promising word perplexity reduction on the validation and test sets compared to competitive…
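The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a learned embedding retriever, and the names `build_datastore` and `retrieve_suffixes` are hypothetical. The sketch only shows the idea of matching the current word history against prefixes of training sentences and returning the corresponding suffixes as a proxy for "future" context; how the model consumes those suffixes is not covered here.

```python
import math
from collections import Counter


def embed(tokens):
    """Toy bag-of-words embedding (assumption: stands in for a learned encoder)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_datastore(sentences):
    """Index every (prefix, suffix) split of every training sentence."""
    store = []
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            store.append((embed(tokens[:i]), tokens[i:]))
    return store


def retrieve_suffixes(history, store, k=2):
    """Return the suffixes whose prefixes are most similar to the word history."""
    query = embed(history.split())
    ranked = sorted(store, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return [" ".join(suffix) for _, suffix in ranked[:k]]
```

For example, with a data store built from "i would like a cheap hotel" and "can you book a table for two", the history "i would like" matches the prefix "i would like" exactly, so the top retrieved suffix is "a cheap hotel".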
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Layer Normalization · Residual Connection · Dropout · Softmax · WordPiece · Linear Warmup With Linear Decay
