Nonparametric Masked Language Modeling
Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer

TL;DR
NPM introduces a nonparametric masked language model that retrieves tokens from a corpus, improving predictions for rare words and phrases, and outperforms larger parametric models in zero-shot tasks.
Contribution
This paper presents the first nonparametric masked language model that replaces softmax with corpus-based retrieval, enabling better handling of rare tokens and phrases.
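To make the idea of replacing a vocabulary softmax with corpus retrieval concrete, here is a minimal sketch. It is not the authors' implementation: the function name `nonparametric_fill`, the toy two-dimensional embeddings, the inner-product scoring, and the `temperature` parameter are all assumptions chosen for illustration, and it handles single tokens only rather than phrases.

```python
import math
from collections import defaultdict

def nonparametric_fill(mask_vec, corpus_tokens, corpus_vecs, temperature=1.0):
    """Return a distribution over the token strings found in a reference corpus.

    Hypothetical helper: each corpus position has an embedding, and a token's
    probability is the softmax mass summed over all of its occurrences.
    """
    # Inner-product similarity between the [MASK] vector and each corpus position.
    scores = [sum(q * k for q, k in zip(mask_vec, vec)) / temperature
              for vec in corpus_vecs]
    # Softmax over corpus positions instead of over a fixed vocabulary.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Positions holding the same token pool their probability mass.
    probs = defaultdict(float)
    for token, w in zip(corpus_tokens, weights):
        probs[token] += w / total
    return dict(probs)

# Toy corpus (assumed values): three positions, two distinct tokens.
corpus_tokens = ["Seattle", "Thessaloniki", "Seattle"]
corpus_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

dist = nonparametric_fill([0.0, 2.0], corpus_tokens, corpus_vecs)
best = max(dist, key=dist.get)
```

In this sketch, a word can be predicted as long as it occurs somewhere in the corpus, so the output space grows with the corpus rather than being fixed in advance.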
Findings
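Outperforms significantly larger parametric models in zero-shot evaluation on 16 tasks, including classification, fact probing, and question answering
Is particularly strong on rare patterns (word senses or facts) and on rare or nearly unseen words (e.g., non-Latin script)
Trains efficiently with a contrastive objective and an in-batch approximation to full-corpus retrieval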
Abstract
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely from retrieving a token from a text corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 16 tasks including classification, fact probing and question answering demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and predicting rare or nearly unseen words (e.g., non-Latin script). We…
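The contrastive objective with in-batch retrieval mentioned in the abstract can be sketched roughly as follows. This is an illustrative assumption rather than the paper's exact loss: the function name `in_batch_contrastive_loss` is hypothetical, positives are taken to be in-batch positions that hold the same token as the masked one, and every other in-batch position acts as a negative.

```python
import math

def in_batch_contrastive_loss(query_vec, batch_tokens, batch_vecs, target_token):
    """Negative log of the softmax mass on in-batch positions holding target_token.

    Hypothetical sketch: the batch stands in for the full corpus, so only
    tokens that co-occur in the batch can serve as positives or negatives.
    """
    if target_token not in batch_tokens:
        raise ValueError("target token has no positive in this batch")
    # Inner-product similarity between the masked-position vector and each batch position.
    scores = [sum(q * k for q, k in zip(query_vec, vec)) for vec in batch_vecs]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    # Numerator: positions with the target token. Denominator: all positions.
    positive = sum(w for tok, w in zip(batch_tokens, weights) if tok == target_token)
    return -math.log(positive / sum(weights))

# Toy batch (assumed values).
batch_tokens = ["Seattle", "Paris", "Seattle"]
batch_vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]

loss_aligned = in_batch_contrastive_loss([2.0, 0.0], batch_tokens, batch_vecs, "Seattle")
loss_misaligned = in_batch_contrastive_loss([0.0, 2.0], batch_tokens, batch_vecs, "Seattle")
```

The loss is lower when the masked-position vector sits close to the in-batch occurrences of the correct token, which is the behavior a contrastive objective is meant to encourage.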
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Softmax
