Why do Nearest Neighbor Language Models Work?
Frank F. Xu, Uri Alon, Graham Neubig

TL;DR
This paper investigates why retrieval-augmented k-nearest neighbor language models (kNN-LMs) outperform standard parametric LMs. It identifies key factors such as input representation, approximate search, and softmax temperature, and uses these insights to improve standard LMs.
Contribution
The study provides a detailed analysis of the reasons behind kNN-LMs' superior performance and introduces methods to enhance standard LMs based on these insights.
Findings
kNN-LMs outperform standard LMs partly because of differences in input representation
Approximate kNN search affects model performance
Softmax temperature plays a crucial role in kNN distribution effectiveness
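To make the temperature finding concrete, here is a minimal sketch of one common way to write the kNN distribution: a softmax over the negative distances to the k retrieved neighbors, with the weights of neighbors that share a next token summed together. The squared L2 distance, the function name `knn_distribution`, and the toy distances and token ids are illustrative assumptions, not the authors' code.

```python
import numpy as np

def knn_distribution(distances, neighbor_tokens, vocab_size, temperature=1.0):
    """Turn k retrieved neighbors into a next-token distribution (illustrative sketch).

    distances: assumed squared L2 distances to the k neighbors, shape (k,)
    neighbor_tokens: next-token id stored with each neighbor, shape (k,)
    temperature: divides the negative distances before the softmax
    """
    logits = -np.asarray(distances, dtype=float) / temperature
    logits -= logits.max()                       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()                     # softmax over the k neighbors
    probs = np.zeros(vocab_size)
    np.add.at(probs, neighbor_tokens, weights)   # neighbors with the same token pool their weight
    return probs

# Hypothetical toy neighbors: two of the four closest contexts were followed by token 3.
distances = np.array([1.0, 2.0, 4.0, 5.0])
tokens = np.array([3, 3, 7, 1])
p_sharp = knn_distribution(distances, tokens, vocab_size=10, temperature=0.5)
p_flat = knn_distribution(distances, tokens, vocab_size=10, temperature=5.0)
```

A low temperature concentrates almost all mass on the closest neighbors' token, while a high temperature spreads it across every retrieved token, which is why the temperature setting changes how useful the kNN distribution is.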
Abstract
Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network consuming the immediate previous context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric, next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on. To this end, we perform a careful analysis of the various…
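The retrieval-augmented setup described in the abstract can be sketched as follows, assuming the usual kNN-LM recipe: a datastore of (context representation, next token) pairs, an exact nearest-neighbor search over it, and a fixed linear interpolation weight `lam` between the kNN and parametric distributions. The helper names, the random toy datastore, and the uniform stand-in for the parametric LM are all hypothetical; a real system would use the trained LM's hidden states and typically an approximate search index.

```python
import numpy as np

def build_datastore(context_vectors, next_tokens):
    # Each entry pairs a context representation (key) with the token that followed it (value).
    return np.asarray(context_vectors, dtype=float), np.asarray(next_tokens)

def knn_lm_probs(query, p_lm, keys, values, k=4, lam=0.25, temperature=1.0):
    """Interpolate a parametric LM distribution with a kNN distribution (sketch).

    query: representation of the current context, shape (d,)
    p_lm: the parametric LM's next-token distribution, shape (V,)
    keys, values: datastore of (context representation, next token) pairs
    """
    # Exact search: squared L2 distance from the query to every key.
    dists = ((keys - query) ** 2).sum(axis=1)
    nearest = np.argsort(dists)[:k]
    # Softmax over the negative distances of the k nearest neighbors.
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    p_knn = np.zeros_like(p_lm)
    np.add.at(p_knn, values[nearest], weights)
    # Fixed linear interpolation between the two distributions.
    return lam * p_knn + (1.0 - lam) * p_lm

# Hypothetical toy data: 100 random 8-d keys over a 20-token vocabulary.
rng = np.random.default_rng(0)
keys, values = build_datastore(rng.normal(size=(100, 8)), rng.integers(0, 20, size=100))
p_lm = np.full(20, 1 / 20)   # uniform stand-in for the parametric LM
query = keys[0] + 0.01       # a context very close to a stored training context
p = knn_lm_probs(query, p_lm, keys, values)
```

Because the query sits next to a stored context, the token that followed that context receives extra probability mass beyond what the parametric model assigns; setting `lam=0.0` recovers the parametric distribution unchanged.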
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Softmax
