Why do Nearest Neighbor Language Models Work?

Frank F. Xu; Uri Alon; Graham Neubig

arXiv:2301.02828 · cs.CL · January 18, 2023 · 6 cites

TL;DR

This paper investigates why retrieval-augmented k-nearest neighbor language models (kNN-LMs) outperform standard neural LMs. It identifies input representation, approximate kNN search, and softmax temperature as key factors, and proposes improvements to standard models.

Contribution

The study provides a detailed analysis of the reasons behind kNN-LMs' superior performance and introduces methods to enhance standard LMs based on these insights.

Findings

01. kNN-LMs outperform standard LMs due to input representation differences

02. Approximate kNN search impacts model performance

03. Softmax temperature plays a crucial role in kNN distribution effectiveness

Abstract

Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network consuming the immediate previous context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric, next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically why k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on. To this end, we perform a careful analysis of the various…
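The abstract is truncated above, so the following is only a minimal sketch of how a kNN-LM is commonly described as combining a retrieved distribution with the parametric one, not the authors' implementation. It assumes the usual formulation in which retrieved neighbors are weighted by a softmax over negative distances (with a temperature) and the result is linearly interpolated with the LM's next-word distribution. The function names, the interpolation weight `lam`, and the toy vocabulary and distances are all hypothetical.

```python
import math
from collections import defaultdict


def knn_distribution(distances, neighbor_tokens, temperature=1.0):
    """Turn retrieved neighbors into a next-token distribution.

    Each neighbor gets weight softmax(-distance / temperature); weights of
    neighbors that share the same target token are summed.
    """
    scores = [-d / temperature for d in distances]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    probs = defaultdict(float)
    for token, weight in zip(neighbor_tokens, weights):
        probs[token] += weight / total
    return dict(probs)


def interpolate(p_lm, p_knn, lam=0.25):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_lm."""
    vocab = set(p_lm) | set(p_knn)
    return {
        token: lam * p_knn.get(token, 0.0) + (1.0 - lam) * p_lm.get(token, 0.0)
        for token in vocab
    }


# Toy inputs (hypothetical): a parametric LM distribution and three neighbors.
p_lm = {"cat": 0.5, "dog": 0.3, "car": 0.2}
distances = [1.0, 2.0, 4.0]            # distance from the query to each neighbor
neighbor_tokens = ["dog", "dog", "cat"]  # next token stored with each neighbor

p_knn_sharp = knn_distribution(distances, neighbor_tokens, temperature=1.0)
p_knn_flat = knn_distribution(distances, neighbor_tokens, temperature=10.0)
p_final = interpolate(p_lm, p_knn_sharp, lam=0.25)
```

In this sketch a higher temperature flattens the kNN distribution, so the most distant neighbor ("cat") receives more probability mass in `p_knn_flat` than in `p_knn_sharp`. That is one way to see why the temperature setting can matter for the retrieved distribution.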

Code & Models

Repositories

frankxu2004/knnlm-why (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Softmax