Great Memory, Shallow Reasoning: Limits of $k$NN-LMs
Shangyi Geng, Wenting Zhao, Alexander M Rush

TL;DR
This paper evaluates $k$NN-LMs across a range of NLP tasks. They excel at memory-intensive tasks but show fundamental limitations in reasoning: even with perfect retrieval, their reasoning performance stays bounded.
Contribution
The study comprehensively assesses the reasoning abilities of $k$NN-LMs, shows that strong memory recall does not carry over to reasoning, and establishes upper bounds on their reasoning performance.
Findings
$k$NN-LMs perform well on memory-intensive tasks.
They struggle with multi-hop and reasoning tasks.
Even with perfect retrieval, they cannot reliably derive correct answers.
Abstract
$k$-nearest neighbor language models ($k$NN-LMs), which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling as well as downstream NLP benchmarks. These results have led researchers to argue that models trained on poor quality or outdated data could perform well by employing a $k$NN extension that has access to a higher-quality datastore. In this work, we ask whether this improved ability to recall information really translates into downstream abilities. We extensively evaluate $k$NN-LMs on a diverse set of tasks, ranging from sentiment classification and commonsense reasoning to multi-hop reasoning. Results show that $k$NN-LMs excel at memory-intensive tasks, where utilizing the patterns in the input is sufficient for determining the output, but struggle with reasoning tasks that require integrating multiple pieces of information to…
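To make "integrate retrieval with next-word prediction" concrete, here is a minimal sketch of the interpolation usually described for $k$NN-LMs: the base LM's next-token distribution is mixed with a distribution built from the nearest stored contexts, $p(y \mid x) = \lambda\, p_{k\mathrm{NN}}(y \mid x) + (1 - \lambda)\, p_{\mathrm{LM}}(y \mid x)$. The function name, the toy two-dimensional vectors, the datastore contents, and the values of `k`, `lam`, and `temperature` are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict


def knn_lm_probs(query, datastore, lm_probs, k=2, lam=0.25, temperature=1.0):
    """Mix a base LM distribution with a kNN distribution (illustrative sketch).

    query:     context vector for the current prefix (list of floats)
    datastore: list of (key_vector, next_token) pairs
    lm_probs:  dict mapping token -> probability under the base LM
    """
    # Squared L2 distance from the query to every stored key; keep the k closest.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]

    # Softmax over negative distances, aggregated per next token.
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    knn_probs = defaultdict(float)
    for weight, (_, token) in zip(weights, scored):
        knn_probs[token] += weight / total

    # Linear interpolation between the kNN and base LM distributions.
    vocab = set(lm_probs) | set(knn_probs)
    return {
        t: lam * knn_probs.get(t, 0.0) + (1 - lam) * lm_probs.get(t, 0.0)
        for t in vocab
    }


# Hypothetical toy data: two stored contexts near the query both continue with "Paris".
datastore = [([0.0, 0.0], "Paris"), ([0.1, 0.0], "Paris"), ([5.0, 5.0], "London")]
lm_probs = {"Paris": 0.5, "London": 0.5}
result = knn_lm_probs([0.0, 0.0], datastore, lm_probs, k=2, lam=0.25)
```

In this toy case retrieval shifts probability mass toward the token stored next to similar contexts. That is pattern recall; nothing in the mixing step combines separate retrieved facts, which fits the abstract's distinction between memory-intensive and reasoning tasks.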
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Sparse Evolutionary Training
