Reproducibility Report: Test-Time Training on Nearest Neighbors for Large Language Models
Boyang Zhou, Johan Lindqvist, Lindsey Li

TL;DR
This report reproduces test-time training on nearest neighbors for large language models. It confirms that fine-tuning on retrieved neighbors at inference time lowers perplexity and bits-per-byte across diverse domains of The Pile and across several models, and it shares practical implementation insights.
Contribution
The report confirms the benefits of nearest-neighbor test-time training for large language models and introduces memory-efficient retrieval methods for large-scale deployment.
Findings
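Test-time training reduces perplexity and bits-per-byte across diverse domains of The Pile, with the largest gains on structured or specialized datasets such as GitHub and EuroParl.
Models not pretrained on The Pile benefit more than models already trained on similar data.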
Memory-efficient retrieval enables large-scale adaptation.
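The findings are stated in terms of perplexity and bits-per-byte. As a reminder of how these two metrics relate to per-token loss, here is a minimal sketch using their standard definitions. The function names and the assumption that losses arrive as per-token negative log-likelihoods in nats are illustrative, not taken from the report.

```python
import math

def perplexity(token_nlls):
    """Perplexity: exp of the mean per-token negative log-likelihood (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def bits_per_byte(token_nlls, text):
    """Total negative log-likelihood in bits, divided by the UTF-8 byte length.

    Unlike perplexity, this does not depend on how the tokenizer splits the
    text, which makes it easier to compare models with different vocabularies.
    """
    n_bytes = len(text.encode("utf-8"))
    total_bits = sum(token_nlls) / math.log(2)  # convert nats to bits
    return total_bits / n_bytes
```

Because bits-per-byte normalizes by bytes rather than tokens, it is the more natural of the two when comparing models whose tokenizers differ.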
Abstract
We reproduce the central claims of Test-Time Training on Nearest Neighbors for Large Language Models (Hardt and Sun, 2024), which proposes adapting a language model at inference time by fine-tuning on retrieved nearest-neighbor sequences. Using pretrained RoBERTa embeddings indexed with Faiss, we retrieve 20 neighbors per test input and apply one gradient update per neighbor across GPT-2 (117M, 774M), GPT-Neo (1.3B), and R1-Distilled-Qwen2.5-1.5B. Our experiments confirm that test-time training significantly reduces perplexity and bits-per-byte metrics across diverse domains from The Pile, with the largest improvements in structured or specialized datasets such as GitHub and EuroParl. We further validate that models not pretrained on The Pile benefit more from this adaptation than models already trained on similar data, allowing smaller models to approach the performance of larger ones.…
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques
