Continuous Self-Improvement of Large Language Models by Test-time Training with Verifier-Driven Sample Selection

Mohammad Mahdi Moradi; Hossam Amer; Sudhir Mudur; Weiwei Zhang; Yang Liu; Walid Ahmed

arXiv:2505.19475 · cs.CL · May 29, 2025

Open Access

TL;DR

This paper introduces VDS-TTT, a verifier-driven test-time training framework in which a large language model adapts continuously to new data by fine-tuning only on high-confidence pseudo-labeled responses, with reported relative gains of up to 32.29% over base models.

Contribution

It presents the first verifier-driven test-time training method that synthesizes training data for continuous self-improvement of large language models.

Findings

01. Up to 32.29% relative performance improvement over base models.

02. A 6.66% gain over verifier-only methods without test-time training.

03. Effective across diverse benchmarks and multiple large language models.

Abstract

Adapting pretrained language models to unlabeled, out-of-distribution data is a critical challenge, as models often falter on structurally novel reasoning tasks even while excelling within their training distribution. To address this efficiently, we introduce VDS-TTT (Verifier-Driven Sample Selection for Test-Time Training), a framework that uses a learned verifier to score a pool of generated responses and selects only high-ranking pseudo-labeled examples for fine-tuning. Specifically, for each input query, the LLM generates N candidate answers; the verifier assigns a reliability score to each, and the highest-scoring response, provided its score exceeds a fixed threshold, is paired with its query for test-time training. We fine-tune only the LoRA adapter parameters, which keeps adaptation efficient and convergence fast. Our proposed self-supervised framework…
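
The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy generator, the toy verifier, and the default threshold are all hypothetical stand-ins for a real LLM sampler and a learned verifier.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type aliases: a generator returns n candidate answers for a
# query, and a verifier maps (query, answer) to a reliability score.
Generator = Callable[[str, int], List[str]]
Verifier = Callable[[str, str], float]


def select_pseudo_label(
    query: str,
    generate: Generator,
    score: Verifier,
    n_candidates: int = 4,
    threshold: float = 0.5,  # assumed value; the abstract only says "fixed"
) -> Optional[Tuple[str, str]]:
    """Return (query, best_answer) if the best score exceeds the threshold."""
    candidates = generate(query, n_candidates)
    if not candidates:
        return None
    # Score every candidate and keep the one the verifier trusts most.
    best_score, best_answer = max(
        ((score(query, c), c) for c in candidates), key=lambda pair: pair[0]
    )
    if best_score > threshold:
        return (query, best_answer)
    return None  # no candidate is reliable enough; skip this query


def build_training_set(
    queries: List[str],
    generate: Generator,
    score: Verifier,
    n_candidates: int = 4,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Collect the pseudo-labeled pairs that would feed test-time training."""
    pairs = []
    for q in queries:
        pair = select_pseudo_label(q, generate, score, n_candidates, threshold)
        if pair is not None:
            pairs.append(pair)
    return pairs


# Toy stand-ins so the sketch runs without any model (purely illustrative).
def toy_generate(query: str, n: int) -> List[str]:
    return [f"{query} -> answer {i}" for i in range(n)]


def toy_score(query: str, response: str) -> float:
    # Pretend later samples are more reliable: index / 10.
    return int(response.rsplit(" ", 1)[1]) / 10.0
```

In this sketch, queries whose best candidate falls at or below the threshold contribute nothing to the training set, so a stricter threshold trades coverage for cleaner pseudo-labels. The resulting pairs would then be passed to whatever LoRA fine-tuning routine is in use, which is outside the scope of this example.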

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Adapter · Balanced Selection