Learning to Discover at Test Time

Mert Yuksekgonul; Daniel Koceja; Xinhao Li; Federico Bianchi; Jed McCaleb; Xiaolong Wang; Jan Kautz; Yejin Choi; James Zou; Carlos Guestrin; Yu Sun

arXiv:2601.16175·cs.LG·February 6, 2026

TL;DR

This paper introduces TTT-Discover, a reinforcement learning method that continues training a large language model at test time on a single scientific problem, with the goal of producing a new state-of-the-art solution to that problem. It reports results across mathematics, GPU kernel engineering, algorithm design, and biology, using open models and affordable resources.

Contribution

The paper presents a test-time reinforcement learning approach that adapts an LLM to each individual problem, in contrast to prior test-time scaling work such as AlphaEvolve, which searches by prompting a frozen LLM. It does so without relying on closed models.

Findings

01 Sets a new state of the art on multiple scientific problems

02 Achieves up to 2x faster solutions on GPU kernel tasks

03 Reports results across mathematics, GPU kernel engineering, algorithm design, and biology

Abstract

How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the test problem. This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology.…
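The contrast the abstract draws, between optimizing for many good solutions on average and optimizing for one great solution to a single problem, can be illustrated with a toy sketch. The code below is not the paper's objective or search subroutine. It assumes a hypothetical one-dimensional problem (`reward`), a Gaussian "policy" with a single trainable mean (`mu`), and a softmax-weighted update (`tilted_weights`, with temperature `beta`) chosen only because it concentrates learning on the highest-reward samples. All names and hyperparameters are illustrative assumptions.

```python
import math
import random


def reward(x: float) -> float:
    """Hypothetical continuous reward for one test problem; it peaks at x = 3."""
    return -(x - 3.0) ** 2


def tilted_weights(rewards, beta):
    """Softmax weights over rewards; a larger beta concentrates on the best samples."""
    top = max(rewards)
    exps = [math.exp(beta * (r - top)) for r in rewards]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


def discover(steps=200, batch=16, sigma=0.5, lr=0.5, beta=5.0, seed=0):
    """Toy test-time loop: sample, score, update the policy, keep the single best."""
    rng = random.Random(seed)
    mu = 0.0  # the policy parameter that keeps training on this one problem
    best_x, best_r = mu, reward(mu)
    for _ in range(steps):
        # Sample candidate solutions from the current policy and score them.
        xs = [rng.gauss(mu, sigma) for _ in range(batch)]
        rs = [reward(x) for x in xs]
        # Track the single best solution seen so far; this is what gets returned.
        for x, r in zip(xs, rs):
            if r > best_r:
                best_x, best_r = x, r
        # Move the policy mean toward the reward-weighted mean of the batch,
        # so the top candidates dominate the update (assumption, not the paper's rule).
        ws = tilted_weights(rs, beta)
        mu += lr * sum(w * (x - mu) for w, x in zip(ws, xs))
    return best_x, best_r, mu
```

Calling `discover()` returns the best candidate found, its reward, and the final policy mean. Returning the best sample rather than the policy's average behavior mirrors the stated goal of producing one great solution for this specific problem.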

Code & Models

Models

Jarrodbarnes/KernelBench-RLVR-120b
Hugging Face model · 19 downloads · 2 likes

Taxonomy

Topics: Single-cell and spatial transcriptomics · Mobile Crowdsensing and Crowdsourcing · Stochastic Gradient Optimization Techniques