Learning to Discover at Test Time
Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin, Yu Sun

TL;DR
This paper introduces TTT-Discover, a reinforcement learning method that fine-tunes large language models at test time to produce state-of-the-art solutions for specific scientific problems across various domains, using open models and affordable resources.
Contribution
The paper presents a novel test-time reinforcement learning approach that adapts LLMs for individual problems, achieving superior results without relying on closed models.
Findings
- Sets a new state of the art on multiple scientific problems
- Achieves up to 2x faster solutions on GPU kernel tasks
- Demonstrates effectiveness across diverse domains such as mathematics and biology
Abstract
How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the test problem. This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology.…
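The contrast the abstract draws, between searching with a fixed sampler and continuing to update the sampler on experience from the one test problem, can be illustrated with a toy sketch. This is not the paper's method: there is no LLM here, and the update rule is a simple cross-entropy-method style refit swapped in for illustration rather than the paper's reinforcement learning objective. The names `reward` and `test_time_search`, the Gaussian "policy", and all hyperparameters are hypothetical. The sketch only shows the two ideas stated in the abstract: the reward is continuous, and the goal is the single best solution found rather than a good average.

```python
import random


def reward(x: float) -> float:
    """Hypothetical continuous reward for one fixed test problem (peak at x = 3)."""
    return -(x - 3.0) ** 2


def test_time_search(steps: int = 50, batch: int = 32, top_k: int = 4, seed: int = 0):
    """Toy stand-in for test-time learning: refit a Gaussian sampler on its best samples.

    Assumption: a cross-entropy-method style update replaces the paper's RL objective.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 2.0  # toy "policy": a Gaussian over candidate solutions
    best_x, best_r = None, float("-inf")

    for _ in range(steps):
        # Sample candidate solutions from the current policy.
        candidates = [rng.gauss(mean, std) for _ in range(batch)]
        ranked = sorted(candidates, key=reward, reverse=True)

        # Track the single best solution seen so far; this is what gets reported.
        top_r = reward(ranked[0])
        if top_r > best_r:
            best_x, best_r = ranked[0], top_r

        # Update the policy using only the most promising candidates.
        elite = ranked[:top_k]
        mean = sum(elite) / top_k
        var = sum((e - mean) ** 2 for e in elite) / top_k
        std = max(var ** 0.5, 0.05)  # floor keeps some exploration

    return best_x, best_r


if __name__ == "__main__":
    x, r = test_time_search()
    print(f"best solution: {x:.4f}, reward: {r:.6f}")
```

Because the sampler is refit only on its top candidates, later batches concentrate around the highest-reward region of this one problem, which is the sense in which the search "learns" at test time. A frozen sampler would keep drawing from the initial distribution instead.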
Code & Models
- reasoning-degeneration-dev/ttt-discover-eval-gpt-5-nano-20260226-000552 (dataset, 19 downloads)
- reasoning-degeneration-dev/ttt-discover-eval-gpt-oss-120b-20260226-113556 (dataset, 19 downloads)
- reasoning-degeneration-dev/ttt-discover-eval-gpt-oss-120b-20260226-113856 (dataset, 16 downloads)
- reasoning-degeneration-dev/ttt-discover-eval-gpt-oss-120b-20260226-115037 (dataset, 22 downloads)
- reasoning-degeneration-dev/ttt-discover-eval-together_ai-openai-gpt-oss-120b-20260226-chattemplate (dataset, 14 downloads)
Taxonomy
Topics: Single-cell and spatial transcriptomics · Mobile Crowdsensing and Crowdsourcing · Stochastic Gradient Optimization Techniques
