Surprisal-Guided Selection: Compute-Optimal Test-Time Strategies for Execution-Grounded Code Generation
Jarrod Barnes

TL;DR
On execution-grounded code generation, surprisal-guided selection among sampled candidates achieves higher task success than test-time training, indicating that sample diversity and careful selection matter more than gradient adaptation.
Contribution
The main contribution is surprisal-guided selection, a zero-cost strategy that improves task success in execution-grounded code generation and outperforms gradient-based adaptation.
Findings
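Search-based sampling outperforms minimal gradient adaptation: Best-of-N reaches 90% task success (18/20 tasks) at K=64, while TTT's best checkpoint after 1-5 gradient steps reaches 30.6%.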
Surprisal-guided selection improves success rate to 80%.
Gradient updates cause over-sharpening: they collapse sample diversity toward mediocre solutions instead of discovering optimal ones.
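The selection step named in the findings can be illustrated with a minimal sketch. Surprisal is the negative log-probability the model assigns to its own sample, so it can be computed from log-probabilities already produced during sampling, which is presumably why the strategy is described as zero-cost. This summary does not state the exact selection rule, so the choice to prefer the highest-surprisal correct sample, the length normalization, and the names `Sample`, `surprisal`, and `select_by_surprisal` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Sample:
    """One generated candidate (hypothetical container, not from the paper)."""
    name: str
    token_logprobs: List[float]  # per-token natural-log probabilities from sampling
    correct: bool                # verdict from the execution-based evaluator


def surprisal(token_logprobs: Sequence[float], normalize: bool = True) -> float:
    """Negative log-likelihood of a sample, optionally averaged per token."""
    total = -sum(token_logprobs)
    if normalize and len(token_logprobs) > 0:
        return total / len(token_logprobs)
    return total


def select_by_surprisal(samples: Sequence[Sample],
                        prefer_high: bool = True) -> Optional[Sample]:
    """Pick one sample among those the evaluator marked correct.

    ASSUMPTION: prefer_high=True picks the least-confident correct sample;
    flip the flag to pick the most-confident one instead.
    """
    candidates = [s for s in samples if s.correct]
    if not candidates:
        return None  # nothing passed the evaluator
    score = lambda s: surprisal(s.token_logprobs)
    return max(candidates, key=score) if prefer_high else min(candidates, key=score)


if __name__ == "__main__":
    pool = [
        Sample("confident", [-0.1, -0.1], correct=True),
        Sample("surprising", [-1.0, -2.0], correct=True),
        Sample("broken", [-5.0], correct=False),
    ]
    chosen = select_by_surprisal(pool)
    print(chosen.name if chosen else "no correct sample")
```

Because the rule only reads stored log-probabilities and evaluator verdicts, it adds no model calls beyond the sampling already done; whether per-token normalization matches the paper's definition is not stated here.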
Abstract
Test-time training (TTT) adapts language models through gradient-based updates at inference. But is adaptation the right strategy? We study compute-optimal test-time strategies for verifiable execution-grounded (VEG) tasks, domains like GPU kernel optimization where a deterministic evaluator provides dense, continuous reward signals. Using KernelBench as our testbed and a 120B-parameter model (GPT-OSS-120B with LoRA adaptation), we find that search outperforms minimal adaptation (1-5 gradient steps): Best-of-N sampling achieves 90% task success (18/20 tasks) at K=64 across the full KernelBench L1 eval set while TTT's best checkpoint reaches only 30.6% (3-seed mean), with TTT's "equivalent K" falling below 1, worse than single-sample inference. The failure mode is over-sharpening: gradient updates collapse diversity toward mediocre solutions rather than discovering optimal ones. Our main…
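As a rough illustration of the Best-of-N baseline described in the abstract, the sketch below draws K candidates and keeps the one the evaluator scores highest. The `generate` and `evaluate` callables are hypothetical stand-ins for the language model sampler and the KernelBench evaluator; the toy versions in the usage section are placeholders, not the paper's setup.

```python
import random
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[random.Random], T],
              evaluate: Callable[[T], float],
              k: int,
              seed: int = 0) -> Tuple[Optional[T], float]:
    """Sample k candidates and return the one with the highest reward.

    generate: hypothetical sampler; draws one candidate using the given RNG.
    evaluate: hypothetical deterministic evaluator; higher reward is better.
    """
    rng = random.Random(seed)
    best_candidate: Optional[T] = None
    best_reward = float("-inf")
    for _ in range(k):
        candidate = generate(rng)
        reward = evaluate(candidate)
        if reward > best_reward:
            best_candidate, best_reward = candidate, reward
    return best_candidate, best_reward


def toy_generate(rng: random.Random) -> float:
    """Toy stand-in for sampling a kernel: a single tunable parameter."""
    return rng.uniform(0.0, 1.0)


def toy_evaluate(x: float) -> float:
    """Toy stand-in for a dense, continuous reward that peaks at x = 0.7."""
    return -(x - 0.7) ** 2


if __name__ == "__main__":
    for k in (1, 8, 64):
        candidate, reward = best_of_n(toy_generate, toy_evaluate, k)
        print(f"K={k}: best reward {reward:.4f}")
```

With a fixed seed, the first draw is identical for every K, so the best reward can only stay the same or improve as K grows. That monotonicity is the sense in which search benefits from sample diversity.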
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
