Test-time RL alignment exposes task familiarity artifacts in LLM benchmarks
Kun Wang, Reinhard Heckel

TL;DR
This paper introduces a test-time RL alignment method that mitigates task familiarity artifacts in LLM benchmarks, providing a more accurate assessment of models' true capabilities without requiring task-specific training data.
Contribution
The paper proposes a two-stage test-time RL alignment approach for train-before-test evaluation. It aligns models to a benchmark without task-specific training data, so that measured performance better reflects capability rather than task familiarity.
Findings
Test-time RL alignment aligns about as well as SFT-based train-before-test.
Alignment reduces performance gaps between fine-tuned and base models.
Many reported gains turn out to be task familiarity artifacts.
Abstract
Direct evaluation of LLMs on benchmarks can be misleading because comparatively strong performance may reflect task familiarity rather than capability. The train-before-test approach controls for task familiarity by giving each model task-relevant training before evaluation, originally through supervised finetuning (SFT). However, suitable training data is often hard to come by, and evaluation results vary with the data chosen. In this paper, we propose a two-stage test-time reinforcement learning (RL) alignment method for train-before-test. First, RL on a single sample provides an initial alignment of the model to the task format; second, test-time RL with a majority-voting reward aligns the model to the benchmark distribution. Our test-time RL alignment method aligns about as well as SFT-based train-before-test, but without requiring a task-specific training set. On a domain-specific…
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
