CG-TTRL: Context-Guided Test-Time Reinforcement Learning for On-Device Large Language Models
Peyman Hosseini, Ondrej Bohdal, Taha Ceritli, Ignacio Castro, Matthew Purver, Mete Ozay, Umberto Michieli

TL;DR
This paper introduces CG-TTRL, an enhanced test-time reinforcement learning method that dynamically incorporates context guidance to improve on-device large language model performance and efficiency in complex question-answering tasks.
Contribution
It proposes a novel context-guided approach for test-time reinforcement learning, improving pseudo-label accuracy and exploration regulation for on-device large language models.
Findings
CG-TTRL outperforms TTRL, with a 7% relative accuracy gain.
Achieves strong performance after only a few test-time training steps.
Boosts efficiency by significantly reducing the number of training steps needed for high accuracy.
Abstract
Test-time Reinforcement Learning (TTRL) has shown promise in adapting foundation models for complex tasks at test time, resulting in large performance improvements. TTRL leverages an elegant two-phase sampling strategy: first, multi-sampling derives a pseudo-label via majority voting, while subsequent downsampling and reward-based fine-tuning encourage the model to explore and learn diverse valid solutions, with the pseudo-label modulating the reward signal. Meanwhile, in-context learning has been widely explored at inference time and demonstrated the ability to enhance model performance without weight updates. However, TTRL's two-phase sampling strategy under-utilizes contextual guidance, which can potentially improve pseudo-label accuracy in the initial exploitation phase while regulating exploration in the second. To address this, we propose context-guided TTRL (CG-TTRL),…
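The two-phase sampling that the abstract attributes to TTRL can be illustrated with a minimal Python sketch. It covers only the baseline steps named above (majority-vote pseudo-labeling, then downsampling and reward assignment); it does not attempt the context-guidance component, which the truncated abstract does not specify. The function names, the `num_train` parameter, the uniform random downsampling, and the binary match-the-pseudo-label reward are all assumptions made for illustration, not details confirmed by the source.

```python
from collections import Counter
import random


def majority_vote(answers):
    """Phase 1: derive a pseudo-label as the most frequent sampled answer.

    Ties are broken in favor of the answer that was sampled first.
    """
    return Counter(answers).most_common(1)[0][0]


def pseudo_label_rewards(answers, num_train, seed=0):
    """Phase 2 (assumed form): downsample the answers and score each one.

    Hypothetical reward rule: 1.0 if a sampled answer agrees with the
    pseudo-label, 0.0 otherwise. The real reward may differ.
    """
    label = majority_vote(answers)
    rng = random.Random(seed)
    # Uniform downsampling without replacement (an assumption).
    subset = rng.sample(answers, num_train)
    rewards = [1.0 if a == label else 0.0 for a in subset]
    return label, subset, rewards


if __name__ == "__main__":
    sampled = ["42", "41", "42", "7", "42", "41"]
    label, subset, rewards = pseudo_label_rewards(sampled, num_train=3)
    print(label, subset, rewards)
```

In a full pipeline the rewards would feed a policy-gradient update on the model; that step is omitted here because the source does not describe the optimizer or the update rule.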
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · EEG and Brain-Computer Interfaces
