CG-TTRL: Context-Guided Test-Time Reinforcement Learning for On-Device Large Language Models

Peyman Hosseini; Ondrej Bohdal; Taha Ceritli; Ignacio Castro; Matthew Purver; Mete Ozay; Umberto Michieli

arXiv:2511.06430 · cs.LG · November 11, 2025

TL;DR

This paper introduces CG-TTRL, an enhanced test-time reinforcement learning method that dynamically incorporates context guidance to improve on-device large language model performance and efficiency in complex question-answering tasks.

Contribution

It proposes a novel context-guided approach for test-time reinforcement learning, improving pseudo-label accuracy and exploration regulation for on-device large language models.

Findings

01. CG-TTRL outperforms TTRL, with a 7% relative accuracy gain.

02. CG-TTRL achieves strong performance after only a few test-time training steps.

03. It improves efficiency by significantly reducing the number of training steps needed to reach high accuracy.
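To make the first finding concrete, the short sketch below shows how a relative accuracy gain translates into absolute points. The baseline accuracy of 0.50 is a purely hypothetical value chosen for illustration and is not a number from the paper; only the 7% relative figure comes from the summary above.

```python
def apply_relative_gain(baseline_acc: float, relative_gain: float) -> float:
    """Return the accuracy obtained by scaling a baseline by (1 + relative_gain)."""
    return baseline_acc * (1.0 + relative_gain)


# Hypothetical baseline accuracy (NOT taken from the paper).
baseline = 0.50

# 7% relative gain, as reported in the findings.
improved = apply_relative_gain(baseline, 0.07)

# Difference expressed in absolute percentage points.
absolute_gain_points = (improved - baseline) * 100

print(f"baseline={baseline:.3f} improved={improved:.3f} "
      f"absolute gain={absolute_gain_points:.1f} points")
```

A relative gain is always smaller in absolute points than its percentage suggests whenever the baseline is below 100%, so the two should not be read interchangeably.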

Abstract

Test-time Reinforcement Learning (TTRL) has shown promise in adapting foundation models for complex tasks at test time, resulting in large performance improvements. TTRL leverages an elegant two-phase sampling strategy: first, multi-sampling derives a pseudo-label via majority voting; then, downsampling and reward-based fine-tuning encourage the model to explore and learn diverse valid solutions, with the pseudo-label modulating the reward signal. Meanwhile, in-context learning has been widely explored at inference time and has demonstrated the ability to enhance model performance without weight updates. However, TTRL's two-phase sampling strategy under-utilizes contextual guidance, which can potentially improve pseudo-label accuracy in the initial exploitation phase while regulating exploration in the second. To address this, we propose context-guided TTRL (CG-TTRL),…
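The two-phase sampling that the abstract attributes to TTRL can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the binary reward (1.0 when a sampled answer matches the pseudo-label, 0.0 otherwise) is an assumed way for the pseudo-label to modulate the reward, and uniform random downsampling is likewise an assumption. The context-guidance step of CG-TTRL is not shown, because the abstract is truncated before describing it.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Phase 1: derive a pseudo-label as the most frequent sampled answer."""
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


def pseudo_label_rewards(answers, num_train, rng):
    """Phase 2: downsample the answers and score each against the pseudo-label.

    Assumptions (not confirmed by the abstract): uniform random
    downsampling and a binary match/no-match reward.
    """
    label = majority_vote(answers)
    subset = rng.sample(answers, num_train)
    rewards = [1.0 if answer == label else 0.0 for answer in subset]
    return label, subset, rewards


# Hypothetical final answers extracted from several sampled model outputs.
answers = ["42", "41", "42", "7", "42", "41", "42", "42"]
label, subset, rewards = pseudo_label_rewards(
    answers, num_train=4, rng=random.Random(0)
)
print(label, subset, rewards)
```

In an actual test-time training loop, these rewards would feed a policy-gradient update of the model weights; that step is omitted here because it depends on the specific RL algorithm and model in use.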

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · EEG and Brain-Computer Interfaces