TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning

Wenxuan Jiang; Yuxin Zuo; Zijian Zhang; Xuecheng Wu; Zining Fan; Wenxuan Liu; Li Chen; Xiaoyu Li; Xuezhi Cao; Xiaolong Jin; Ninghao Liu

arXiv:2604.00438·cs.CL·April 2, 2026

TL;DR

TR-ICRL introduces a test-time rethinking framework for in-context reinforcement learning, enhancing reward estimation and iterative answer refinement in large language models for reasoning tasks.

Contribution

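The paper proposes a framework that retrieves the instances most relevant to a query, has the LLM generate candidate answers for each of them, and derives pseudo-labels by majority voting. These pseudo-labels stand in for ground-truth rewards, so the LLM can refine its answers during inference.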
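The retrieval step can be sketched as a top-k similarity search over the unlabeled pool. This is only an illustration: the listing does not say which retriever TR-ICRL uses, so the Jaccard token-overlap score and the names `tokenize`, `jaccard`, and `retrieve_top_k` are assumptions made for this example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_top_k(query: str, pool: list[str], k: int = 2) -> list[str]:
    """Return the k pool instances most similar to the query.

    Assumption: token overlap stands in for whatever relevance
    measure the actual framework uses.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        pool,
        key=lambda instance: jaccard(query_tokens, tokenize(instance)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical unlabeled evaluation pool: questions only, no answers.
pool = [
    "What is the dose of aspirin for adults?",
    "Solve x^2 = 4 for x.",
    "Is aspirin safe for adults with ulcers?",
]
neighbors = retrieve_top_k("aspirin dose for adults", pool, k=2)
```

A dense embedding model could replace `jaccard` without changing the rest of the sketch; only the scoring function is specific to this example.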

Findings

01  TR-ICRL improves Qwen2.5-7B by 21.23% on MedQA.

02  It achieves a 137.59% improvement on AIME2024.

03  Extensive ablation studies demonstrate the robustness of the approach.

Abstract

In-Context Reinforcement Learning (ICRL) enables Large Language Models (LLMs) to learn online from external rewards directly within the context window. However, a central challenge in ICRL is reward estimation, as models typically lack access to ground-truth labels during inference. To address this limitation, we propose Test-Time Rethinking for In-Context Reinforcement Learning (TR-ICRL), a novel ICRL framework designed for both reasoning and knowledge-intensive tasks. TR-ICRL first retrieves the most relevant instances from an unlabeled evaluation set for a given query. During each ICRL iteration, the LLM generates a set of candidate answers for every retrieved instance. A pseudo-label is then derived from this set through majority voting. This label serves as a proxy for producing reward messages and formative feedback, guiding the LLM through iterative refinement. In the…
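The pseudo-labeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the binary reward, the feedback wording, and the names `majority_vote`, `reward_messages`, and `format_feedback` are assumptions, and the candidate answers are hard-coded where a real system would sample them from the LLM.

```python
from collections import Counter


def majority_vote(candidates: list[str]) -> str:
    """Return the most frequent candidate answer as the pseudo-label.

    Ties resolve to the answer seen first, because Counter.most_common
    keeps first-encountered order among equal counts.
    """
    return Counter(candidates).most_common(1)[0][0]


def reward_messages(candidates: list[str], pseudo_label: str) -> list[int]:
    """Assumed binary reward: 1 if a candidate matches the pseudo-label, else 0."""
    return [1 if answer == pseudo_label else 0 for answer in candidates]


def format_feedback(question: str, candidates: list[str]) -> str:
    """Build a text block that could be appended to the context window.

    The wording is hypothetical; it only shows how rewards derived from
    the pseudo-label can be turned into in-context feedback.
    """
    label = majority_vote(candidates)
    rewards = reward_messages(candidates, label)
    lines = [f"Question: {question}"]
    for answer, reward in zip(candidates, rewards):
        verdict = "agrees with" if reward else "differs from"
        lines.append(f"Answer {answer}: reward {reward} ({verdict} the majority)")
    return "\n".join(lines)


# Hypothetical candidate answers for one retrieved instance.
candidates = ["B", "A", "B", "C", "B"]
pseudo_label = majority_vote(candidates)
rewards = reward_messages(candidates, pseudo_label)
feedback = format_feedback("Which option is correct?", candidates)
```

In an iterative loop, the text from `format_feedback` would be added to the prompt before the next round of candidate generation, so that later answers are conditioned on the earlier reward messages.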

Code & Models

Repositories

pangpang-xuan/TR_ICRL (GitHub)
