Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates

Yibo Li; Zijie Lin; Ailin Deng; Xuan Zhang; Yufei He; Shuo Ji; Tri Cao; Bryan Hooi

arXiv:2601.18510 · cs.LG · January 27, 2026

TL;DR

JitRL is a training-free, test-time reinforcement learning framework that lets LLM agents adapt continually without gradient updates. It outperforms fine-tuning methods such as WebRL while reducing costs by over 30 times compared to traditional fine-tuning.

Contribution

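The paper proposes JitRL, an approach that enables continual learning in LLM agents through on-the-fly policy optimization without gradient updates. It also proves that the additive logit update JitRL uses is the exact closed-form solution to the KL-constrained policy optimization objective.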

Findings

01. JitRL achieves state-of-the-art results among training-free methods.

02. JitRL outperforms expensive fine-tuning methods like WebRL.

03. JitRL reduces costs by over 30 times compared to traditional fine-tuning.

Abstract

While Large Language Model (LLM) agents excel at general tasks, they inherently struggle with continual adaptation due to the frozen weights after deployment. Conventional reinforcement learning (RL) offers a solution but incurs prohibitive computational costs and the risk of catastrophic forgetting. We introduce Just-In-Time Reinforcement Learning (JitRL), a training-free framework that enables test-time policy optimization without any gradient updates. JitRL maintains a dynamic, non-parametric memory of experiences and retrieves relevant trajectories to estimate action advantages on-the-fly. These estimates are then used to directly modulate the LLM's output logits. We theoretically prove that this additive update rule is the exact closed-form solution to the KL-constrained policy optimization objective. Extensive experiments on WebArena and Jericho demonstrate that JitRL establishes…
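
The mechanism described in the abstract (retrieve past experiences, estimate per-action advantages, add them to the logits) can be illustrated with a small sketch. This is not the authors' implementation: the exact-match retrieval, the mean-return advantage estimator, the temperature `beta`, and every function name here are assumptions made for illustration only. The sketch relies on one standard fact: maximizing expected advantage minus `beta` times the KL divergence from a base policy yields a policy proportional to the base policy times `exp(advantage / beta)`, which is the same as adding `advantage / beta` to the base logits.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_advantages(memory, state_key, actions):
    """Hypothetical advantage estimator (an assumption, not the paper's).

    memory is a list of (state_key, action, return) tuples. Retrieval is
    simplified to exact matching on state_key. The advantage of an action is
    its mean return minus the mean return over all retrieved records;
    actions with no retrieved record get advantage 0.
    """
    records = [(a, r) for (s, a, r) in memory if s == state_key]
    if not records:
        return [0.0] * len(actions)
    baseline = sum(r for _, r in records) / len(records)
    by_action = defaultdict(list)
    for a, r in records:
        by_action[a].append(r)
    return [
        sum(by_action[a]) / len(by_action[a]) - baseline if a in by_action else 0.0
        for a in actions
    ]


def adjust_logits(logits, advantages, beta=1.0):
    """Additive update: new_logit = logit + advantage / beta."""
    return [l + adv / beta for l, adv in zip(logits, advantages)]


if __name__ == "__main__":
    actions = ["click", "type", "scroll"]
    memory = [
        ("page_a", "click", 1.0),
        ("page_a", "click", 1.0),
        ("page_a", "type", 0.0),
        ("page_a", "type", 0.0),
    ]
    base_logits = [0.0, 0.0, 0.0]
    adv = estimate_advantages(memory, "page_a", actions)
    print(softmax(adjust_logits(base_logits, adv, beta=0.5)))
```

In this toy setting the model weights never change: only the stored experiences and the resulting logit offsets do, which is what makes the approach gradient-free. A smaller `beta` trusts the retrieved experience more, and a larger `beta` keeps the policy closer to the base model.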

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications