Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates
Yibo Li, Zijie Lin, Ailin Deng, Xuan Zhang, Yufei He, Shuo Ji, Tri Cao, Bryan Hooi

TL;DR
JitRL introduces a training-free, test-time reinforcement learning framework for LLM agents that adapts on the fly without gradient updates. It outperforms costly fine-tuning methods such as WebRL at more than 30 times lower cost.
Contribution
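It proposes JitRL, an approach that enables continual learning in LLM agents through on-the-fly policy optimization without gradient updates. It also proves that JitRL's additive logit update is the exact closed-form solution to a KL-constrained policy optimization objective.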
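For reference, the kind of closed-form result described here follows from a standard derivation. The sketch below uses notation that is assumed rather than taken from the paper: a base policy π_0 given by the LLM's logits z_0, an advantage estimate A(s,a), a KL weight β (for a hard KL constraint, β plays the role of the constraint's Lagrange multiplier), and a multiplier λ for the normalization constraint.

```latex
\begin{aligned}
\pi^{*}(\cdot\mid s) &= \arg\max_{\pi}\; \mathbb{E}_{a\sim\pi(\cdot\mid s)}\big[A(s,a)\big] \;-\; \beta\,\mathrm{KL}\big(\pi(\cdot\mid s)\,\|\,\pi_0(\cdot\mid s)\big) \\
\text{stationarity:}\quad & A(s,a) - \beta\Big(\log\tfrac{\pi(a\mid s)}{\pi_0(a\mid s)} + 1\Big) + \lambda = 0 \\
\Rightarrow\quad \pi^{*}(a\mid s) &= \frac{\pi_0(a\mid s)\,\exp\!\big(A(s,a)/\beta\big)}{Z(s)}, \qquad Z(s) = \sum_{a'} \pi_0(a'\mid s)\,\exp\!\big(A(s,a')/\beta\big) \\
\pi_0 = \mathrm{softmax}(z_0) \;\Rightarrow\; \pi^{*}(\cdot\mid s) &= \mathrm{softmax}\big(z_0 + A(s,\cdot)/\beta\big)
\end{aligned}
```

The last line shows why such an update is additive: in logit space, the optimal policy is the base logits shifted by A/β, and the normalizer Z(s) is absorbed by the softmax.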
Findings
JitRL achieves state-of-the-art results among training-free methods.
JitRL outperforms expensive fine-tuning methods such as WebRL.
JitRL cuts costs by a factor of more than 30 compared with traditional fine-tuning.
Abstract
While Large Language Model (LLM) agents excel at general tasks, they inherently struggle with continual adaptation because their weights are frozen after deployment. Conventional reinforcement learning (RL) offers a solution but incurs prohibitive computational cost and risks catastrophic forgetting. We introduce Just-In-Time Reinforcement Learning (JitRL), a training-free framework that enables test-time policy optimization without any gradient updates. JitRL maintains a dynamic, non-parametric memory of experiences and retrieves relevant trajectories to estimate action advantages on the fly. These estimates are then used to directly modulate the LLM's output logits. We theoretically prove that this additive update rule is the exact closed-form solution to the KL-constrained policy optimization objective. Extensive experiments on WebArena and Jericho demonstrate that JitRL establishes…
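The retrieve, estimate, modulate loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Experience` schema, cosine-similarity retrieval, the mean-return-minus-baseline advantage estimator, the action names, and the KL weight `beta` are all hypothetical stand-ins. Candidate actions are scored with a plain dict of logits instead of a real LLM. The logit update itself, z'(a) = z(a) + A(a)/β, is the standard closed form for a KL-regularized policy objective.

```python
import math
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical memory entry: state embedding, action taken, observed return."""
    state: list[float]
    action: str
    ret: float


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def estimate_advantages(memory, state, actions, k=4):
    """Hypothetical estimator: retrieve the k most similar past steps, then
    A(a) = mean return of neighbours that took a - mean return of all neighbours."""
    neighbours = sorted(memory, key=lambda e: cosine(e.state, state), reverse=True)[:k]
    if not neighbours:
        return {a: 0.0 for a in actions}
    baseline = sum(e.ret for e in neighbours) / len(neighbours)
    adv = {}
    for a in actions:
        rets = [e.ret for e in neighbours if e.action == a]
        # Actions with no retrieved evidence get advantage 0 and keep their base logit.
        adv[a] = (sum(rets) / len(rets) - baseline) if rets else 0.0
    return adv


def modulate_logits(logits, advantages, beta=1.0):
    """Additive update z'(a) = z(a) + A(a) / beta."""
    return {a: z + advantages.get(a, 0.0) / beta for a, z in logits.items()}


def softmax(logits):
    """Numerically stable softmax over a dict of logits."""
    m = max(logits.values())
    exps = {a: math.exp(z - m) for a, z in logits.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Toy usage: two candidate actions with equal base logits.
memory = [
    Experience([1.0, 0.0], "click_search", 1.0),
    Experience([0.9, 0.1], "click_search", 1.0),
    Experience([1.0, 0.1], "scroll", 0.0),
    Experience([0.0, 1.0], "scroll", 1.0),
]
state = [1.0, 0.05]
base_logits = {"click_search": 0.0, "scroll": 0.0}
adv = estimate_advantages(memory, state, list(base_logits), k=3)
policy = softmax(modulate_logits(base_logits, adv, beta=1.0))
print(adv, policy)
```

In this toy run, the three nearest neighbours favour `click_search`, so its logit rises and the probability mass shifts toward it. Increasing `beta` shrinks the shift and keeps the policy closer to the base model, which is the role the KL term plays.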
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
