Scalable In-Context Q-Learning

Jinmei Liu; Fuhong Liu; Zhenhong Sun; Jianye Hao; Huaxiong Li; Bo Wang; Daoyi Dong; Chunlin Chen; Zhi Wang

arXiv:2506.01299·cs.AI·February 9, 2026

Open Access·3 Reviews

TL;DR

This paper introduces S-ICQL, a scalable in-context Q-learning framework that combines dynamic programming and world modeling with a prompt-based transformer architecture to improve decision-making in complex environments.

Contribution

It proposes a novel, scalable in-context reinforcement learning method that integrates world modeling and dynamic programming with a transformer architecture for efficient policy learning.

Findings

1. Consistent performance improvements over baselines.
2. Effective learning from suboptimal trajectories.
3. Fast and precise in-context inference enabled by world models.

Abstract

Recent advancements in language models have demonstrated remarkable in-context learning abilities, prompting the exploration of in-context reinforcement learning (ICRL) to extend this promise to decision domains. Because decision domains involve more complex dynamics and temporal correlations, existing ICRL approaches may struggle to learn from suboptimal trajectories and to achieve precise in-context inference. In this paper, we propose Scalable In-Context Q-Learning (S-ICQL), an innovative framework that harnesses dynamic programming and world modeling to steer ICRL toward efficient reward maximization and task generalization, while retaining the scalability and stability of supervised pretraining. We design a prompt-based multi-head transformer architecture that simultaneously predicts optimal policies and in-context value functions using…
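As a reference point for the dynamic programming that the abstract refers to, here is a minimal sketch of a textbook tabular Q-learning backup. It is not the paper's in-context objective: the function name `q_learning_backup`, the toy table, and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
def q_learning_backup(q, state, action, reward, next_state, done,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    Generic textbook rule, not S-ICQL itself:
        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Bootstrapped target: reward plus the discounted best next-state value
    # (no bootstrapping when the episode has terminated).
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Toy two-state table with hypothetical values, for illustration only.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
updated = q_learning_backup(
    q_table, "s0", "right", reward=1.0, next_state="s1", done=False
)
print(updated)
```

In this toy call the target is 1.0 + 0.9 × 2.0 = 2.8, so the entry for `("s0", "right")` moves halfway from 0.0 toward 2.8. The `max` over next actions is what allows value estimates to improve beyond the returns of the behavior that generated the data, which is the property the abstract associates with learning from suboptimal trajectories.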

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 4·Confidence 4

Strengths

- The manuscript is well written and easy to follow.
- The experiments are comprehensive, including an array of strong baselines.
- The performance of S-ICQL is strong.

Weaknesses

My biggest concern is about the novelty of the proposed framework S-ICQL. While it has many components, all of its components and insights have appeared in prior works. As a consequence, S-ICQL feels like a unification of ideas from prior works. Specifically, there are two core technical contributions:

- **Section 3.3.** The world modeling is **identical** to [1]. Both the model design and pretraining process are almost identical to the task-embedding model of [1].
- **Section 3.4.** S-ICQL al

Reviewer 02·Rating 4·Confidence 3

Strengths

- Good experimental results which consider a number of relevant baselines
- Ablations show that each of the components contributes meaningfully to performance
- Stitching section shows improvements over best returns in the dataset

Weaknesses

- My main issue is that it is very hard to understand the details of the method. Figure 1 is very complicated; there are many arrows going in different directions and it is not obvious what exactly “precise task representation” and “precise lightweight prompt” are.
- The method is also quite complicated and not particularly novel. It seems to be a combination of a number of existing components (e.g., AWR, world model, transformer-based policies)

Reviewer 03·Rating 6·Confidence 4

Strengths

1. The proposed world model effectively captures task-relevant information, providing a more compact and informative representation compared to previous ICRL approaches.
2. The integration of Q-function learning and advantage-weighted regression (AWR) allows the model to optimize its policy from suboptimal data and obtain better-performing behaviors.
3. The experimental results are relatively comprehensive, with comparisons to multiple baselines demonstrating consistent improvement
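For readers unfamiliar with the advantage-weighted regression (AWR) named in the second point, the following is a minimal sketch of the generic AWR objective. It is not taken from the paper: the helper names `awr_weights` and `awr_loss`, the temperature `beta`, the clipping value `max_weight`, and the toy inputs are all assumptions.

```python
import math


def awr_weights(q_values, state_values, beta=1.0, max_weight=20.0):
    """Return exp(advantage / beta) per sample, clipped at max_weight.

    The advantage is Q(s, a) - V(s); beta and max_weight are illustrative
    hyperparameters, not values from the paper.
    """
    weights = []
    for q, v in zip(q_values, state_values):
        advantage = q - v
        weights.append(min(math.exp(advantage / beta), max_weight))
    return weights


def awr_loss(log_probs, weights):
    """Weighted negative log-likelihood of the dataset actions."""
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)


# Two hypothetical samples: the first action has advantage 1, the second 0.
weights = awr_weights(q_values=[1.0, 0.0], state_values=[0.0, 0.0])
# Assume the current policy assigns probability 0.5 to each dataset action.
loss = awr_loss([math.log(0.5), math.log(0.5)], weights)
print(weights, loss)
```

Actions with a higher estimated advantage receive exponentially larger weights, so the supervised update imitates them more strongly than the rest of the dataset. This is the usual reason AWR is paired with a learned Q-function when the data are suboptimal.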

Weaknesses

1. The experimental environments are relatively simple. It would strengthen the work to include results on more challenging or diverse environments.

Taxonomy

Topics·Gaze Tracking and Assistive Technology