Learning in Context, Guided by Choice: A Reward-Free Paradigm for Reinforcement Learning with Transformers

Juncheng Dong; Bowen He; Moyang Guo; Ethan X. Fang; Zhuoran Yang; Vahid Tarokh

arXiv:2602.08244 · cs.LG · February 10, 2026

TL;DR

This paper introduces a new paradigm for reinforcement learning that relies solely on preference feedback instead of explicit rewards, enabling effective in-context learning and generalization without reward supervision.

Contribution

The paper proposes In-Context Preference-based Reinforcement Learning (ICPRL), a framework in which both pretraining and deployment rely on preference feedback rather than reward signals, and demonstrates its effectiveness across various tasks.

Findings

01. ICPRL achieves comparable performance to reward-based methods.

02. Supervised pretraining remains effective with preference-only data.

03. Preference-native frameworks improve data efficiency.

Abstract

In-context reinforcement learning (ICRL) leverages the in-context learning capabilities of transformer models (TMs) to efficiently generalize to unseen sequential decision-making tasks without parameter updates. However, existing ICRL methods rely on explicit reward signals during pretraining, which limits their applicability when rewards are ambiguous, hard to specify, or costly to obtain. To overcome this limitation, we propose a new learning paradigm, In-Context Preference-based Reinforcement Learning (ICPRL), in which both pretraining and deployment rely solely on preference feedback, eliminating the need for reward supervision. We study two variants that differ in the granularity of feedback: Immediate Preference-based RL (I-PRL) with per-step preferences, and Trajectory Preference-based RL (T-PRL) with trajectory-level comparisons. We first show that supervised pretraining, a…
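
The two feedback granularities named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the record types `StepPreference` and `TrajectoryPreference`, the helper functions, the toy `hidden_utility`, and the use of a Bradley-Terry model to simulate an annotator. The learner would see only the resulting preference records, never the utility values.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int
Action = int
Utility = Callable[[State, Action], float]


@dataclass
class StepPreference:
    """Per-step record (I-PRL style): which of two actions was preferred at a state."""
    state: State
    action_a: Action
    action_b: Action
    a_preferred: bool


@dataclass
class TrajectoryPreference:
    """Trajectory-level record (T-PRL style): which of two trajectories was preferred."""
    traj_a: List[Tuple[State, Action]]
    traj_b: List[Tuple[State, Action]]
    a_preferred: bool


def bradley_terry_prob(score_a: float, score_b: float) -> float:
    """P(a preferred over b) under a Bradley-Terry model (assumed, not from the paper)."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def label_step(state: State, action_a: Action, action_b: Action,
               utility: Utility, rng: random.Random) -> StepPreference:
    """Simulate an annotator comparing two actions at a single state."""
    p = bradley_terry_prob(utility(state, action_a), utility(state, action_b))
    return StepPreference(state, action_a, action_b, rng.random() < p)


def label_trajectories(traj_a: List[Tuple[State, Action]],
                       traj_b: List[Tuple[State, Action]],
                       utility: Utility, rng: random.Random) -> TrajectoryPreference:
    """Simulate an annotator comparing two whole trajectories by summed utility."""
    score_a = sum(utility(s, a) for s, a in traj_a)
    score_b = sum(utility(s, a) for s, a in traj_b)
    p = bradley_terry_prob(score_a, score_b)
    return TrajectoryPreference(traj_a, traj_b, rng.random() < p)


def hidden_utility(state: State, action: Action) -> float:
    """Hypothetical annotator utility; used only to generate labels."""
    return 1.0 if action == state % 2 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    print(label_step(3, 1, 0, hidden_utility, rng))
    print(label_trajectories([(0, 0), (1, 1)], [(0, 1), (1, 0)], hidden_utility, rng))
```

The per-step record carries one comparison per decision, while the trajectory record carries one comparison per pair of rollouts, so the second form provides far fewer labels for the same amount of interaction.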

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety