Improving Zero-Shot Offline RL via Behavioral Task Sampling

Nazim Bendib; Nicolas Perrin-Gilbert; Olivier Sigaud

arXiv:2604.25496·cs.AI·April 29, 2026

TL;DR

This paper improves offline zero-shot RL by drawing the task vectors used for policy training from the offline dataset instead of sampling them at random, which yields better zero-shot generalization without additional environment interaction.

Contribution

It introduces a simple, general reward function extraction procedure that integrates into existing offline zero-shot RL algorithms and improves zero-shot performance through more principled task sampling.

Findings

1. Improves zero-shot performance by an average of 20% across benchmarks.

2. Extracting task vectors from datasets outperforms random sampling.

3. Enhances existing offline zero-shot RL algorithms with minimal modifications.

Abstract

Offline zero-shot reinforcement learning (RL) aims to learn agents that optimize unseen reward functions without additional environment interaction. The standard approach to this problem trains task-conditioned policies by sampling task vectors that define linear reward functions over learned state representations. In most existing algorithms, these task vectors are randomly sampled, implicitly assuming this adequately captures the structure of the task space. We argue that doing so leads to suboptimal zero-shot generalization. To address this limitation, we propose extracting task vectors directly from the offline dataset and using them to define the task distribution used for policy training. We introduce a simple and general reward function extraction procedure that integrates into existing offline zero-shot RL algorithms. Across multiple benchmark environments and baselines, our…
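
To make the contrast in the abstract concrete, the sketch below compares the two ways of choosing a task vector z, where z defines the linear reward r(s) = φ(s) · z over state features φ(s). It is only an illustration under stated assumptions: the features are random toy data standing in for a learned representation, the sphere of radius sqrt(dim) is an assumed normalization, and `sample_dataset_task` (averaging features over a random dataset segment) is a hypothetical stand-in, not the extraction procedure proposed in the paper, which this summary does not describe. All function names are invented for the example.

```python
import math
import random


def normalize(v, radius):
    """Rescale v to the given Euclidean norm (left unchanged if v is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [radius * x / norm for x in v] if norm > 0 else list(v)


def sample_random_task(dim, rng):
    """Baseline: task vector drawn uniformly from the sphere of radius sqrt(dim)."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)], math.sqrt(dim))


def sample_dataset_task(features, segment_len, rng):
    """Hypothetical stand-in for dataset-based extraction (NOT the paper's method):
    average the features of a random contiguous segment of the dataset."""
    start = rng.randrange(len(features) - segment_len + 1)
    segment = features[start:start + segment_len]
    dim = len(segment[0])
    mean = [sum(f[i] for f in segment) / segment_len for i in range(dim)]
    return normalize(mean, math.sqrt(dim))


def linear_reward(phi_s, z):
    """Reward of a state with features phi_s under task vector z: r(s) = phi(s) . z"""
    return sum(p * w for p, w in zip(phi_s, z))


rng = random.Random(0)
dim = 4
# Toy features standing in for a learned state representation phi(s).
features = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]

z_random = sample_random_task(dim, rng)
z_data = sample_dataset_task(features, segment_len=20, rng=rng)

# Relabel every state in the toy dataset with the reward induced by z_data.
rewards = [linear_reward(f, z_data) for f in features]
```

In a task-conditioned training loop, either sampler would supply the z on which the policy is conditioned for a given batch; the paper's argument is that the choice of this sampling distribution affects zero-shot generalization.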
