Zero-Shot Reinforcement Learning Under Partial Observability

Scott Jeen; Tom Bewley; Jonathan M. Cullen

arXiv:2506.15446·cs.LG·June 19, 2025

TL;DR

This paper investigates how standard zero-shot reinforcement learning methods degrade under partial observability and shows that memory-based architectures improve performance over memory-free baselines in such settings.

Contribution

It introduces memory-based methods for zero-shot RL under partial observability and empirically shows their effectiveness over memory-free approaches.

Findings

01 Memory-based zero-shot RL outperforms memory-free baselines.

02 Partial observability degrades standard zero-shot RL performance.

03 Memory architectures mitigate the effects of partial observability.

Abstract

Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Markov states is one such assumption, yet, in many real-world applications, the Markov state is only partially observable. Here, we explore how the performance of standard zero-shot RL methods degrades when subjected to partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our memory-based zero-shot RL methods in domains where the states, rewards and a change in dynamics are partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via: https://enjeeneer.io/projects/bfms-with-memory/.
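To make the role of memory concrete, here is a minimal toy sketch of why a history of observations can stand in for a missing Markov state. It is not the paper's method or architecture: the point-mass environment, the time step `DT`, and the names `step`, `observe`, `MemoryFreeEstimator`, `MemoryEstimator`, and `rollout` are all assumptions made for illustration. The full state is (position, velocity), but the agent observes position only. A memory-free estimator cannot identify velocity from one observation, whereas an estimator that remembers the previous observation can recover it by finite differences.

```python
from collections import deque

DT = 0.1  # hypothetical time step for this toy example


def step(state, action):
    """Point-mass dynamics. The Markov state is (position, velocity)."""
    pos, vel = state
    vel = vel + action * DT   # action is an acceleration
    pos = pos + vel * DT      # position advances with the updated velocity
    return (pos, vel)


def observe(state):
    """Partial observation: only the position is visible."""
    return state[0]


class MemoryFreeEstimator:
    """Uses the current observation only, so velocity is unidentifiable."""

    def update(self, obs):
        # With a single position reading, the best it can do is a fixed guess.
        return (obs, 0.0)


class MemoryEstimator:
    """Keeps the last two observations and differences them."""

    def __init__(self):
        self.history = deque(maxlen=2)

    def update(self, obs):
        self.history.append(obs)
        if len(self.history) < 2:
            return (obs, 0.0)  # not enough history yet
        prev, curr = self.history
        return (curr, (curr - prev) / DT)


def rollout(actions, initial_state=(0.0, 1.0)):
    """Run both estimators on the same trajectory and return the final
    true state together with each estimator's final state estimate."""
    state = initial_state
    free, mem = MemoryFreeEstimator(), MemoryEstimator()
    free_est = free.update(observe(state))
    mem_est = mem.update(observe(state))
    for action in actions:
        state = step(state, action)
        obs = observe(state)
        free_est = free.update(obs)
        mem_est = mem.update(obs)
    return state, free_est, mem_est


if __name__ == "__main__":
    true_state, free_est, mem_est = rollout([0.5] * 10)
    print("true state:        ", true_state)
    print("memory-free guess: ", free_est)
    print("memory-based guess:", mem_est)
```

In this sketch a two-step window is enough because the hidden variable is a first derivative of the observation. Richer forms of partial observability generally call for a learned memory, such as a recurrent network over the observation history, which is the kind of architecture the abstract refers to.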

Taxonomy

Topics: Fault Detection and Control Systems · Distributed Sensor Networks and Detection Algorithms