Provable Distributional Value Iteration under Partial Observability

Larry Preuett III, Qiuyi Zhang, and Muhammad Aurangzeb Ahmad

arXiv:2505.06518·cs.AI·May 7, 2026

TL;DR

This paper extends Distributional Reinforcement Learning (DistRL) to POMDPs. It introduces distributional Bellman operators for partial observability and a point-based planning algorithm, DPBVI, that models the full return distribution rather than only its expectation.

Contribution

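The paper proposes distributional Bellman operators for POMDPs, proves their convergence under the supremum p-Wasserstein metric, and introduces psi-vectors, a finite representation of return distributions that generalizes classical alpha-vectors. Building on these, it develops DPBVI, a planning algorithm that combines distributional RL with point-based methods.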

Findings

01. DPBVI recovers classical PBVI in the risk-neutral case

02. The new operators converge under the supremum p-Wasserstein metric

03. Distributional approach captures the full return distribution in POMDPs
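
To make the convergence finding concrete, here is a minimal Python sketch of a supremum p-Wasserstein distance between two families of return distributions. It is an illustration under stated assumptions, not the paper's construction: each return distribution is approximated by an equal-size list of sampled returns, the supremum is taken over a finite set of keys (for example, sampled belief points), and the names `wasserstein_p` and `sup_wasserstein_p` are hypothetical. The page does not state the exact index set the supremum ranges over.

```python
def wasserstein_p(xs, ys, p=1.0):
    """p-Wasserstein distance between two equal-size empirical distributions on the real line.

    For one-dimensional, equally weighted samples and p >= 1, the optimal
    coupling pairs the i-th smallest value of xs with the i-th smallest of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    paired = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(x - y) ** p for x, y in paired) / len(xs)
    return mean_cost ** (1.0 / p)


def sup_wasserstein_p(eta_1, eta_2, p=1.0):
    """Largest p-Wasserstein distance over a shared, finite set of keys.

    eta_1 and eta_2 map each key (assumed here to be a belief-point label)
    to a list of sampled returns.
    """
    if eta_1.keys() != eta_2.keys():
        raise ValueError("both families must be defined on the same keys")
    return max(wasserstein_p(eta_1[k], eta_2[k], p) for k in eta_1)
```

A contraction result in this metric says that applying the operator to two such families shrinks `sup_wasserstein_p` between them by a fixed factor, which is what drives convergence of repeated backups.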

Abstract

In many real-world planning tasks, agents must tackle uncertainty about the environment's state and variability in the outcomes induced by stochastic dynamics and rewards. Motivated by recent progress in world model approaches, where latent models approximate beliefs and support planning, we extend Distributional Reinforcement Learning (DistRL), which models the entire return distribution for fully observable domains, to Partially Observable Markov Decision Processes (POMDPs). Concretely, we introduce new distributional Bellman operators for partial observability and prove their convergence under the supremum p-Wasserstein metric. We also propose a finite representation of these return distributions via psi-vectors, generalizing the classical alpha-vectors in POMDP solvers. Building on this, we develop Distributional Point-Based Value Iteration (DPBVI), which integrates psi-vectors into…
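
For readers unfamiliar with the classical machinery the abstract builds on, the following Python sketch shows the standard Bayes belief update and alpha-vector value evaluation used by risk-neutral point-based POMDP solvers. It does not implement the paper's psi-vectors or DPBVI, whose definitions are not given on this page. The nested-list layout of `T` and `O` and the function names `belief_update` and `alpha_value` are assumptions made for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s).

    T[a][s][s2] is the transition probability and O[a][s2][o] the
    observation probability (assumed layout, for illustration only).
    """
    n = len(belief)
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [u / total for u in unnormalized]


def alpha_value(belief, alphas):
    """Classical risk-neutral value: the best inner product of an alpha-vector with the belief."""
    return max(sum(a * b for a, b in zip(alpha, belief)) for alpha in alphas)
```

Each alpha-vector stores one expected return per state, so `alpha_value` yields a single number per belief. According to the abstract, psi-vectors generalize this representation to return distributions.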
