Model-Based Learning of Near-Optimal Finite-Window Policies in POMDPs

Philip Jordan; Maryam Kamgarpour

arXiv:2604.01024·cs.LG·April 2, 2026

TL;DR

This paper presents a sample-efficient method for learning near-optimal finite-window policies in POMDPs by estimating a superstate MDP model from a single trajectory. The analysis leverages filter stability and concentration inequalities.

Contribution

The paper introduces a novel model estimation procedure for tabular POMDPs with tight sample complexity guarantees, so that standard MDP algorithms can compute policies from the estimated superstate MDP.

Findings

01 Learns near-optimal finite-window policies from a single trajectory.

02 Provides tight sample complexity bounds for superstate MDP model estimation.

03 Connects filter stability with concentration inequalities for dependent random variables.

Abstract

We study model-based learning of finite-window policies in tabular partially observable Markov decision processes (POMDPs). A common approach to learning under partial observability is to approximate unbounded history dependencies using finite action-observation windows. This induces a finite-state Markov decision process (MDP) over histories, referred to as the superstate MDP. Once a model of this superstate MDP is available, standard MDP algorithms can be used to compute optimal policies, motivating the need for sample-efficient model estimation. Estimating the superstate MDP model is challenging because trajectories are generated by interaction with the original POMDP, creating a mismatch between the sampling process and target model. We propose a model estimation procedure for tabular POMDPs and analyze its sample complexity. Our analysis exploits a connection between filter…
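To make the superstate construction concrete, here is a minimal Python sketch of a naive count-based estimator that reads finite-window superstates off a single trajectory. It is an illustration only and not the estimation procedure proposed in the paper: the function names, the choice of a superstate as the last `m` (action, observation) pairs, and the plain frequency estimate are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def window_superstates(actions, observations, m):
    """Return the superstate at each time t = m, ..., T (hypothetical helper).

    Assumes observations[t + 1] is observed after taking actions[t], so
    len(observations) == len(actions) + 1. A superstate is the tuple of the
    last m (action, observation) pairs.
    """
    pairs = list(zip(actions, observations[1:]))
    return [tuple(pairs[t - m:t]) for t in range(m, len(pairs) + 1)]


def estimate_superstate_transitions(actions, observations, m):
    """Empirical transition frequencies between superstates (naive sketch)."""
    states = window_superstates(actions, observations, m)
    counts = defaultdict(Counter)
    # states[i] is the superstate at time t = m + i; the action taken
    # there is actions[m + i], and states[i + 1] is the resulting superstate.
    for i in range(len(states) - 1):
        action = actions[m + i]
        counts[(states[i], action)][states[i + 1]] += 1
    # Normalize counts into conditional probabilities per (superstate, action).
    return {
        key: {nxt: c / sum(nxt_counts.values()) for nxt, c in nxt_counts.items()}
        for key, nxt_counts in counts.items()
    }


# Toy single trajectory with alternating actions and observations.
actions = [0, 1, 0, 1, 0, 1]
observations = [0, 1, 0, 1, 0, 1, 0]
model = estimate_superstate_transitions(actions, observations, m=1)
```

Because consecutive windows in one trajectory overlap, the counts are built from dependent samples, which is one reason plain i.i.d. concentration arguments do not directly apply to an estimator of this kind.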
