Memoryless Policy Iteration for Episodic POMDPs

Roy van Zuijlen; Duarte Antunes

arXiv:2512.11082 · cs.LG · December 15, 2025

Open Access

TL;DR

This paper introduces a family of monotonically improving memoryless policy-iteration algorithms for episodic POMDPs. The algorithms work directly on outputs rather than beliefs, admit a model-free implementation, and run significantly faster than policy-gradient methods.

Contribution

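The paper proposes a family of policy-iteration algorithms for POMDPs that operate directly in output space, alternating single-stage policy improvements with policy evaluations according to a prescribed periodic pattern. It identifies the patterns that maximize a computational-efficiency index, including the simplest one with minimal period.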

Findings

01. Achieves significant speedups over policy-gradient methods.

02. Develops a model-free variant that learns directly from data.

03. Demonstrates effectiveness across several POMDP examples.

Abstract

Memoryless and finite-memory policies offer a practical alternative for solving partially observable Markov decision processes (POMDPs), as they operate directly in the output space rather than in the high-dimensional belief space. However, extending classical methods such as policy iteration to this setting remains difficult; the output process is non-Markovian, making policy-improvement steps interdependent across stages. We introduce a new family of monotonically improving policy-iteration algorithms that alternate between single-stage output-based policy improvements and policy evaluations according to a prescribed periodic pattern. We show that this family admits optimal patterns that maximize a natural computational-efficiency index, and we identify the simplest pattern with minimal period. Building on this structure, we further develop a model-free variant that estimates values…
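
To make the alternation between single-stage improvement and policy evaluation concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it assumes a known finite model, a deterministic time-varying memoryless policy `pi[t][o]`, and one simple pattern (a single forward evaluation followed by a backward sweep over stages), which is not claimed to be the optimal pattern identified in the paper. All names (`evaluate`, `improve_stage`, `backward_sweep`, the toy arrays `P`, `Z`, `R`, `mu`) are hypothetical and chosen for illustration.

```python
def q_value(P, R, V_next, s, a):
    # One-step lookahead: immediate reward plus expected value of the next state.
    return R[s][a] + sum(P[a][s][s2] * V_next[s2] for s2 in range(len(V_next)))

def stage_values(P, Z, R, V_next, pi_t):
    # State values at one stage when actions are chosen from observations via pi_t.
    return [sum(Z[s][o] * q_value(P, R, V_next, s, pi_t[o]) for o in range(len(pi_t)))
            for s in range(len(Z))]

def evaluate(P, Z, R, mu, pi):
    # Policy evaluation: forward state distributions d[t], backward values V[t].
    H, S, O = len(pi), len(mu), len(Z[0])
    d = [list(mu)]
    for t in range(H - 1):
        nxt = [0.0] * S
        for s in range(S):
            for o in range(O):
                for s2 in range(S):
                    nxt[s2] += d[t][s] * Z[s][o] * P[pi[t][o]][s][s2]
        d.append(nxt)
    V = [[0.0] * S for _ in range(H + 1)]
    for t in reversed(range(H)):
        V[t] = stage_values(P, Z, R, V[t + 1], pi[t])
    return d, V

def improve_stage(P, Z, R, d_t, V_next, pi_t):
    # Single-stage, output-based improvement: for each observation pick the
    # action with the best expected return, weighting states by d_t.
    new = list(pi_t)
    for o in range(len(pi_t)):
        def score(a):
            return sum(d_t[s] * Z[s][o] * q_value(P, R, V_next, s, a)
                       for s in range(len(d_t)))
        best = max(range(len(P)), key=score)
        if score(best) > score(pi_t[o]) + 1e-12:  # change only on strict gain
            new[o] = best
    return new

def expected_return(mu, V):
    return sum(m * v for m, v in zip(mu, V[0]))

def backward_sweep(P, Z, R, mu, pi):
    # Assumed pattern: evaluate once, then improve stages H-1, ..., 0.
    # d[t] depends only on earlier stages, so it stays valid during the sweep;
    # V[t] is refreshed after each stage is improved.
    d, V = evaluate(P, Z, R, mu, pi)
    pi = [list(row) for row in pi]
    for t in reversed(range(len(pi))):
        pi[t] = improve_stage(P, Z, R, d[t], V[t + 1], pi[t])
        V[t] = stage_values(P, Z, R, V[t + 1], pi[t])
    return pi, expected_return(mu, V)

# Toy model (made up): 2 states, 2 actions, 2 observations, horizon 4.
P = [[[0.9, 0.1], [0.1, 0.9]],   # P[a][s][s2], action 0 mostly stays
     [[0.2, 0.8], [0.8, 0.2]]]   # action 1 mostly switches
Z = [[0.8, 0.2], [0.3, 0.7]]     # Z[s][o], noisy observation of the state
R = [[1.0, 0.0], [0.0, 0.5]]     # R[s][a]
mu = [0.5, 0.5]                  # initial state distribution
pi0 = [[0, 0] for _ in range(4)] # start from "always action 0"

if __name__ == "__main__":
    policy, J = pi0, expected_return(mu, evaluate(P, Z, R, mu, pi0)[1])
    for k in range(5):
        policy, J = backward_sweep(P, Z, R, mu, policy)
        print(k, J, policy)
```

Because each stage is improved with every other stage held fixed, the expected return cannot decrease from one sweep to the next, which mirrors the monotonic-improvement property the abstract describes. The interdependence across stages shows up in the bookkeeping: changing `pi[t]` invalidates `V` for earlier stages and `d` for later ones, which is why the order of improvements and evaluations matters.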

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Generative Adversarial Networks and Image Synthesis