Learning POMDPs with Linear Function Approximation and Finite Memory

Ali Devran Kara

arXiv:2505.14879·math.OC·May 22, 2025

TL;DR

This paper develops reinforcement learning algorithms for POMDPs that combine linear function approximation with finite memory. It provides error bounds in terms of filter stability and projection errors, and convergence guarantees under assumptions on the exploration policy.

Contribution

It presents an algorithm for the value evaluation of finite-memory feedback policies and studies the learning of near-optimal Q-values with finite memory, showing that the assumptions on the exploration policy can be relaxed for specific models.

Findings

01

Error bounds for value evaluation are derived from filter stability and projection errors.

02

With general basis functions, convergence of Q-value learning requires further assumptions on the exploration policy.

03

These assumptions can be relaxed for models with perfectly linear cost and dynamics, or when discretization-based basis functions are used.
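To illustrate what discretization-based basis functions look like in practice, here is a minimal sketch, not the paper's algorithm. It assumes scalar observations in [0, 1], a memory window of the last two observations, and a finite action set. The names `bin_index`, `indicator_features`, and `q_update`, and all numeric values, are hypothetical. With indicator (one-hot) features, each linear Q-learning step changes only the coordinate of the active cell, so the update behaves like tabular Q-learning on the quantized window.

```python
def bin_index(y, n_bins):
    """Map an observation y in [0, 1] to the index of its quantization cell."""
    return min(int(y * n_bins), n_bins - 1)


def indicator_features(window, u, n_bins, n_actions):
    """One-hot basis over (cell of y_prev, cell of y_now, action u)."""
    y_prev, y_now = window
    cell = bin_index(y_prev, n_bins) * n_bins + bin_index(y_now, n_bins)
    phi = [0.0] * (n_bins * n_bins * n_actions)
    phi[cell * n_actions + u] = 1.0
    return phi


def q_value(theta, phi):
    """Linear Q-value estimate: inner product of weights and features."""
    return sum(t * p for t, p in zip(theta, phi))


def q_update(theta, window, u, cost, next_window, n_bins, n_actions,
             beta=0.9, step=0.5):
    """One Q-learning step (cost minimization) with linear approximation."""
    phi = indicator_features(window, u, n_bins, n_actions)
    best_next = min(
        q_value(theta, indicator_features(next_window, v, n_bins, n_actions))
        for v in range(n_actions)
    )
    td_error = cost + beta * best_next - q_value(theta, phi)
    # Only the coordinate where phi is 1 moves; all others are unchanged.
    return [t + step * td_error * p for t, p in zip(theta, phi)]
```

Calling `q_update([0.0] * 18, (0.1, 0.5), 1, 2.0, (0.5, 0.9), 3, 2)` updates a single entry of the weight vector, the one indexed by the current quantized window and action.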

Abstract

We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback policies. We provide error bounds derived from filter stability and projection errors. We then study the learning of finite-memory-based near-optimal Q-values. Convergence in this case requires further assumptions on the exploration policy when using general basis functions. We then show that these assumptions can be relaxed for specific models, such as those with perfectly linear cost and dynamics, or when using discretization-based basis functions.
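As a rough illustration of value evaluation for a finite-memory policy with linear function approximation, the sketch below runs a standard TD(0) update on features of a short window of past observations and actions. It is not the paper's algorithm. The toy POMDP (`P`, `O`, `COST`), the policy, the feature map, and the step-size schedule are all assumptions made up for this example.

```python
import random

# Hypothetical 2-state, 2-observation, 2-action POMDP (illustrative numbers).
P = {0: [[0.9, 0.1], [0.2, 0.8]],   # P[u][x][x']: transition kernel under action u
     1: [[0.5, 0.5], [0.6, 0.4]]}
O = [[0.8, 0.2], [0.3, 0.7]]        # O[x][y]: observation channel
COST = [[0.0, 1.0], [2.0, 0.5]]     # COST[x][u]: stage cost
BETA = 0.9                          # discount factor


def sample(probs, rng):
    """Draw an index from a finite probability vector."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def policy(window, rng):
    """Finite-memory policy: repeat the latest observation with probability 0.8."""
    y_now = window[-1]
    return y_now if rng.random() < 0.8 else 1 - y_now


def features(window):
    """Linear basis on the window (y_prev, u_prev, y_now): bias plus raw entries."""
    return [1.0] + [float(v) for v in window]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def td0_finite_memory(steps=20000, seed=0):
    """Estimate weights theta so that dot(theta, features(window)) tracks the policy's cost."""
    rng = random.Random(seed)
    theta = [0.0] * 4
    x = 0                                   # hidden state, never shown to the learner
    window = (0, 0, sample(O[x], rng))      # arbitrary initial memory
    for t in range(steps):
        u = policy(window, rng)
        cost = COST[x][u]
        x = sample(P[u][x], rng)
        next_window = (window[-1], u, sample(O[x], rng))
        phi, phi_next = features(window), features(next_window)
        td_error = cost + BETA * dot(theta, phi_next) - dot(theta, phi)
        step = 0.05 / (1.0 + 0.001 * t)     # decaying step size (assumed schedule)
        theta = [th + step * td_error * p for th, p in zip(theta, phi)]
        window = next_window
    return theta
```

After `theta = td0_finite_memory()`, the value estimate for a window such as `(0, 1, 0)` is `dot(theta, features((0, 1, 0)))`. The learner only ever sees the window, while the hidden state `x` drives the costs. This is why the quality of such an estimate depends both on how well the window summarizes the hidden state and on how well the chosen basis can represent the value function.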

Taxonomy

Topics: Machine Learning and Algorithms · Text and Document Classification Technologies · Fault Detection and Control Systems