Learning POMDPs with Linear Function Approximation and Finite Memory
Ali Devran Kara

TL;DR
This paper develops algorithms for reinforcement learning in POMDPs using linear function approximation and finite memory, providing error bounds and convergence guarantees under various assumptions.
Contribution
It presents an algorithm for evaluating the value of finite-memory feedback policies in POMDPs and studies the learning of near-optimal finite-memory Q-values, showing that the required assumptions on the exploration policy can be relaxed for specific models.
Findings
Error bounds for value evaluation are derived from filter stability and projection errors.
With general basis functions, convergence of Q-value learning requires further assumptions on the exploration policy.
These assumptions can be relaxed for models with perfectly linear cost and dynamics, or when using discretization-based basis functions.
Abstract
We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback policies. We provide error bounds derived from filter stability and projection errors. We then study the learning of finite-memory-based near-optimal Q-values. Convergence in this case requires further assumptions on the exploration policy when using general basis functions. We then show that these assumptions can be relaxed for specific models, such as those with perfectly linear cost and dynamics, or when using discretization-based basis functions.
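To make the idea of evaluating a finite-memory policy with linear function approximation concrete, here is a minimal sketch. It is not the paper's algorithm. It uses standard TD(0) with one-hot (indicator) features over a window made of the previous observation, the previous action, and the current observation. The toy two-state POMDP, the names `one_hot_features` and `td0_finite_memory`, the per-feature step size, and the discounted-cost setup are all assumptions made for illustration.

```python
import numpy as np

def one_hot_features(window, n_obs, n_act):
    """Indicator basis over a finite-memory window (y_prev, u_prev, y_curr)."""
    y_prev, u_prev, y_curr = window
    idx = (y_prev * n_act + u_prev) * n_obs + y_curr
    phi = np.zeros(n_obs * n_act * n_obs)
    phi[idx] = 1.0
    return phi

def td0_finite_memory(P, O, cost, policy, beta=0.9, steps=5000, seed=0):
    """Linear TD(0) estimate of the discounted cost of a finite-memory policy.

    P[u, x, x'] : hidden-state transition probabilities (assumed toy model)
    O[x, y]     : observation probabilities
    cost[x, u]  : stage cost, assumed to lie in [0, 1]
    policy      : maps a window (y_prev, u_prev, y_curr) to an action
    """
    rng = np.random.default_rng(seed)
    n_act, n_states, _ = P.shape
    n_obs = O.shape[1]
    theta = np.zeros(n_obs * n_act * n_obs)   # linear weights
    counts = np.zeros_like(theta)             # visit counts per feature

    # Build an initial window from one arbitrary first step.
    x = rng.integers(n_states)
    y_prev = rng.choice(n_obs, p=O[x])
    u_prev = rng.integers(n_act)
    x = rng.choice(n_states, p=P[u_prev, x])
    window = (y_prev, u_prev, rng.choice(n_obs, p=O[x]))

    for _ in range(steps):
        u = policy(window)
        c = cost[x, u]
        x_next = rng.choice(n_states, p=P[u, x])
        y_next = rng.choice(n_obs, p=O[x_next])
        next_window = (window[2], u, y_next)

        phi = one_hot_features(window, n_obs, n_act)
        phi_next = one_hot_features(next_window, n_obs, n_act)
        counts += phi
        alpha = 1.0 / counts[phi.argmax()]    # decaying per-feature step size
        td_error = c + beta * (theta @ phi_next) - theta @ phi
        theta += alpha * td_error * phi

        x, window = x_next, next_window
    return theta

# Hypothetical toy POMDP: 2 hidden states, 2 observations, 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[0.8, 0.2], [0.3, 0.7]])
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
policy = lambda w: w[2]   # act on the most recent observation only

theta = td0_finite_memory(P, O, cost, policy)
print(theta.round(3))
```

With indicator features the linear estimate reduces to a table indexed by the window, which is the simplest instance of a discretization-based basis. A longer window or a different basis would only change `one_hot_features`; the learner never sees the hidden state `x`, which is used solely to simulate the environment.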
Taxonomy
Topics: Machine Learning and Algorithms · Text and Document Classification Technologies · Fault Detection and Control Systems
