Refined Bounds on Near Optimality Finite Window Policies in POMDPs and Their Reinforcement Learning
Yunus Emre Demirci, Ali Devran Kara, Serdar Yüksel

TL;DR
This paper refines theoretical bounds on the near-optimality of finite window policies in POMDPs, extending earlier total-variation-based results to Wasserstein distance and relaxing the filter stability conditions behind the error bounds for reinforcement learning.
Contribution
It extends existing near-optimality bounds for finite window POMDP policies from total variation to Wasserstein distance, which yields refined error bounds for reinforcement learning in partially observable environments under more relaxed filter stability conditions.
Findings
Refined bounds using Wasserstein distance for POMDP policies.
Established uniform filter stability in expected Wasserstein distance.
Provided explicit examples demonstrating improved bounds.
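To make the filter stability notion in the findings concrete, here is a minimal sketch on a toy finite-state model. It is not taken from the paper, which works with standard Borel spaces and a uniform, in-expectation stability term. The matrices `T` and `O`, the helper names, and the choice to place the states at the points 0, 1, 2 on the real line are all assumptions made for illustration. The sketch runs the same Bayes filter from two different priors along one observation path and tracks how far apart the two beliefs are in total variation and in Wasserstein-1 distance.

```python
import numpy as np

def filter_step(belief, T, O, y):
    """One Bayes filter update: predict with T, then correct with observation y."""
    predicted = belief @ T          # predicted[j] = sum_i belief[i] * T[i, j]
    unnorm = predicted * O[:, y]    # weight each state by the likelihood of y
    return unnorm / unnorm.sum()

def total_variation(p, q):
    """Total variation with the 1/2 convention (values in [0, 1])."""
    return 0.5 * np.abs(p - q).sum()

def wasserstein_1(p, q):
    """W1 for distributions on the points 0, 1, ..., n-1 (unit spacing):
    the sum of absolute differences between the two CDFs."""
    return np.abs(np.cumsum(p - q))[:-1].sum()

# Hypothetical 3-state, 2-observation model (illustrative numbers only).
T = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
O = np.array([[0.8, 0.2],
              [0.5, 0.5],
              [0.2, 0.8]])

mu = np.array([1.0, 0.0, 0.0])  # true prior
nu = np.array([0.0, 0.0, 1.0])  # mismatched prior

rng = np.random.default_rng(0)
x = int(rng.choice(3, p=mu))
b_mu, b_nu = mu.copy(), nu.copy()
tv = [total_variation(b_mu, b_nu)]
w1 = [wasserstein_1(b_mu, b_nu)]

for _ in range(60):
    x = int(rng.choice(3, p=T[x]))   # hidden state moves
    y = int(rng.choice(2, p=O[x]))   # noisy observation of the new state
    b_mu = filter_step(b_mu, T, O, y)
    b_nu = filter_step(b_nu, T, O, y)
    tv.append(total_variation(b_mu, b_nu))
    w1.append(wasserstein_1(b_mu, b_nu))

print(f"TV: start {tv[0]:.3f}, end {tv[-1]:.2e}")
print(f"W1: start {w1[0]:.3f}, end {w1[-1]:.2e}")
```

In this toy model every entry of `T` is positive, so the two filters forget their priors along the sampled path. The stability terms in the paper are uniform quantities defined in expectation or sample path-wise, which this single-path check does not compute.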
Abstract
Finding optimal policies for Partially Observable Markov Decision Processes (POMDPs) is challenging due to their uncountable state spaces when transformed into fully observable Markov Decision Processes (MDPs) using belief states. Traditional methods such as dynamic programming or policy iteration are difficult to apply in this context, necessitating the use of approximation methods on belief states or other techniques. Recently, in (Journal of Machine Learning Research, vol. 23, pp. 1-46, 2022) and (Mathematics of Operations Research, vol. 48, pp. 2066-2093, Nov. 2023), it was shown that sliding finite window based policies are near-optimal for POMDPs with standard Borel valued hidden state spaces, and can be learned via reinforcement learning, with error bounds explicitly dependent on a uniform filter stability term involving total variation in expectation and sample path-wise,…
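As a rough illustration of the sliding finite window idea described in the abstract, the sketch below runs tabular Q-learning in which the learner's "state" is the current observation together with the last `N` observation-action pairs, rather than the belief. It is a simplified sketch under stated assumptions, not the algorithm analyzed in the cited papers: the function name, the toy model, the uniformly random exploration policy, and the `1/visits` step size are all choices made here for illustration.

```python
import numpy as np
from collections import defaultdict, deque

def finite_window_q_learning(T, O, cost, N=2, steps=5000, beta=0.9, seed=0):
    """Tabular Q-learning on finite windows of past observations and actions.

    T[u, x, x2]: transition probabilities, O[x, y]: observation probabilities,
    cost[x, u]: stage cost (minimized). All names are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_u, n_x = T.shape[0], T.shape[1]
    n_y = O.shape[1]
    Q = defaultdict(lambda: np.zeros(n_u))
    visits = defaultdict(lambda: np.zeros(n_u))

    x = int(rng.integers(n_x))
    y = int(rng.choice(n_y, p=O[x]))
    history = deque(maxlen=N)  # last N (observation, action) pairs

    for _ in range(steps):
        s = (tuple(history), y)            # finite window used in place of the belief
        u = int(rng.integers(n_u))         # uniformly random exploration (assumption)
        c = cost[x, u]                     # cost depends on the hidden state
        x = int(rng.choice(n_x, p=T[u, x]))
        history.append((y, u))
        y = int(rng.choice(n_y, p=O[x]))
        s_next = (tuple(history), y)

        visits[s][u] += 1
        alpha = 1.0 / visits[s][u]         # decaying step size per (window, action)
        Q[s][u] += alpha * (c + beta * Q[s_next].min() - Q[s][u])
    return Q

# Hypothetical 2-state, 2-action, 2-observation model (illustrative numbers only).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])
cost = np.array([[0.0, 0.5],
                 [1.0, 0.5]])

Q = finite_window_q_learning(T, O, cost, N=2)
policy = {window: int(np.argmin(q)) for window, q in Q.items()}
print(f"learned Q-values for {len(Q)} distinct windows")
```

The resulting `policy` maps each observed window to a greedy action. How close such a policy is to optimal is what the filter-stability-dependent error bounds in the abstract quantify; this sketch makes no claim about that gap.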
Taxonomy
Topics: Advanced Optical Network Technologies
