Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting

Alessio Russo; Alberto Maria Metelli; Marcello Restelli

arXiv:2410.01331 · cs.LG · October 3, 2024

TL;DR

This paper introduces an efficient method for learning average-reward POMDPs with known observation models, using spectral estimation and an exploration strategy that guarantees low regret and scales well with problem size.

Contribution

It proposes the OAS spectral estimation technique and the OAS-UCRL algorithm, providing the first regret guarantees for POMDPs with known observation models in the average-reward setting.

Findings

01. OAS-UCRL achieves a regret bound of order O(√T log T).

02. The approach scales efficiently with the dimensions of the state, action, and observation spaces.

03. Numerical simulations validate the approach against baselines.
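
A bound of order O(√T log T) is sublinear in T, so the regret per step shrinks as the horizon grows. The short Python sketch below illustrates only this growth rate. The constant `c` is a hypothetical placeholder for the problem-dependent factors that the O(·) notation hides; it is not a value taken from the paper.

```python
import math


def regret_rate(horizon, c=1.0):
    """Growth rate c * sqrt(T) * log(T); c is a hypothetical constant."""
    return c * math.sqrt(horizon) * math.log(horizon)


def average_regret(horizon, c=1.0):
    """Per-step regret implied by the rate above."""
    return regret_rate(horizon, c) / horizon


if __name__ == "__main__":
    # The per-step value decreases as the horizon increases.
    for horizon in (10**2, 10**4, 10**6):
        print(horizon, average_regret(horizon))
```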

Abstract

Learning in Partially Observable Markov Decision Processes (POMDPs) is a notoriously challenging task. We consider an average-reward infinite-horizon POMDP setting with an unknown transition model and assume that the observation model is known. Under this assumption, we propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy. Then, we propose the OAS-UCRL algorithm, which implicitly balances the exploration-exploitation trade-off following the optimism in the face of uncertainty principle. The algorithm runs through episodes of increasing length. In each episode, the optimal belief-based policy of the estimated POMDP interacts with the environment and collects samples that will be used in the next episode by the OAS estimation procedure to compute a new estimate of the POMDP…
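
Two ingredients mentioned in the abstract can be sketched in Python: the belief state that a belief-based policy acts on, and an episode schedule of increasing length. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the observation distribution depends only on the arrival state, and it assumes a doubling schedule, whereas the abstract says only that episode lengths increase. The names `belief_update` and `episode_lengths` are hypothetical, and the OAS estimator itself is not reproduced.

```python
import numpy as np


def belief_update(belief, action, obs, trans, obs_model):
    """One Bayes-filter step for a discrete POMDP.

    belief:    shape (S,), current distribution over states.
    trans:     shape (A, S, S), trans[a, s, s2] = P(s2 | s, a) (an estimate).
    obs_model: shape (S, O), obs_model[s2, o] = P(o | s2) (assumed known).
    """
    predicted = belief @ trans[action]        # predict the next-state distribution
    unnorm = predicted * obs_model[:, obs]    # weight by observation likelihood
    total = unnorm.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnorm / total


def episode_lengths(horizon, first=1):
    """Doubling episode lengths (an assumption), truncated to sum to horizon."""
    lengths, used, current = [], 0, first
    while used < horizon:
        step = min(current, horizon - used)
        lengths.append(step)
        used += step
        current *= 2
    return lengths
```

With a transition estimate in hand, the belief is updated after every action and observation, and the estimate would be refreshed only at episode boundaries given by the schedule.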

Taxonomy

Topics: Multi-Criteria Decision Making · Fuzzy Systems and Optimization