Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process
Hyung-Jin Yoon, Donghwan Lee, and Naira Hovakimyan

TL;DR
This paper introduces a recursive HMM estimation-based Q-learning algorithm for POMDPs, enabling better learning when full state observations are unavailable, with proven convergence properties.
Contribution
It formulates POMDP estimation as an HMM problem and develops a recursive algorithm for joint estimation of POMDP parameters and Q-function, with convergence guarantees.
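Treating the POMDP as an HMM means the agent tracks a belief, a posterior distribution over hidden states, using a forward-filter recursion whose transition matrix depends on the action taken. The sketch below illustrates that single belief-update step. It assumes tabular models `T[a][s][s_next]` and `Z[s_next][o]`, which are hypothetical names. It is a generic HMM forward filter, not the paper's recursive estimator, which also learns `T` and `Z` online.

```python
from typing import List

Matrix = List[List[float]]


def belief_update(belief: List[float], T: List[Matrix], Z: Matrix,
                  action: int, obs: int) -> List[float]:
    """One step of an HMM forward filter with action-dependent transitions.

    belief[s]        : current posterior P(state = s | history)
    T[a][s][s_next]  : P(s_next | s, a)  (hypothetical tabular transition model)
    Z[s_next][o]     : P(o | s_next)     (hypothetical tabular observation model)
    """
    n = len(belief)
    # Predict: push the belief through the transition matrix of the chosen action.
    predicted = [sum(belief[s] * T[action][s][j] for s in range(n)) for j in range(n)]
    # Correct: weight each predicted state by the likelihood of the observation.
    unnormalized = [predicted[j] * Z[j][obs] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalize so the new belief is a probability distribution.
    return [u / total for u in unnormalized]
```

For example, with two states and one action, `belief_update([0.5, 0.5], [[[0.9, 0.1], [0.2, 0.8]]], [[0.8, 0.2], [0.3, 0.7]], 0, 0)` shifts probability mass toward state 0, because state 0 is both more likely after the transition and more likely to emit observation 0.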
Findings
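POMDP parameter estimates converge to a set of stationary points of the maximum likelihood estimate.
The Q-function estimate converges to a fixed point of the Bellman optimality equation, weighted by the invariant distribution of the state belief.
Targets partially observable environments, where standard Q-learning can perform poorly.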
Abstract
The objective is to study an online hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) on finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function (Q function) of the current state and action. However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate POMDP estimation as an HMM estimation problem and propose a recursive algorithm that estimates both the POMDP parameters and the Q function concurrently. We also show that the POMDP estimate converges to a set of stationary points of the maximum likelihood estimate, and that the Q function estimate converges to a fixed point that satisfies the Bellman optimality equation weighted by the invariant distribution of the state belief determined by the HMM…
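The abstract describes a Q function learned on top of the state belief rather than the true state. A common way to do this, shown here only as an illustrative sketch, is to make Q linear in the belief, Q(b, a) = sum_s b(s) q[s][a], and then apply a semi-gradient Q-learning step. The names `q_value` and `belief_q_update` and the step sizes `alpha` and `gamma` are hypothetical. The abstract does not give the paper's exact update, so this should not be read as the authors' algorithm.

```python
from typing import List


def q_value(q: List[List[float]], belief: List[float], action: int) -> float:
    """Linear-in-belief Q estimate: Q(b, a) = sum_s b(s) * q[s][a] (an assumption)."""
    return sum(b * q[s][action] for s, b in enumerate(belief))


def belief_q_update(q: List[List[float]], belief: List[float], action: int,
                    reward: float, next_belief: List[float],
                    alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One semi-gradient Q-learning step on belief features; mutates q, returns TD error."""
    n_actions = len(q[0])
    # Bellman optimality target evaluated at the next belief.
    target = reward + gamma * max(q_value(q, next_belief, a) for a in range(n_actions))
    td_error = target - q_value(q, belief, action)
    # The gradient of Q(b, a) w.r.t. q[s][a] is b(s), so each state's entry
    # moves in proportion to how much belief mass it carries.
    for s, b in enumerate(belief):
        q[s][action] += alpha * td_error * b
    return td_error
```

Because each update is weighted by `belief[s]`, states the filter considers unlikely barely change. This mirrors, informally, the belief-weighted Bellman equation mentioned in the abstract.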
Taxonomy
Methods: Q-Learning
