Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process

Hyung-Jin Yoon, Donghwan Lee, and Naira Hovakimyan

arXiv:1809.06401 · cs.LG · September 25, 2018

TL;DR

This paper introduces a recursive HMM estimation-based Q-learning algorithm for POMDPs, aimed at settings where the full state is not observed, and proves convergence of both the model estimate and the Q-function estimate.

Contribution

It formulates POMDP estimation as an HMM problem and develops a recursive algorithm for joint estimation of POMDP parameters and Q-function, with convergence guarantees.
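To make the HMM view concrete, the following is a minimal sketch of one step of the standard HMM forward filter, which turns an action and an observation into an updated state belief. It is not taken from the paper: the function name `belief_update`, the array layouts `P[a, s, s']` and `O[s', o]`, and the fallback for impossible observations are all assumptions made for illustration, and the model parameters are treated as known rather than estimated recursively.

```python
import numpy as np

def belief_update(belief, action, observation, P, O):
    """One step of the HMM forward filter (illustrative, not the paper's code).

    belief: shape (S,), current probability over hidden states.
    P:      shape (A, S, S), assumed layout P[a, s, s'] = Pr(s' | s, a).
    O:      shape (S, num_obs), assumed layout O[s', o] = Pr(o | s').
    """
    # Predict: push the belief through the transition matrix of the action.
    predicted = belief @ P[action]
    # Correct: weight each state by the likelihood of the observation.
    unnormalized = predicted * O[:, observation]
    total = unnormalized.sum()
    if total == 0.0:
        # Assumption: if the model says the observation is impossible,
        # keep the predicted belief instead of dividing by zero.
        return predicted
    return unnormalized / total
```

In this sketch the belief always stays a probability vector, so it can be fed back into the same function at the next time step.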

Findings

01

The POMDP parameter estimate converges to a set of stationary points of the maximum likelihood estimate.

02

The Q-function estimate converges to a fixed point of the Bellman optimality equation, weighted by the invariant distribution of the state belief.

03

Targets partially observable environments, where standard Q-learning can perform poorly.

Abstract

The objective is to study an online hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) with finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function given the current state and action (the Q-function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate POMDP estimation as an HMM estimation problem and propose a recursive algorithm that estimates the POMDP parameters and the Q-function concurrently. We also show that the POMDP estimate converges to a set of stationary points for the maximum likelihood estimate, and that the Q-function estimate converges to a fixed point that satisfies the Bellman optimality equation weighted on the invariant distribution of the state belief determined by the HMM…
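As a rough illustration of how a state belief can stand in for the unobserved state, the sketch below applies a belief-weighted temporal-difference update to a tabular Q-function. This is an assumption-laden simplification, not the paper's recursion: the function name `belief_q_update`, the step size `alpha`, the discount `gamma`, and the choice to spread the update over states in proportion to the belief are all hypothetical, and the beliefs are taken as given inputs.

```python
import numpy as np

def belief_q_update(Q, belief, action, reward, next_belief,
                    alpha=0.1, gamma=0.95):
    """Belief-weighted TD update (illustrative, not the paper's algorithm).

    Q:           shape (S, A), one value per hidden state and action.
    belief:      shape (S,), belief over hidden states before acting.
    next_belief: shape (S,), belief after the next observation.
    """
    # Value of the chosen action, averaged over the current belief.
    q_current = belief @ Q[:, action]
    # Greedy value at the next step, averaged over the next belief.
    q_next = np.max(next_belief @ Q)
    td_error = reward + gamma * q_next - q_current
    # Assumption: each state receives a share of the update
    # proportional to its belief weight.
    Q_new = Q.copy()
    Q_new[:, action] += alpha * belief * td_error
    return Q_new, td_error
```

When both beliefs are one-hot vectors, that is, when the state is fully observed, this update reduces to ordinary tabular Q-learning.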

Taxonomy

Methods: Q-Learning