Structural Estimation of Partially Observable Markov Decision Processes

Yanling Chang; Alfredo Garcia; Zhide Wang; Lu Sun

arXiv:2008.00500 · cs.LG · December 30, 2021 · 1 cites

Open Access

TL;DR

This paper develops a method for estimating the underlying parameters of POMDP models using observable data, ensuring identifiability under certain conditions, and demonstrates its application to equipment replacement decisions.

Contribution

It introduces a novel estimation approach for POMDP primitives based on observable histories, with theoretical guarantees and practical implementation via a policy gradient algorithm.

Findings

01. The estimation method is robust on both synthetic and real data.

02. Ignoring partial observability can lead to model misspecification.

03. The soft policy gradient algorithm comes with a finite-time characterization of its convergence to a stationary point.

Abstract

In many practical settings, control decisions must be made under partial or imperfect information about the evolution of a relevant state variable. Partially Observable Markov Decision Processes (POMDPs) provide a relatively well-developed framework for modeling and analyzing such problems. In this paper we consider the structural estimation of the primitives of a POMDP model based upon the observable history of the process. We analyze the structural properties of a POMDP model with random rewards and specify conditions under which the model is identifiable without knowledge of the state dynamics. We consider a soft policy gradient algorithm to compute a maximum likelihood estimator and provide a finite-time characterization of convergence to a stationary point. We illustrate the estimation methodology with an application to optimal equipment replacement. In this context, replacement decisions…
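To make the estimation setup concrete, here is a minimal Python sketch of the two ingredients a likelihood over observable histories needs: a Bayesian belief update and a soft (softmax) choice rule evaluated at the current belief. This is not the paper's algorithm. The array layouts `T[a][s][s2]` and `O[a][s2][o]`, the function names, and the per-state action values `q_values` are all assumptions made for illustration; in the paper the policy would come from solving the POMDP with random rewards rather than from a fixed table.

```python
import math


def belief_update(belief, action, obs, T, O):
    """Bayes filter step.

    Assumed layout (hypothetical, for illustration only):
    T[a][s][s2] = P(next state s2 | state s, action a)
    O[a][s2][o] = P(observation o | next state s2, action a)
    """
    n = len(belief)
    # Predict the next-state distribution under the chosen action.
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n))
                 for s2 in range(n)]
    # Weight each next state by the likelihood of the observation.
    unnorm = [predicted[s2] * O[action][s2][obs] for s2 in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


def soft_policy(belief, q_values):
    """Softmax over belief-weighted action values.

    q_values[a][s] is a hypothetical stand-in for the value of action a
    in state s; it is not taken from the paper.
    """
    logits = [sum(b * q for b, q in zip(belief, row)) for row in q_values]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


def log_likelihood(actions, observations, belief0, T, O, q_values):
    """Log-likelihood of the observed actions along one history."""
    belief = list(belief0)
    ll = 0.0
    for a, o in zip(actions, observations):
        ll += math.log(soft_policy(belief, q_values)[a])
        belief = belief_update(belief, a, o, T, O)
    return ll
```

A maximum likelihood estimator would then search over the parameters behind `q_values` (and possibly `T` and `O`) to maximize `log_likelihood` summed over histories; a gradient-based search is one option, and the sketch above leaves that step out.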

Taxonomy

Topics: Fault Detection and Control Systems