Structural Estimation of Partially Observable Markov Decision Processes
Yanling Chang, Alfredo Garcia, Zhide Wang, Lu Sun

TL;DR
This paper develops a method for estimating the primitives of a POMDP model from observable data, gives conditions under which the model is identifiable, and demonstrates the method on equipment replacement decisions.
Contribution
It introduces a novel estimation approach for POMDP primitives based on observable histories, with theoretical guarantees and practical implementation via a policy gradient algorithm.
Findings
The estimation method is robust on both synthetic and real data.
Ignoring partial observability can lead to misspecification.
The soft policy gradient algorithm comes with a finite-time characterization of its convergence to a stationary point.
Abstract
In many practical settings, control decisions must be made under partial or imperfect information about the evolution of a relevant state variable. Partially Observable Markov Decision Processes (POMDPs) are a relatively well-developed framework for modeling and analyzing such problems. In this paper, we consider the structural estimation of the primitives of a POMDP model based upon the observable history of the process. We analyze the structural properties of a POMDP model with random rewards and specify conditions under which the model is identifiable without knowledge of the state dynamics. We consider a soft policy gradient algorithm to compute a maximum likelihood estimator and provide a finite-time characterization of convergence to a stationary point. We illustrate the estimation methodology with an application to optimal equipment replacement. In this context, replacement decisions…
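As background for readers new to POMDPs, the sketch below shows the standard Bayes-filter belief update that turns an observable history into a probability distribution over the hidden state. It is a generic textbook illustration, not the authors' estimator; the function name `belief_update`, the two-state "good/worn" equipment model, and all numbers are hypothetical and chosen only for this example.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step for a discrete hidden state.

    belief[s]          -- current probability of hidden state s
    transition[s][s2]  -- P(next state s2 | state s) under the action taken
    obs_likelihood[s2] -- P(observation just received | next state s2)
    """
    n = len(belief)
    # Predict: push the current belief through the transition matrix.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correct: reweight each state by the likelihood of the observation.
    unnormalized = [predicted[s2] * obs_likelihood[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical numbers: state 0 = good, state 1 = worn.
T_KEEP = [[0.9, 0.1],   # a good machine wears out with probability 0.1
          [0.0, 1.0]]   # a worn machine stays worn unless replaced
LIK_NOISY = [0.2, 0.7]  # P(observe "noisy" | state)

# Start certain the machine is good, keep it, then observe "noisy".
posterior = belief_update([1.0, 0.0], T_KEEP, LIK_NOISY)
print(posterior)
```

A decision rule that ignored the noisy signal and treated the machine as fully observed would keep believing it is good, which is the kind of misspecification the summary warns about.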
Taxonomy
Topics: Fault Detection and Control Systems
