Sequential Stochastic Optimization in Separable Learning Environments
R. Reid Bishop, Chelsea C. White III

TL;DR
This paper introduces the SEP-POMDP, a structured class of partially observed Markov decision processes (POMDPs) in which estimating the hidden modulation process separates from controlling the observed state. This separation simplifies decision-making under uncertainty across a range of application areas.
Contribution
The paper defines the SEP-POMDP model, demonstrating its ability to unify classical and modern decision-making frameworks and enabling specialized approximate solution methods.
Findings
The SEP-POMDP inherits value function and policy structure from fully observed MDPs.
The model applies broadly to inventory, finance, and healthcare systems.
It helps bridge classical models with machine learning-based predictive methods.
Abstract
We consider a class of sequential decision-making problems under uncertainty that can encompass various types of supervised learning concepts. These problems have a completely observed state process and a partially observed modulation process, where the state process is affected by the modulation process only through an observation process, the observation process only observes the modulation process, and the modulation process is exogenous to control. We model this broad class of problems as a partially observed Markov decision process (POMDP). The belief function for the modulation process is control invariant, thus separating the estimation of the modulation process from the control of the state process. We call this specially structured POMDP the separable POMDP, or SEP-POMDP, and show it (i) can serve as a model for a broad class of application areas, e.g., inventory control,…
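The control-invariance property described in the abstract can be illustrated with a small sketch. The code below is not taken from the paper. It assumes a finite modulation process with a hypothetical transition matrix `P`, a hypothetical observation likelihood matrix `Q`, and the convention that each observation is generated by the modulation state just entered. Under those assumptions the belief update is the standard hidden-Markov filter, and no action appears anywhere in it, which is what lets estimation be carried out separately from control.

```python
from typing import List, Sequence


def belief_update(
    belief: Sequence[float],
    P: Sequence[Sequence[float]],
    Q: Sequence[Sequence[float]],
    z: int,
) -> List[float]:
    """One filtering step for the modulation process (illustrative only).

    belief : current distribution over modulation states
    P      : assumed transition matrix, P[i][j] = Pr(next = j | current = i)
    Q      : assumed observation matrix, Q[j][z] = Pr(observe z | next = j)
    z      : index of the observation just received

    The signature takes no action argument: the update is the same
    whatever control is applied to the state process.
    """
    n = len(belief)
    # Prediction step: push the belief through the modulation dynamics.
    predicted = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Correction step: weight each state by the likelihood of the observation.
    unnormalized = [predicted[j] * Q[j][z] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical two-state example (all numbers invented for illustration).
P = [[0.9, 0.1], [0.2, 0.8]]
Q = [[0.7, 0.3], [0.1, 0.9]]
b0 = [0.5, 0.5]
b1 = belief_update(b0, P, Q, z=1)
```

Because the update depends only on the prior belief and the observation, the belief trajectory can in principle be computed by a separate estimator and passed to the controller as an exogenous input.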
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Machine Learning and Algorithms · Advanced Statistical Process Monitoring
