Finite-Horizon Markov Decision Processes with Sequentially-Observed Transitions
Mahmoud El Chamie, Behcet Acikmese

TL;DR
This paper introduces an extended Markov Decision Process model that incorporates sequential transition observations, enabling the synthesis of more effective decision policies through an efficient linear programming algorithm.
Contribution
The paper extends the standard MDP model so that the transitions resulting from actions can be observed sequentially, and it provides an efficient offline algorithm, based on linear programming, for synthesizing optimal policies.
Findings
Policies for the extended model outperform those of standard MDPs because they exploit the additional transition information
Proposed linear programming algorithm efficiently computes optimal policies
Model applicable to decision problems with sequential transition data
Abstract
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize costs) in a given stochastic dynamical environment. In this paper, we extend this model by incorporating additional information that the transitions due to actions can be sequentially observed. The proposed model benefits from this information and produces policies with better performance than those of standard MDPs. The paper also presents an efficient offline linear programming based algorithm to synthesize optimal policies for the extended model.
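To illustrate why observing transitions can help, the sketch below compares standard finite-horizon backward induction with one possible reading of "sequentially observed transitions". The abstract does not spell out the model, so the following is an assumption rather than the authors' formulation: at each step the actions are examined in a fixed order, the realized next state of the current action is revealed, and the decision maker either accepts it or moves on to the next action, with the last action always accepted. The names `standard_values`, `sequential_values`, `P`, `R`, and `T` are hypothetical, and the sketch uses dynamic programming, not the linear programming algorithm the paper proposes.

```python
def standard_values(P, R, T):
    """Backward induction for a standard finite-horizon MDP.

    P[a][s][s2] is the probability of moving from s to s2 under action a,
    R[s][a] is the immediate reward, and T is the horizon length.
    """
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states  # terminal values
    for _ in range(T):
        V = [
            max(
                R[s][a] + sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V


def sequential_values(P, R, T):
    """Backward induction under the ASSUMED accept/reject model.

    Actions are examined in index order. For each one the realized next
    state is revealed before the decision maker accepts it or moves on;
    the last action must be accepted.
    """
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states  # terminal values
    for _ in range(T):
        new_V = []
        for s in range(n_states):
            last = n_actions - 1
            # Value of being forced to accept the last action.
            W = sum(P[last][s][s2] * (R[s][last] + V[s2])
                    for s2 in range(n_states))
            # Work backwards through the earlier actions: accept the
            # revealed outcome only if it beats the value of continuing.
            for a in range(n_actions - 2, -1, -1):
                W = sum(P[a][s][s2] * max(R[s][a] + V[s2], W)
                        for s2 in range(n_states))
            new_V.append(W)
        V = new_V
    return V


# Toy instance (hypothetical): two states, two identical actions that
# move to either state with probability 0.5; state 1 pays reward 1.
P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[0, 0], [1, 1]]

print(standard_values(P, R, 2))
print(sequential_values(P, R, 2))
```

On this toy instance with horizon 2, the standard values are 0.5 and 1.5, while the values under the assumed sequential model are 0.75 and 1.75. The gap arises because the expectation of a maximum is at least the maximum of the expectations, which matches the abstract's claim that the extra information yields better-performing policies.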
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research
