Finite-Horizon Markov Decision Processes with Sequentially-Observed Transitions

Mahmoud El Chamie; Behcet Acikmese

arXiv:1507.01151·math.OC·July 7, 2015·1 cites

TL;DR

This paper introduces an extended Markov Decision Process model that incorporates sequential transition observations, enabling the synthesis of more effective decision policies through an efficient linear programming algorithm.

Contribution

It extends standard MDPs by including sequential transition observations and provides an efficient offline algorithm for optimal policy synthesis.

Findings

01. Policies synthesized for the extended model outperform standard MDP policies because they exploit the additional transition information.

02. The proposed linear-programming algorithm efficiently computes optimal policies offline.

03. The model applies to decision problems in which the transitions due to actions can be observed sequentially.

Abstract

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize costs) in a given stochastic dynamical environment. In this paper, we extend this model by incorporating additional information that the transitions due to actions can be sequentially observed. The proposed model benefits from this information and produces policies with better performance than those of standard MDPs. The paper also presents an efficient offline linear programming based algorithm to synthesize optimal policies for the extended model.
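
The claim that sequentially observed transitions can improve on a standard MDP policy can be illustrated with a small sketch. The abstract does not specify the observation protocol, so the accept-or-discard rule below is an assumption made purely for illustration: in each epoch the agent sees the sampled next state for one action at a time, in a fixed order, and must accept the last action if it discards all earlier ones. The sketch also uses plain backward induction rather than the paper's linear program, and the function names and the two-state example are hypothetical.

```python
# Toy comparison under assumed (hypothetical) semantics; not the paper's algorithm.

def standard_values(P, R, horizon):
    """Backward induction for a standard finite-horizon MDP.

    P[a][s][s2] is the probability of moving s -> s2 under action a,
    R[s][a] is the immediate reward, and the terminal value is zero.
    """
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(horizon):
        V = [
            max(
                R[s][a] + sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V


def sequential_values(P, R, horizon):
    """Same MDP, but outcomes are revealed one action at a time (assumed rule).

    In each epoch the agent sees the sampled next state for action 0, then
    either accepts it or discards it and looks at action 1, and so on.
    The last action must be accepted.
    """
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(horizon):
        new_V = []
        for s in range(n_states):
            last = n_actions - 1
            # Value of being forced to accept the final action.
            W = R[s][last] + sum(P[last][s][s2] * V[s2] for s2 in range(n_states))
            # Walk backwards through the earlier actions: accept the revealed
            # outcome only if it beats the value of continuing to the next one.
            for a in range(last - 1, -1, -1):
                W = sum(
                    P[a][s][s2] * max(R[s][a] + V[s2], W)
                    for s2 in range(n_states)
                )
            new_V.append(W)
        V = new_V
    return V


# Hypothetical two-state, two-action example.
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.2, 0.8], [0.1, 0.9]],  # action 1
]
R = [[1.0, 0.0], [0.0, 2.0]]  # R[state][action]

V_std = standard_values(P, R, horizon=3)
V_seq = sequential_values(P, R, horizon=3)
print("standard  :", V_std)
print("sequential:", V_seq)
```

Under this assumed rule the sequential values can never fall below the standard ones, since the agent can always ignore what it observes and commit to the action a standard policy would have chosen.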

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research