Constrained Stochastic Optimal Control with a Baseline Performance Guarantee

Yinlam Chow; Mohammad Ghavamzadeh

arXiv:1410.2726·math.OC·October 13, 2014

TL;DR

This paper introduces a method for deriving a policy from a simulated MDP that is guaranteed to outperform a baseline policy in the real environment, with applications such as news recommendation, health care diagnosis, and digital online marketing.

Contribution

The paper presents an offline algorithm that iteratively computes an improved policy in a simulated MDP and provides a bound on that policy's sub-optimality, contributing to safe policy improvement techniques.

Findings

01. Performance bound on sub-optimality of the derived policy

02. Algorithm effectively improves baseline policy in simulated environments

03. Applicable to real-world domains like healthcare and marketing

Abstract

In this paper, we show how a simulated Markov decision process (MDP) built by the so-called baseline policies can be used to compute a different policy, namely the simulated optimal policy, whose performance is guaranteed to be better than that of the baseline policy in the real environment. This technique has broad applications in fields such as news recommendation systems, health care diagnosis, and digital online marketing. Our proposed algorithm iteratively solves for a "good" policy in the simulated MDP in an offline setting. Furthermore, we provide a performance bound on the sub-optimality of the control policy generated by the proposed algorithm.
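
To make the setting concrete, here is a minimal Python sketch of the simulated-MDP side of the problem only. It is not the algorithm from the paper: it evaluates a hypothetical baseline policy in a small simulated MDP, solves that same MDP by plain value iteration, and compares the two value functions. The two-state model, its transition probabilities and rewards, the discount factor, and all function names are assumptions made up for illustration. The comparison holds only inside the simulated model; carrying a guarantee over to the real environment is what the paper's constrained formulation and performance bound address.

```python
# Hypothetical 2-state, 2-action simulated MDP; all numbers are illustrative.
GAMMA = 0.9  # assumed discount factor
# P[s][a] = next-state probabilities, R[s][a] = immediate reward.
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
]
R = [
    [1.0, 0.0],
    [0.0, 2.0],
]
N_STATES, N_ACTIONS = 2, 2


def q_value(s, a, v):
    """One-step lookahead: reward plus discounted expected next value."""
    return R[s][a] + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][a]))


def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation for a deterministic policy."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [q_value(s, policy[s], v) for s in range(N_STATES)]
    return v


def value_iteration(sweeps=2000):
    """Return a greedy policy and its value estimate in the simulated MDP."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [max(q_value(s, a, v) for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    policy = [max(range(N_ACTIONS), key=lambda a: q_value(s, a, v))
              for s in range(N_STATES)]
    return policy, v


baseline_policy = [0, 0]  # hypothetical baseline: always take action 0
v_baseline = evaluate(baseline_policy)
sim_policy, v_sim = value_iteration()

# Inside the simulated model, the optimal policy is at least as good
# as the baseline in every state.
for s in range(N_STATES):
    print(s, round(v_baseline[s], 3), round(v_sim[s], 3))
```

Because the simulated model generally differs from the real environment, a policy that dominates the baseline here could still underperform it in practice, which is why a baseline performance constraint is needed rather than unconstrained optimization of the simulated MDP.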

Taxonomy

Topics: Risk and Portfolio Optimization · Advanced Control Systems Optimization · Stochastic processes and financial applications