Contextual Markov Decision Processes

Assaf Hallak; Dotan Di Castro; Shie Mannor

arXiv:1502.02259·stat.ML·February 10, 2015·72 cites

Open Access

TL;DR

This paper introduces the Contextual Markov Decision Process (CMDP) model for planning in environments where dynamics depend on hidden static parameters, with algorithms that learn and optimize across multiple contexts.

Contribution

The paper proposes a new CMDP framework and algorithms with provable guarantees for learning and optimizing in environments with hidden static contexts.

Findings

01. Algorithms with theoretical guarantees for learning CMDPs

02. Bounds established for naive implementation approaches

03. Framework extensions discussed for future research

Abstract

We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called Contextual Markov Decision Process (CMDP), can model a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website's objective is to determine customer characteristics and to optimize the interaction between them. Our work focuses on one basic scenario: a finite horizon with a small, known number of possible contexts. We suggest a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs. Bounds are obtained for specific naive implementations, and…
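
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm; it only shows the structure of the model: a small, known set of MDPs sharing states and actions, a context that is fixed for a whole episode and hidden from the learner, and a finite horizon. The class names `MDP` and `ContextualMDP`, the uniform draw of the context, and the toy numbers are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    # transitions[s][a] is a list of next-state probabilities
    transitions: list
    # rewards[s][a] is the immediate reward for action a in state s
    rewards: list


@dataclass
class ContextualMDP:
    mdps: list    # one MDP per context (small, known number of contexts)
    horizon: int  # finite episode length

    def run_episode(self, policy, rng, start_state=0):
        # Assumption: the context is drawn uniformly at random. It stays
        # fixed for the episode and is never shown to the policy.
        context = rng.randrange(len(self.mdps))
        mdp = self.mdps[context]
        state, total = start_state, 0.0
        trajectory = []
        for _ in range(self.horizon):
            action = policy(state)
            reward = mdp.rewards[state][action]
            trajectory.append((state, action, reward))
            total += reward
            probs = mdp.transitions[state][action]
            state = rng.choices(range(len(probs)), weights=probs)[0]
        return trajectory, total


# Toy example (hypothetical numbers): two states, two actions, two contexts.
# Every action keeps the agent in its current state.
stay = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
# Context 0 rewards action 0; context 1 rewards action 1.
context_0 = MDP(transitions=stay, rewards=[[1.0, 0.0], [1.0, 0.0]])
context_1 = MDP(transitions=stay, rewards=[[0.0, 1.0], [0.0, 1.0]])
toy = ContextualMDP(mdps=[context_0, context_1], horizon=3)

trajectory, total = toy.run_episode(lambda s: 0, random.Random(0))
```

In this toy case a policy that always picks action 0 collects either the full reward or nothing, depending on the hidden context. The learner sees only the trajectory, so it has to infer the context from observed rewards and transitions, which is the problem the paper addresses.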

Taxonomy

Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Advanced Bandit Algorithms Research