Markov Decision Processes with Continuous Side Information

Aditya Modi; Nan Jiang; Satinder Singh; Ambuj Tewari

arXiv:1711.05726 · stat.ML · October 24, 2019 · 6 cites

Open Access

TL;DR

This paper studies reinforcement learning over a sequence of episodic Markov Decision Processes in which an observed context determines each episode's dynamics. It proposes algorithms under a smoothness assumption and gives lower and upper PAC bounds.

Contribution

The paper introduces algorithms for contextual MDPs whose parameters vary smoothly with the context, provides PAC bounds, and treats a tractable linear setting with a KWIK-based learning algorithm.

Findings

01 PAC bounds under smoothness assumptions
02 Lower bound showing exponential dependence on dimension
03 A linear setting with a KWIK-based PAC algorithm

Abstract

We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP for that episode. Our setting is motivated by applications in healthcare where baseline measurements of a patient at the start of a treatment episode form the context that may provide information about how the patient might respond to treatment decisions. We propose algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under an assumption that the unobserved MDP parameters vary smoothly with the observed context. We also give lower and upper PAC bounds under the smoothness assumption. Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create…
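The interaction protocol described in the abstract can be illustrated with a small toy simulation. This is only a sketch of the setting, not any algorithm from the paper: the two base kernels `P_A` and `P_B`, the scalar context in [0, 1], the interpolation rule, the reward, and the myopic `greedy_policy` are all hypothetical choices made for illustration. The interpolation is one simple way to make the dynamics vary smoothly with the context.

```python
import random

# Hypothetical toy CMDP: 2 states, 2 actions, fixed horizon.
# P_A and P_B are made-up base kernels; P[state][action] is the
# probability of moving to state 1. The context c in [0, 1] mixes them,
# so the dynamics change smoothly as the context changes.
P_A = {0: {0: 0.1, 1: 0.6}, 1: {0: 0.2, 1: 0.9}}
P_B = {0: {0: 0.5, 1: 0.3}, 1: {0: 0.7, 1: 0.4}}


def transition_prob(context, state, action):
    """Probability of landing in state 1 under the context-mixed kernel."""
    return (1.0 - context) * P_A[state][action] + context * P_B[state][action]


def greedy_policy(context, state):
    """Myopic policy: pick the action most likely to reach state 1.

    It reads the true model, which a learning agent would not know.
    """
    return max((0, 1), key=lambda a: transition_prob(context, state, a))


def run_episode(context, policy, horizon, rng):
    """Roll out one episode of the MDP induced by `context`.

    Assumed reward: 1 for every step that ends in state 1.
    """
    state, total_reward = 0, 0.0
    for _ in range(horizon):
        action = policy(context, state)
        p_next = transition_prob(context, state, action)
        state = 1 if rng.random() < p_next else 0
        total_reward += float(state)
    return total_reward


rng = random.Random(0)
returns = []
for _ in range(5):
    context = rng.random()  # side information observed at episode start
    returns.append(run_episode(context, greedy_policy, horizon=10, rng=rng))
```

Each loop iteration corresponds to one episode: a context is observed first, and it fixes the transition probabilities for that episode only. A learning agent would have to estimate those probabilities from data gathered across episodes with nearby contexts.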

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms