MDPs with Unawareness

Joseph Y. Halpern; Nan Rong; Ashutosh Saxena

arXiv:1407.7191·cs.AI·July 29, 2014

TL;DR

This paper introduces MDPs with unawareness (MDPUs), a framework for decision-making when the decision maker may not know all possible actions, and provides algorithms for near-optimal learning in such settings.

Contribution

It defines MDPUs to model a decision maker's unawareness of available actions, characterizes when near-optimal learning is achievable, and identifies the conditions under which a near-optimal solution can be found in polynomial time.

Findings

01 Characterization of when near-optimal solutions are achievable

02 Development of an efficient learning algorithm for MDPUs

03 Conditions for polynomial-time near-optimal solutions

Abstract

Markov decision processes (MDPs) are widely used for modeling decision-making problems in robotics, automated control, and economics. Traditional MDPs assume that the decision maker (DM) knows all states and actions. However, this may not be true in many situations of interest. We define a new framework, MDPs with unawareness (MDPUs), to deal with the possibility that a DM may not be aware of all possible actions. We provide a complete characterization of when a DM can learn to play near-optimally in an MDPU, and give an algorithm that learns to play near-optimally when it is possible to do so, as efficiently as possible. In particular, we characterize when a near-optimal solution can be found in polynomial time.
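The abstract describes unawareness only informally. As a rough illustration, and not the paper's formal MDPU definition, the toy Python sketch below assumes a small deterministic two-state MDP in which the DM initially knows only some actions, plus a hypothetical `explore` step that reveals one unknown action with a fixed probability `p_discover`. All names (`TRANSITIONS`, `value_iteration`, `explore`, `p_discover`) are illustrative assumptions. The point is simply that the optimal value computed over the actions the DM is aware of can be strictly lower than the value over the full action set.

```python
import random

# Hypothetical toy MDP (not from the paper): deterministic transitions.
# TRANSITIONS[(state, action)] = (next_state, reward)
TRANSITIONS = {
    (0, "stay"): (0, 0.0),
    (0, "move"): (1, 0.0),
    (1, "stay"): (1, 1.0),
    (1, "move"): (0, 0.0),
    (0, "jump"): (1, 5.0),  # action the DM is initially unaware of
    (1, "jump"): (1, 2.0),
}
STATES = [0, 1]
ALL_ACTIONS = {"stay", "move", "jump"}
GAMMA = 0.9  # discount factor (assumed)


def q_value(s, a, v):
    """One-step lookahead value of taking action a in state s."""
    s2, r = TRANSITIONS[(s, a)]
    return r + GAMMA * v[s2]


def value_iteration(aware, iters=500):
    """Optimal discounted values when only the actions in `aware` can be used."""
    v = {s: 0.0 for s in STATES}
    for _ in range(iters):
        # The comprehension reads the previous v before it is rebound.
        v = {s: max(q_value(s, a, v) for a in aware) for s in STATES}
    return v


def explore(aware, p_discover, rng):
    """Hypothetical explore step: with probability p_discover, reveal one unknown action."""
    unknown = sorted(ALL_ACTIONS - aware)
    if unknown and rng.random() < p_discover:
        return aware | {rng.choice(unknown)}
    return aware


if __name__ == "__main__":
    rng = random.Random(0)
    aware = {"stay", "move"}
    print("values before discovery:", value_iteration(aware))
    while aware != ALL_ACTIONS:
        aware = explore(aware, 0.3, rng)
    print("values after discovery:", value_iteration(aware))
```

In this toy, the values over `{"stay", "move"}` are 9 for state 0 and 10 for state 1, and discovering `jump` raises both. The paper's results concern when, and how efficiently, a DM in such a setting can learn to play near-optimally.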
