MDPs with Unawareness
Joseph Y. Halpern, Nan Rong, Ashutosh Saxena

TL;DR
This paper introduces MDPs with unawareness (MDPUs), a framework for decision-making when the decision maker may not be aware of all possible actions, and gives an algorithm that learns to play near-optimally whenever doing so is possible.
Contribution
It defines MDPUs to model unawareness of actions and characterizes when a decision maker can learn to play near-optimally, including when a near-optimal solution can be found in polynomial time.
Findings
Characterization of when near-optimal solutions are achievable
Development of an efficient learning algorithm for MDPUs
Conditions for polynomial-time near-optimal solutions
Abstract
Markov decision processes (MDPs) are widely used for modeling decision-making problems in robotics, automated control, and economics. Traditional MDPs assume that the decision maker (DM) knows all states and actions. However, this may not be true in many situations of interest. We define a new framework, MDPs with unawareness (MDPUs), to deal with the possibility that a DM may not be aware of all possible actions. We provide a complete characterization of when a DM can learn to play near-optimally in an MDPU, and give an algorithm that learns to play near-optimally when it is possible to do so, as efficiently as possible. In particular, we characterize when a near-optimal solution can be found in polynomial time.
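To make the idea of unawareness concrete, the following is a minimal sketch, not the paper's formal definition or its learning algorithm. It assumes a hypothetical two-state toy MDP (the states, the actions "walk", "fly", and "stay", and all rewards are invented for illustration) and runs standard value iteration twice: once restricted to the actions the DM is aware of, and once over the full action set. The gap between the two results shows what a DM can lose by being unaware of an action.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model (not from the paper):
# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

TRANSITIONS: Transitions = {
    "start": {
        "walk": [(1.0, "goal", 1.0)],
        "fly": [(1.0, "goal", 5.0)],  # assumed: the DM is initially unaware of this action
    },
    "goal": {
        "stay": [(1.0, "goal", 0.0)],
    },
}


def value_iteration(
    transitions: Transitions,
    allowed_actions: Set[str],
    gamma: float = 0.9,
    iters: int = 200,
) -> Dict[str, float]:
    """Standard value iteration, restricted to the actions in allowed_actions."""
    values = {s: 0.0 for s in transitions}
    for _ in range(iters):
        new_values = {}
        for s, acts in transitions.items():
            usable = [a for a in acts if a in allowed_actions]
            # Best expected discounted return over usable actions (0.0 if none).
            new_values[s] = max(
                (
                    sum(p * (r + gamma * values[s2]) for p, s2, r in acts[a])
                    for a in usable
                ),
                default=0.0,
            )
        values = new_values
    return values


aware = {"walk", "stay"}               # actions the DM knows about
all_actions = {"walk", "fly", "stay"}  # actions that actually exist

v_aware = value_iteration(TRANSITIONS, aware)
v_full = value_iteration(TRANSITIONS, all_actions)
print(v_aware["start"], v_full["start"])
```

In this toy, a DM that plans only over the actions it is aware of settles for a lower value at "start" than one that knows about "fly". How a DM should go about discovering such actions, and when that can be done efficiently, is the question the paper addresses.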
