Interval Dominance based Structural Results for Markov Decision Process
Vikram Krishnamurthy

TL;DR
This paper applies interval dominance, a condition proposed in the microeconomics literature, to Markov decision processes, yielding monotone optimal policies in cases where the classical supermodularity assumptions do not hold.
Contribution
The paper extends structural results for MDPs by using interval dominance, which guarantees monotone optimal policies under more general conditions than supermodularity.
Findings
The interval dominance condition holds in several MDPs whose rewards are not supermodular.
Monotone optimal policies are identified in models with sigmoidal rewards and with bi-diagonal or perturbed bi-diagonal transition matrices.
Reinforcement learning algorithms exploiting the structure are discussed.
Abstract
Structural results impose sufficient conditions on the model parameters of a Markov decision process (MDP) so that the optimal policy is an increasing function of the underlying state. The classical assumptions for MDP structural results require supermodularity of the rewards and transition probabilities. However, supermodularity does not hold in many applications. This paper uses a sufficient condition for interval dominance (called I) proposed in the microeconomics literature, to obtain structural results for MDPs under more general conditions. We present several MDP examples where supermodularity does not hold, yet I holds, and so the optimal policy is monotone; these include sigmoidal rewards (arising in prospect theory for human decision making), bi-diagonal and perturbed bi-diagonal transition matrices (in optimal allocation problems). We also consider MDPs with TP3 transition…
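To make the terms in the abstract concrete, the following is a minimal sketch (not the paper's method, and it does not implement condition I) of the classical supermodularity check on a reward table, together with a value-iteration check of whether the resulting optimal policy is increasing in the state. The function names, the toy rewards, and the transition matrix are all hypothetical. Transitions are assumed identical for both actions so that the result can be verified by hand: the optimal action in each state is then simply the one with the larger reward.

```python
def is_supermodular(r):
    """r[x][a] is the reward in state x under action a.
    Supermodular means r[x][a+1] - r[x][a] is non-decreasing in x."""
    n_states, n_actions = len(r), len(r[0])
    return all(
        r[x + 1][a + 1] - r[x + 1][a] >= r[x][a + 1] - r[x][a]
        for x in range(n_states - 1)
        for a in range(n_actions - 1)
    )


def q_values(r, P, V, beta):
    """One Bellman backup. P[a][x][y] is the transition probability x -> y under a."""
    n_states, n_actions = len(r), len(r[0])
    return [
        [r[x][a] + beta * sum(P[a][x][y] * V[y] for y in range(n_states))
         for a in range(n_actions)]
        for x in range(n_states)
    ]


def optimal_policy(r, P, beta=0.9, iters=500):
    """Discounted value iteration; returns the greedy action for each state."""
    V = [0.0] * len(r)
    for _ in range(iters):
        V = [max(q) for q in q_values(r, P, V, beta)]
    Q = q_values(r, P, V, beta)
    return [max(range(len(q)), key=lambda a: q[a]) for q in Q]


def is_monotone(policy):
    """True if the chosen action is non-decreasing in the state index."""
    return all(u <= v for u, v in zip(policy, policy[1:]))


# Hypothetical toy model: 3 states, 2 actions, action-independent transitions.
P_same = [[0.6, 0.3, 0.1],
          [0.3, 0.4, 0.3],
          [0.1, 0.3, 0.6]]
P = [P_same, P_same]

r_super = [[1.0, 0.0], [1.5, 1.6], [2.0, 3.0]]  # action gaps: -1.0, 0.1, 1.0
r_non = [[0.1, 0.0], [0.0, 1.0], [0.0, 0.5]]    # action gaps: -0.1, 1.0, 0.5

for name, r in [("supermodular", r_super), ("non-supermodular", r_non)]:
    policy = optimal_policy(r, P)
    print(name, is_supermodular(r), policy, is_monotone(policy))
```

In this toy model the second reward table fails the supermodularity check, yet its optimal policy is still increasing in the state. This illustrates only that supermodularity is sufficient rather than necessary for monotonicity, which is the gap the paper's more general condition is meant to address.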
Taxonomy
Topics: Supply Chain and Inventory Management · Auction Theory and Applications · Reinforcement Learning in Robotics
