Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs
Mehran Shakerinava, Siamak Ravanbakhsh, Adam Oberman

TL;DR
This paper introduces an axiomatic framework for lexicographic MDPs. It shows when scalar rewards are insufficient, characterizes the multi-dimensional reward functions that replace them, and shows that optimal policies keep many properties of the scalar-reward case.
Contribution
It extends Hausner's work by identifying a simple condition under which preferences cannot be represented by scalar rewards. It then characterizes 2-dimensional and general d-dimensional reward functions in MDPs under a memorylessness assumption on preferences.
Findings
Preferences may require multi-dimensional rewards beyond scalar utility.
Optimal policies in lexicographic MDPs retain many desirable properties of their scalar-reward counterparts.
Scalar rewards are insufficient in certain preference structures, necessitating vector rewards.
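A minimal sketch can show why lexicographic preferences resist a single scalar reward. It assumes three hypothetical outcomes A, B, C with 2-D utilities compared lexicographically, where the first component dominates. The continuity (Archimedean) axiom of expected utility theory would require some mixture of A and C to be exactly as good as B. The check below finds no such mixture on a probability grid. It also shows that any fixed scalarization x[0] + w * x[1] misorders some mixture. The outcomes, the grid and the weights are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

# Hypothetical outcomes with 2-D utilities, compared lexicographically:
# the first component dominates, the second only breaks ties.
A = (Fraction(1), Fraction(0))
B = (Fraction(0), Fraction(1))
C = (Fraction(0), Fraction(0))

def mix(p, x, y):
    """Expected utility vector of the lottery yielding x w.p. p and y w.p. 1 - p."""
    return tuple(p * xi + (1 - p) * yi for xi, yi in zip(x, y))

def lex_cmp(u, v):
    """1, 0 or -1 as u is lexicographically greater than, equal to or less than v."""
    return (u > v) - (u < v)  # Python tuples already compare lexicographically

assert lex_cmp(A, B) == 1 and lex_cmp(B, C) == 1  # A > B > C

# Continuity would demand some p with mix(p, A, C) ~ B. The mixture is (p, 0):
# every p > 0 beats B and p = 0 loses to it, so no grid point is indifferent.
grid = [Fraction(k, 1000) for k in range(1001)]
indifferent = [p for p in grid if lex_cmp(mix(p, A, C), B) == 0]

def scalar(x, w):
    """Hypothetical scalarization u(x) = x[0] + w * x[1]."""
    return x[0] + w * x[1]

# For any weight w > 0, the mixture at p = w / 2 is misordered: the scalar
# prefers B, while the lexicographic order prefers the mixture.
disagreements = []
for w in (Fraction(1), Fraction(1, 10), Fraction(1, 10**6)):
    m = mix(w / 2, A, C)
    disagreements.append(scalar(m, w) < scalar(B, w) and lex_cmp(m, B) == 1)

print(indifferent, disagreements)
```

Shrinking w only moves the misordered mixture closer to p = 0. No finite weight removes it, which is the intuition for why a second reward dimension is needed.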
Abstract
Recent work has formalized the reward hypothesis through the lens of expected utility theory, by interpreting reward as utility. Hausner's foundational work showed that dropping the continuity axiom leads to a generalization of expected utility theory where utilities are lexicographically ordered vectors of arbitrary dimension. In this paper, we extend this result by identifying a simple and practical condition under which preferences cannot be represented by scalar rewards, necessitating a 2-dimensional reward function. We provide a full characterization of such reward functions, as well as the general d-dimensional case, in Markov Decision Processes (MDPs) under a memorylessness assumption on preferences. Furthermore, we show that optimal policies in this setting retain many desirable properties of their scalar-reward counterparts, while in the Constrained MDP (CMDP) setting --…
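The following sketch makes "lexicographically ordered" concrete in an MDP. It is a brute-force illustration, not the paper's method. It assumes a hypothetical 2-state, 2-action MDP with 2-D rewards (primary objective first), a discount factor of 0.9, and only deterministic stationary policies. Each policy is evaluated by solving the Bellman linear system for both reward components at once. The resulting value vectors are then ranked lexicographically.

```python
import itertools
import numpy as np

GAMMA = 0.9  # assumed discount factor

# Hypothetical MDP. P[s, a] is the next-state distribution and R[s, a] is a
# 2-D reward vector (primary objective, secondary objective).
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # state 0: a0 stays, a1 moves to state 1
              [[1.0, 0.0], [0.0, 1.0]]])  # state 1: a0 moves to state 0, a1 stays
R = np.array([[[1.0, 0.0], [1.0, 5.0]],
              [[0.0, 9.0], [1.0, 1.0]]])

def evaluate(policy):
    """Discounted vector value of a deterministic stationary policy.

    Solves (I - GAMMA * P_pi) V = R_pi for every reward component at once,
    so V has shape (num_states, reward_dim)."""
    states = np.arange(len(policy))
    actions = np.array(policy)
    P_pi = P[states, actions]  # (S, S)
    R_pi = R[states, actions]  # (S, d)
    return np.linalg.solve(np.eye(len(policy)) - GAMMA * P_pi, R_pi)

def lex_greater(u, v, tol=1e-9):
    """Lexicographic u > v, with a tolerance so round-off cannot flip ties."""
    for ui, vi in zip(u, v):
        if ui > vi + tol:
            return True
        if ui < vi - tol:
            return False
    return False  # equal up to tolerance

policies = list(itertools.product(range(2), repeat=2))  # all deterministic policies
values = {pi: evaluate(pi) for pi in policies}

# Pick the policy whose value at state 0 is lexicographically largest.
best = policies[0]
for pi in policies[1:]:
    if lex_greater(values[pi][0], values[best][0]):
        best = pi

# Check that no policy beats `best` lexicographically at any state.
uniformly_optimal = not any(
    lex_greater(values[pi][s], values[best][s]) for pi in policies for s in range(2)
)
print(best, values[best].round(3).tolist(), uniformly_optimal)
```

In this toy MDP, three policies tie at 10 on the primary objective from state 0, and the secondary component selects policy (1, 1). The tolerance in lex_greater is a deliberate choice. Exact primary ties are precisely where the secondary reward decides, and round-off from np.linalg.solve could otherwise break them arbitrarily. Enumerating policies is exponential in the number of states, so this only serves as a check on very small examples.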
Taxonomy
Topics: Decision-Making and Behavioral Economics · Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference
