Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava; Siamak Ravanbakhsh; Adam Oberman

arXiv:2505.12049 · cs.LG · May 20, 2025

TL;DR

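This paper introduces an axiomatic framework for lexicographic MDPs. It identifies a condition under which preferences cannot be represented by scalar rewards, characterizes the resulting 2-dimensional and general d-dimensional reward functions under a memorylessness assumption on preferences, and shows that optimal policies retain many desirable properties of the scalar-reward case.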

Contribution

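It extends Hausner's work by identifying a simple and practical condition under which preferences cannot be represented by scalar rewards, and by fully characterizing the 2-dimensional and general d-dimensional reward functions in MDPs under a memorylessness assumption on preferences.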

Findings

01

Preferences may require multi-dimensional rewards beyond scalar utility.

02

Optimal policies in lexicographic MDPs retain many desirable properties of their scalar-reward counterparts.

03

Under a simple and practical condition, preferences cannot be represented by scalar rewards and require a 2-dimensional reward function.
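
To make the insufficiency of scalar rewards concrete, here is a minimal Python sketch of lexicographically ordered vector utilities. It is an illustration only, not the paper's construction: the outcomes A, B, C, their utility vectors, and the helper names expected_utility and lex_prefers are assumptions chosen for this example. With a scalar utility satisfying u(A) > u(B) > u(C), a mixture of A and C with a small enough weight on A would fall below B. Under the lexicographic order, every tested mixture with positive weight on A still beats B, which is the kind of continuity failure that a single real-valued reward cannot reproduce.

```python
from fractions import Fraction


def expected_utility(lottery):
    """Component-wise expectation of vector utilities.

    lottery: list of (probability, utility_vector) pairs.
    """
    dim = len(lottery[0][1])
    return tuple(sum(p * u[i] for p, u in lottery) for i in range(dim))


def lex_prefers(a, b):
    """True if vector a is strictly preferred to b lexicographically."""
    return tuple(a) > tuple(b)  # Python compares tuples lexicographically


# Hypothetical outcomes; the first coordinate has strict priority.
A, B, C = (1, 0), (0, 1), (0, 0)

# A is preferred to B, and B to C.
print(lex_prefers(A, B), lex_prefers(B, C))

# Mix A and C with ever smaller positive weight p on A and compare to B.
probs = [Fraction(1, 10**k) for k in range(1, 6)]
mixtures_beat_B = all(
    lex_prefers(expected_utility([(p, A), (1 - p, C)]), B) for p in probs
)
print(mixtures_beat_B)
```

Exact fractions are used instead of floats so that very small probabilities are compared without rounding error.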

Abstract

Recent work has formalized the reward hypothesis through the lens of expected utility theory, by interpreting reward as utility. Hausner's foundational work showed that dropping the continuity axiom leads to a generalization of expected utility theory where utilities are lexicographically ordered vectors of arbitrary dimension. In this paper, we extend this result by identifying a simple and practical condition under which preferences cannot be represented by scalar rewards, necessitating a 2-dimensional reward function. We provide a full characterization of such reward functions, as well as the general d-dimensional case, in Markov Decision Processes (MDPs) under a memorylessness assumption on preferences. Furthermore, we show that optimal policies in this setting retain many desirable properties of their scalar-reward counterparts, while in the Constrained MDP (CMDP) setting --…

Videos

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs · slideslive

Taxonomy

Topics: Decision-Making and Behavioral Economics · Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference