On the Expressivity of Markov Reward
David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

TL;DR
This paper investigates the limits of reward functions in reinforcement learning, showing that some task specifications cannot be captured by Markov rewards, and provides algorithms to construct such rewards when possible.
Contribution
It introduces three new abstract notions of task, proves that Markov rewards cannot express every instance of them, and offers polynomial-time algorithms that either construct a suitable Markov reward function or determine that none exists.
Findings
For each of the three task types, some instances cannot be captured by any Markov reward function.
Algorithms can construct rewards for expressible tasks in polynomial time.
Empirical results support theoretical limitations and algorithm effectiveness.
Abstract
Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
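The claim that some sets of acceptable behaviors admit no Markov reward can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not the paper's algorithm or environment: it assumes a hypothetical two-state, two-action environment in which every action moves the agent to the other state, a discount factor of 0.9, and a task given as the set of deterministic policies that should be exactly the optimal ones. It then searches a small grid of integer rewards for one that makes an "XOR" task (take action 0 in one state and action 1 in the other, in either arrangement) exactly optimal. All names (`start_value`, `optimal_policies`, `xor_is_expressible`) are made up for this example.

```python
from fractions import Fraction
from itertools import product

# Assumed toy environment (hypothetical, not from the paper):
# two states, two actions, and every action moves to the other state.
GAMMA = Fraction(9, 10)
STATES = (0, 1)
ACTIONS = (0, 1)

# Task: exactly these two deterministic policies should be optimal.
# A policy is a tuple (action in state 0, action in state 1).
XOR_TASK = {(0, 1), (1, 0)}


def start_value(policy, reward):
    """Exact discounted return from start state 0.

    Because the agent alternates 0 -> 1 -> 0 -> ... regardless of action,
    V(0) = r(0, pi(0)) + gamma * V(1) and V(1) = r(1, pi(1)) + gamma * V(0),
    which solves to the closed form below.
    """
    r0 = reward[(0, policy[0])]
    r1 = reward[(1, policy[1])]
    return (r0 + GAMMA * r1) / (1 - GAMMA ** 2)


def optimal_policies(reward):
    """Return the set of deterministic policies with maximal start-state value."""
    policies = list(product(ACTIONS, repeat=len(STATES)))
    values = {pi: start_value(pi, reward) for pi in policies}
    best = max(values.values())
    return {pi for pi, v in values.items() if v == best}


def xor_is_expressible(reward_levels=range(-2, 3)):
    """Search a small integer grid for a Markov reward whose optimal set is XOR_TASK."""
    keys = [(s, a) for s in STATES for a in ACTIONS]
    for levels in product(reward_levels, repeat=len(keys)):
        reward = dict(zip(keys, map(Fraction, levels)))
        if optimal_policies(reward) == XOR_TASK:
            return True
    return False


if __name__ == "__main__":
    prefer_action_0 = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 0}
    print(optimal_policies(prefer_action_0))  # an expressible task
    print(xor_is_expressible())               # the XOR task on this grid
```

The grid search is only a finite check, but in this toy environment the reason it fails is general: the start-state value is a positively weighted sum of one reward term per state, so if both XOR policies are optimal, each of their actions must be reward-maximizing in its state, and then all four policies are optimal. Tasks such as "always take action 0", by contrast, are realized by a simple reward.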
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Receptor Mechanisms and Signaling
