STARC: A General Framework For Quantifying Differences Between Reward Functions
Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

TL;DR
This paper introduces STARC metrics, a new class of pseudometrics for quantifying differences between reward functions, providing theoretical bounds on regret and facilitating analysis of reward learning algorithms.
Contribution
The paper proposes STARC metrics, establishing their theoretical properties and practical usefulness for evaluating reward functions in reinforcement learning.
Findings
STARC metrics bound worst-case regret from above and below.
These bounds make STARC metrics tight, and any other metric with the same properties must be bilipschitz equivalent to them.
Empirical evaluation shows their practical effectiveness.
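For readers unfamiliar with the term, bilipschitz equivalence can be stated as follows. This is the standard textbook definition rather than a formula quoted from the paper; the symbols d_1, d_2, \ell, u and the set of reward functions \mathcal{R} are notation assumed here for illustration.

```latex
% Two pseudometrics d_1, d_2 on a set of reward functions \mathcal{R}
% are bilipschitz equivalent if
\exists\, \ell, u \in (0, \infty) \ \text{such that}\ \forall R_1, R_2 \in \mathcal{R}: \quad
\ell \cdot d_1(R_1, R_2) \;\le\; d_2(R_1, R_2) \;\le\; u \cdot d_1(R_1, R_2).
```

Informally, two such metrics can differ by at most a constant factor, so they agree on which reward functions are close to one another.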
Abstract
In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use reward learning algorithms, which attempt to learn a reward function from data. However, the theoretical foundations of reward learning are not yet well-developed. In particular, it is typically not known when a given reward learning algorithm with high probability will learn a reward function that is safe to optimise. This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance. One of the roadblocks to deriving better theoretical guarantees is the lack of…
Taxonomy
Topics: Reinforcement Learning in Robotics · Receptor Mechanisms and Signaling · Advanced Bandit Algorithms Research
