STARC: A General Framework For Quantifying Differences Between Reward Functions

Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

arXiv:2309.15257 · cs.LG · December 13, 2024

TL;DR

This paper introduces STARC metrics, a new class of pseudometrics for quantifying differences between reward functions, providing theoretical bounds on regret and facilitating analysis of reward learning algorithms.

Contribution

The paper proposes STARC metrics, establishing their theoretical properties and practical usefulness for evaluating reward functions in reinforcement learning.

Findings

01

STARC metrics bound worst-case regret from above and below.

02

These bounds are tight: any metric that satisfies the same upper and lower regret bounds is bilipschitz equivalent to a STARC metric.

03

Empirical evaluation shows their practical effectiveness.
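The summary does not spell out how a STARC metric is computed. As a rough sketch, assuming the general construction in which each reward function is first mapped to a canonical form that discards potential shaping and S'-redistribution, then normalised, and finally compared with a distance, the Python below computes such a distance on a tiny tabular MDP. The specific choices are assumptions for illustration and are not necessarily the paper's: the canonical form is the advantage function of the uniformly random policy, normalisation and distance both use the L2 norm, and the names `canonicalise`, `standardise` and `starc_distance`, as well as the toy MDP, are hypothetical.

```python
import math
from typing import List

# Assumed tabular encoding (hypothetical, for illustration only):
# tau[s][a][t] = P(next state t | state s, action a); R[s][a][t] = reward.
Tensor3 = List[List[List[float]]]


def uniform_policy_values(R: Tensor3, tau: Tensor3, gamma: float,
                          tol: float = 1e-10) -> List[float]:
    """Iterative policy evaluation for the uniformly random policy under R."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    while True:
        V_new = [
            sum(
                sum(tau[s][a][t] * (R[s][a][t] + gamma * V[t]) for t in range(n_s))
                for a in range(n_a)
            ) / n_a
            for s in range(n_s)
        ]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def canonicalise(R: Tensor3, tau: Tensor3, gamma: float) -> List[float]:
    """Advantage of the uniform policy, flattened over (s, a).

    A(s, a) = E_{S'~tau(s,a)}[R(s, a, S') + gamma * V(S')] - V(s) is linear in R
    and unchanged by potential shaping or S'-redistribution of R.
    """
    V = uniform_policy_values(R, tau, gamma)
    n_s, n_a = len(R), len(R[0])
    return [
        sum(tau[s][a][t] * (R[s][a][t] + gamma * V[t]) for t in range(n_s)) - V[s]
        for s in range(n_s)
        for a in range(n_a)
    ]


def standardise(R: Tensor3, tau: Tensor3, gamma: float,
                eps: float = 1e-9) -> List[float]:
    """Canonicalise, then scale to unit L2 norm (assumed normalisation)."""
    c = canonicalise(R, tau, gamma)
    norm = math.sqrt(sum(x * x for x in c))
    # Rewards whose canonical form is (numerically) zero are all mapped to zero.
    return [0.0] * len(c) if norm < eps else [x / norm for x in c]


def starc_distance(R1: Tensor3, R2: Tensor3, tau: Tensor3, gamma: float) -> float:
    """L2 distance between standardised rewards (assumed choice of metric)."""
    s1 = standardise(R1, tau, gamma)
    s2 = standardise(R2, tau, gamma)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s1, s2)))


# Toy 2-state, 2-action example (made-up numbers).
gamma = 0.9
tau = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
R = [[[1.0, 0.0], [0.0, 2.0]], [[0.5, -1.0], [0.0, 0.3]]]
phi = [3.0, -2.0]  # arbitrary potential function
R_shaped = [[[R[s][a][t] + gamma * phi[t] - phi[s] for t in range(2)]
             for a in range(2)] for s in range(2)]
R_neg = [[[-r for r in row] for row in plane] for plane in R]

print(round(starc_distance(R, R_shaped, tau, gamma), 6))  # ~0.0: shaping is ignored
print(round(starc_distance(R, R_neg, tau, gamma), 6))     # ~2.0: opposite rewards
```

Because the canonical form ignores potential shaping, the shaped reward lands at (numerically) zero distance from the original; because standardisation rescales to unit norm, positive rescaling is also ignored, and a reward and its negation sit at the maximum L2 distance of 2. The `eps` tolerance guards against dividing by a near-zero norm when a reward's canonical form vanishes.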

Abstract

In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use reward learning algorithms, which attempt to learn a reward function from data. However, the theoretical foundations of reward learning are not yet well-developed. In particular, it is typically not known when a given reward learning algorithm will, with high probability, learn a reward function that is safe to optimise. This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance. One of the roadblocks to deriving better theoretical guarantees is the lack of…

Taxonomy

Topics: Reinforcement Learning in Robotics · Receptor Mechanisms and Signaling · Advanced Bandit Algorithms Research