Reward-rational (implicit) choice: A unifying formalism for reward learning
Hong Jun Jeon, Smitha Milli, Anca D. Dragan

TL;DR
This paper introduces a unifying formalism for reward learning that interprets diverse human behaviors, such as demonstrations, comparisons, pushing the robot away, or turning it off, as reward-rational choices that the human makes, often implicitly.
Contribution
It proposes a single formalism that interprets multiple types of human feedback as reward-rational choices, offering a unifying lens on past work and a recipe for interpreting new sources of information.
Findings
Interpreted new feedback types within the formalism
Analyzed how feedback choices leak information about rewards
Demonstrated the formalism's applicability to diverse behaviors
Abstract
It is often difficult to hand-specify what the correct reward function is for a task, so researchers have instead aimed to learn reward functions from human behavior or feedback. The types of behavior interpreted as evidence of the reward function have expanded greatly in recent years. We've gone from demonstrations, to comparisons, to reading into the information leaked when the human is pushing the robot away or turning it off. And surely, there is more to come. How will a robot make sense of all these diverse types of behavior? Our key insight is that different types of behavior can be interpreted in a single unifying formalism - as a reward-rational choice that the human is making, often implicitly. The formalism offers both a unifying lens with which to view past work, as well as a recipe for interpreting new sources of information that are yet to be uncovered. We provide two…
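The idea of treating feedback as a reward-rational choice can be illustrated with a minimal sketch. It assumes a Boltzmann-rational human who picks option c from a choice set with probability proportional to exp(beta * r(grounding(c))), and a robot that keeps a Bayesian belief over a small set of candidate reward functions. The function names, the two-feature trajectories, and the deterministic grounding (each option maps to a single feature vector rather than a distribution over trajectories) are illustrative assumptions, not details taken from the text above.

```python
import math


def choice_likelihoods(reward, grounded_choices, beta=1.0):
    """Return P(c | reward) for every option in the choice set.

    Each option is already "grounded" to a feature vector (an assumption of
    this sketch). The probability of an option is proportional to
    exp(beta * reward(features)).
    """
    scores = [beta * reward(feats) for feats in grounded_choices]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_belief(prior, rewards, grounded_choices, chosen_index, beta=1.0):
    """Bayes update over candidate reward functions after one observed choice."""
    unnormalized = [
        p * choice_likelihoods(r, grounded_choices, beta)[chosen_index]
        for p, r in zip(prior, rewards)
    ]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def linear_reward(weights):
    """Hypothetical reward: a weighted sum of trajectory features."""
    return lambda feats: sum(w * f for w, f in zip(weights, feats))


# Two hypotheses: the human cares about speed, or the human cares about safety.
rewards = [linear_reward((1.0, 0.0)), linear_reward((0.0, 1.0))]
prior = [0.5, 0.5]

# A comparison is a choice set with two options, described by (speed, safety).
comparison = [(0.9, 0.2), (0.3, 0.8)]

# The human picks the slower, safer trajectory (index 1).
posterior = update_belief(prior, rewards, comparison, chosen_index=1)
print(posterior)
```

Under these assumptions, other feedback types differ only in what the choice set contains and how each option is grounded; the likelihood and the belief update stay the same. A larger beta models a more reliably rational human, so each observed choice shifts the belief more strongly.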
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Receptor Mechanisms and Signaling
