Choice Between Partial Trajectories: Disentangling Goals from Beliefs

Henrik Marklund; Benjamin Van Roy

arXiv:2410.22690 · cs.LG · December 24, 2024

TL;DR

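This paper proposes modeling human choices between partial trajectories with the bootstrapped return, which adds an estimate of future return to the partial return. The model lets an agent disentangle a human's goals from their beliefs and makes reward learning from human preferences more robust.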

Contribution

The paper proposes a bootstrapped return model of choice behavior, formalizes its properties in an Alignment Theorem, and shows its advantages over previous models in disentangling goals from beliefs.

Findings

01 Under the bootstrapped return model, choices reflect human beliefs about the environment, and reward learning accounts for those beliefs.

02 The model is robust to choices based on partial return or cumulative advantage.

03 A formal proof, the Alignment Theorem, supports the model's effectiveness.

Abstract

As AI agents generate increasingly sophisticated behaviors, manually encoding human preferences to guide these agents becomes more challenging. To address this, it has been suggested that agents instead learn preferences from human choice data. This approach requires a model of choice behavior that the agent can use to interpret the data. For choices between partial trajectories of states and actions, previous models assume choice probabilities are determined by the partial return or the cumulative advantage. We consider an alternative model based instead on the bootstrapped return, which adds to the partial return an estimate of the future return. Benefits of the bootstrapped return model stem from its treatment of human beliefs. Unlike partial return, choices based on bootstrapped return reflect human beliefs about the environment. Further, while recovering the reward function from…
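The difference between partial return and bootstrapped return can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the logistic choice rule, the `temperature` parameter, the function names, the absence of discounting, and all numbers are assumptions chosen for the example. The only part taken from the abstract is that bootstrapped return adds an estimate of the future return to the partial return.

```python
import math
from typing import Sequence


def partial_return(rewards: Sequence[float]) -> float:
    """Sum of rewards observed along a partial trajectory (no discounting assumed)."""
    return sum(rewards)


def bootstrapped_return(rewards: Sequence[float], future_return_estimate: float) -> float:
    """Partial return plus the human's estimate of the return after the segment ends."""
    return partial_return(rewards) + future_return_estimate


def choice_probability(score_a: float, score_b: float, temperature: float = 1.0) -> float:
    """Probability of choosing A over B under a logistic rule (an assumed form)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b) / temperature))


# Hypothetical segments: A earns nothing now but ends somewhere the human
# believes is valuable; B earns reward now but ends somewhere believed worthless.
rewards_a, believed_value_a = [0.0, 0.0, 0.0], 10.0
rewards_b, believed_value_b = [1.0, 1.0, 1.0], 0.0

p_a_partial = choice_probability(partial_return(rewards_a), partial_return(rewards_b))
p_a_bootstrapped = choice_probability(
    bootstrapped_return(rewards_a, believed_value_a),
    bootstrapped_return(rewards_b, believed_value_b),
)

print(f"P(choose A | partial return)      = {p_a_partial:.3f}")
print(f"P(choose A | bootstrapped return) = {p_a_bootstrapped:.3f}")
```

In this toy setup the two models disagree about which segment is preferred, because only the bootstrapped score depends on what the human believes will happen after the segment ends.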

Taxonomy

Topics: Mathematics and Applications