CRED: Counterfactual Reasoning and Environment Design for Active Preference Learning

Yi-Shiuan Tung; Bradley Hayes; Alessandro Roncone

arXiv:2507.05458·cs.RO·July 9, 2025

Open Access

TL;DR

CRED enhances active preference learning for robots by jointly optimizing environment design and trajectory selection using counterfactual reasoning, leading to better reward estimation and generalization in navigation tasks.

Contribution

CRED introduces a trajectory generation method that combines environment design with counterfactual reasoning to improve reward learning in active preference learning.

Findings

01. CRED outperforms existing methods in GridWorld and real-world navigation tasks.

02. CRED achieves more accurate reward estimation and better generalization.

03. CRED effectively explores the trajectory space to find informative queries.

Abstract

For effective real-world deployment, robots should adapt to human preferences, such as balancing distance, time, and safety in delivery routing. Active preference learning (APL) learns human reward functions by presenting trajectories for ranking. However, existing methods often struggle to explore the full trajectory space and fail to identify informative queries, particularly in long-horizon tasks. We propose CRED, a trajectory generation method for APL that improves reward estimation by jointly optimizing environment design and trajectory selection. CRED "imagines" new scenarios through environment design and uses counterfactual reasoning -- by sampling rewards from its current belief and asking "What if this reward were the true preference?" -- to generate a diverse and informative set of trajectories for ranking. Experiments in GridWorld and real-world navigation using…
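The counterfactual step described in the abstract can be illustrated with a toy GridWorld sketch. This is not the authors' implementation: the terrain types, the uniform belief over per-terrain costs, the helper names (`make_grid`, `sample_reward`, `best_path`, `counterfactual_queries`, `pick_environment`), and the use of "number of distinct trajectories" as a stand-in for informativeness are all assumptions made for illustration. Each sampled reward is treated as if it were the true preference, the best path under it is planned, and the candidate environment that yields the most distinct paths is kept.

```python
import heapq
import random

# Hypothetical terrain types; the paper's actual features are not given here.
TERRAINS = ("road", "grass", "mud")

def make_grid(rows, cols, rng):
    """Randomly assign a terrain to each cell (a crude stand-in for environment design)."""
    return [[rng.choice(TERRAINS) for _ in range(cols)] for _ in range(rows)]

def sample_reward(rng):
    """Draw per-terrain step costs from a toy belief (uniform on [0.1, 1.0])."""
    return {t: rng.uniform(0.1, 1.0) for t in TERRAINS}

def best_path(grid, cost, start, goal):
    """Dijkstra: the cheapest path if `cost` were the true preference."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[grid[nr][nc]]
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    prev[nxt] = node
                    heapq.heappush(heap, (nd, nxt))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return tuple(reversed(path))

def counterfactual_queries(grid, sampled_rewards, start, goal):
    """One optimal trajectory per sampled reward, deduplicated, order preserved."""
    seen = []
    for cost in sampled_rewards:
        path = best_path(grid, cost, start, goal)
        if path not in seen:
            seen.append(path)
    return seen

def pick_environment(candidates, sampled_rewards, start, goal):
    """Assumed proxy objective: keep the grid that yields the most distinct trajectories."""
    return max(
        candidates,
        key=lambda g: len(counterfactual_queries(g, sampled_rewards, start, goal)),
    )

rng = random.Random(0)
rewards = [sample_reward(rng) for _ in range(8)]   # samples from the current belief
grids = [make_grid(5, 5, rng) for _ in range(10)]  # candidate environments
start, goal = (0, 0), (4, 4)

env = pick_environment(grids, rewards, start, goal)
queries = counterfactual_queries(env, rewards, start, goal)
print(len(queries), "distinct trajectories to present for ranking")
```

In a full active preference learning loop, the human's ranking of `queries` would be used to update the belief before the next round; that update step is omitted here because the excerpt does not specify it.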

Taxonomy

Topics: Data Management and Algorithms · Constraint Satisfaction and Optimization · Human Mobility and Location-Based Analysis