CRED: Counterfactual Reasoning and Environment Design for Active Preference Learning
Yi-Shiuan Tung, Bradley Hayes, Alessandro Roncone

TL;DR
CRED enhances active preference learning for robots by jointly optimizing environment design and trajectory selection using counterfactual reasoning, leading to better reward estimation and generalization in navigation tasks.
Contribution
CRED introduces a trajectory generation method for active preference learning that combines environment design with counterfactual reasoning to improve reward learning.
Findings
CRED outperforms existing methods in GridWorld and real-world navigation tasks.
It achieves more accurate reward estimation and better generalization.
The approach effectively explores the trajectory space for informative queries.
Abstract
For effective real-world deployment, robots should adapt to human preferences, such as balancing distance, time, and safety in delivery routing. Active preference learning (APL) learns human reward functions by presenting trajectories for ranking. However, existing methods often struggle to explore the full trajectory space and fail to identify informative queries, particularly in long-horizon tasks. We propose CRED, a trajectory generation method for APL that improves reward estimation by jointly optimizing environment design and trajectory selection. CRED "imagines" new scenarios through environment design and uses counterfactual reasoning -- by sampling rewards from its current belief and asking "What if this reward were the true preference?" -- to generate a diverse and informative set of trajectories for ranking. Experiments in GridWorld and real-world navigation using…
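The counterfactual step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a small grid whose cells carry a terrain label, a reward expressed as a positive per-terrain cost, a belief represented by a handful of hard-coded cost samples, and Dijkstra's algorithm as the planner. The names `shortest_path`, `counterfactual_trajectories`, `grid`, and `belief_samples` are all hypothetical. For each sampled reward, the sketch plans the route that would be optimal if that sample were the true preference, then keeps the distinct routes as candidates for ranking.

```python
import heapq


def shortest_path(grid, costs, start, goal):
    """Dijkstra on a 4-connected grid; entering a cell costs costs[terrain].

    Assumes all costs are positive and the goal is reachable.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + costs[grid[nr][nc]]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def counterfactual_trajectories(grid, belief_samples, start, goal):
    """For each sampled reward, ask "what if this were the true preference?"
    and keep the distinct optimal routes (in first-seen order)."""
    unique = {}
    for costs in belief_samples:
        path = tuple(shortest_path(grid, costs, start, goal))
        unique.setdefault(path, costs)
    return list(unique)


# Toy environment (an assumption): 0 = road, 1 = grass.
grid = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
start, goal = (0, 0), (0, 2)

# Hard-coded stand-ins for samples drawn from the current reward belief.
belief_samples = [
    {0: 1.0, 1: 2.0},   # grass is only mildly costly
    {0: 1.0, 1: 10.0},  # grass is strongly disliked
    {0: 1.0, 1: 3.0},   # close to the first sample
]

queries = counterfactual_trajectories(grid, belief_samples, start, goal)
for q in queries:
    print(q)
```

In this toy setting the first and third samples agree on cutting across the grass, while the second prefers the longer road route, so two distinct trajectories are offered for ranking. Environment design would correspond to varying `grid` itself so that the sampled rewards disagree more; that part is not shown here.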
Taxonomy
Topics: Data Management and Algorithms · Constraint Satisfaction and Optimization · Human Mobility and Location-Based Analysis
