Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning
Kalyan Cherukuri, Aarav Lala

TL;DR
This paper develops a theoretical framework for multi-objective inverse reinforcement learning that infers Pareto-optimal reward functions from noisy human preferences, enabling better alignment of generative agents with complex human values.
Contribution
It introduces a formal model for recovering multi-objective reward functions from preferences, establishes sample complexity bounds, and proposes a convergent policy optimization algorithm.
Findings
Derived tight sample complexity bounds for reward recovery.
Formalized conditions for identifying multi-objective reward structures.
Proposed a convergent algorithm for policy optimization with inferred rewards.
Abstract
As generative agents become increasingly capable, alignment of their behavior with complex human values remains a fundamental challenge. Existing approaches often simplify human intent through reduction to a scalar reward, overlooking the multi-faceted nature of human feedback. In this work, we introduce a theoretical framework for preference-based Multi-Objective Inverse Reinforcement Learning (MO-IRL), where human preferences are modeled as latent vector-valued reward functions. We formalize the problem of recovering a Pareto-optimal reward representation from noisy preference queries and establish conditions for identifying the underlying multi-objective structure. We derive tight sample complexity bounds for recovering ε-approximations of the Pareto front and introduce a regret formulation to quantify suboptimality in this multi-objective setting. Furthermore, we propose a…
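For readers unfamiliar with the terminology, the following is a minimal sketch of what a Pareto front and an ε-approximation of it can mean for vector-valued returns. It is not the paper's algorithm. The function names (`dominates`, `pareto_front`, `is_eps_approximation`), the toy data, and the additive, per-objective notion of ε-coverage are all assumptions made for illustration; the paper may define these objects differently.

```python
# Illustrative only: names, data, and the additive epsilon notion are assumptions,
# not taken from the paper.

def dominates(q, p):
    """q dominates p (maximization): q is at least as good in every objective
    and strictly better in at least one."""
    return all(a >= b for a, b in zip(q, p)) and any(a > b for a, b in zip(q, p))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def is_eps_approximation(candidate, front, eps):
    """True if every front point is matched, within an additive eps in every
    objective, by at least one candidate point."""
    return all(
        any(all(c >= f - eps for c, f in zip(cand, fp)) for cand in candidate)
        for fp in front
    )


# Toy two-objective returns for five hypothetical policies.
returns = [[1.0, 0.0], [0.6, 0.6], [0.0, 1.0], [0.5, 0.5], [0.2, 0.1]]
front = pareto_front(returns)

# A candidate set that keeps only the two extreme trade-offs.
extremes_only = [[1.0, 0.0], [0.0, 1.0]]
# A candidate set that is slightly worse everywhere but covers every trade-off.
near_front = [[0.95, 0.0], [0.55, 0.55], [0.0, 0.95]]

covers_with_extremes = is_eps_approximation(extremes_only, front, 0.1)
covers_with_near_front = is_eps_approximation(near_front, front, 0.1)
```

In this toy example the balanced policy `[0.6, 0.6]` lies on the front, so a candidate set containing only the two extremes fails the coverage check, while a set that is slightly worse in every objective but spans all three trade-offs passes. This is one way to see why a scalar reward, which selects a single trade-off, can miss part of the structure a vector-valued reward retains.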
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Constraint Satisfaction and Optimization
