Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning

Kalyan Cherukuri; Aarav Lala

arXiv:2505.11864 · cs.LG · July 30, 2025

TL;DR

This paper develops a theoretical framework for multi-objective inverse reinforcement learning that infers Pareto-optimal reward functions from noisy human preferences, enabling better alignment of generative agents with complex human values.

Contribution

It introduces a formal model for recovering multi-objective reward functions from preferences, establishes sample complexity bounds, and proposes a convergent policy optimization algorithm.

Findings

01

Derived tight sample complexity bounds for reward recovery.

02

Formalized conditions for identifying multi-objective reward structures.

03

Proposed a convergent algorithm for policy optimization with inferred rewards.

Abstract

As generative agents become increasingly capable, alignment of their behavior with complex human values remains a fundamental challenge. Existing approaches often simplify human intent through reduction to a scalar reward, overlooking the multi-faceted nature of human feedback. In this work, we introduce a theoretical framework for preference-based Multi-Objective Inverse Reinforcement Learning (MO-IRL), where human preferences are modeled as latent vector-valued reward functions. We formalize the problem of recovering a Pareto-optimal reward representation from noisy preference queries and establish conditions for identifying the underlying multi-objective structure. We derive tight sample complexity bounds for recovering $\epsilon$-approximations of the Pareto front and introduce a regret formulation to quantify suboptimality in this multi-objective setting. Furthermore, we propose a…
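To make the abstract's two central objects concrete, the following minimal Python sketch shows a Pareto front over vector-valued returns and a noisy pairwise preference query. It is an illustration under stated assumptions, not the paper's method: the abstract does not specify the noise model, so the sketch assumes a Bradley-Terry (logistic) preference over a linear scalarization of the reward vector, and the names `dominates`, `pareto_front`, `preference_prob`, `sample_preference`, `weights`, and `beta` are all hypothetical.

```python
import math
import random


def dominates(u, v):
    """True if u Pareto-dominates v: no worse in every objective, strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(points):
    """Return the points that no other point dominates (order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def preference_prob(r_a, r_b, weights, beta=1.0):
    """ASSUMPTION: Bradley-Terry probability that option A is preferred to option B,
    using a linear scalarization of the vector-valued returns. The paper's actual
    noise model is not given in the abstract."""
    s_a = sum(w * r for w, r in zip(weights, r_a))
    s_b = sum(w * r for w, r in zip(weights, r_b))
    return 1.0 / (1.0 + math.exp(-beta * (s_a - s_b)))


def sample_preference(r_a, r_b, weights, beta=1.0, rng=random):
    """Draw one noisy preference label: True means A was preferred."""
    return rng.random() < preference_prob(r_a, r_b, weights, beta)


# Toy two-objective returns for five hypothetical policies.
returns = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.5, 0.5), (0.2, 0.1)]
front = pareto_front(returns)

# With equal weights the two extreme policies are indistinguishable in expectation.
p_equal = preference_prob((1.0, 0.0), (0.0, 1.0), weights=(0.5, 0.5))

# One noisy query from a simulated annotator who only values the first objective.
label = sample_preference((1.0, 0.0), (0.0, 1.0), weights=(1.0, 0.0), rng=random.Random(0))
```

In this toy setup, `(0.5, 0.5)` and `(0.2, 0.1)` are dominated by `(0.6, 0.6)` and drop out of the front. Lowering `beta` makes the simulated annotator noisier, which is the regime where sample complexity bounds of the kind described in the abstract become relevant.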

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Constraint Satisfaction and Optimization