Learning from Preferences and Mixed Demonstrations in General Settings

Jason R Brown; Carl Henrik Ek; Robert D Mullins

arXiv:2508.14027 · cs.LG · August 20, 2025

TL;DR

This paper introduces LEOPARD, a scalable algorithm that learns reward functions from diverse human feedback, including preferences and demonstrations, outperforming existing methods in various domains.

Contribution

The paper proposes a new flexible framework for learning from mixed human feedback and introduces LEOPARD, a practical algorithm that effectively combines preferences and demonstrations.

Findings

01. LEOPARD outperforms baselines when feedback is limited.

02. Combining multiple feedback types improves learning.

03. LEOPARD is effective across diverse domains.

Abstract

Reinforcement learning is a general method for learning in sequential settings, but it can often be difficult to specify a good reward function when the task is complex. In these cases, preference feedback or expert demonstrations can be used instead. However, existing approaches utilising both together are often ad-hoc, rely on domain-specific properties, or won't scale. We develop a new framing for learning from human data, reward-rational partial orderings over observations, designed to be flexible and scalable. Based on this we introduce a practical algorithm, LEOPARD: Learning Estimated Objectives from Preferences And Ranked Demonstrations. LEOPARD can learn from a broad range of data, including negative demonstrations, to efficiently learn reward functions across a wide range of domains. We find that when a limited amount of preference and demonstration feedback is…
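
The abstract does not state the training objective, so the following is only a generic illustration of how a reward function can be scored against a single pairwise preference. It assumes a Bradley-Terry preference model over summed trajectory rewards, which is a common choice in preference-based reward learning and is not confirmed here as LEOPARD's own loss. The names `trajectory_return` and `preference_nll` are hypothetical.

```python
import math


def trajectory_return(reward_fn, observations):
    """Sum a per-observation reward over one trajectory (hypothetical helper)."""
    return sum(reward_fn(obs) for obs in observations)


def preference_nll(return_preferred, return_other):
    """Negative log-likelihood of one preference under a Bradley-Terry model.

    Assumption: P(preferred beats other) = sigmoid(R_preferred - R_other).
    The loss is -log sigmoid(diff) = softplus(-diff), computed stably.
    """
    diff = return_preferred - return_other
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    # Toy reward: the observation itself, scaled (purely illustrative).
    reward = lambda obs: 0.5 * obs
    better = trajectory_return(reward, [1.0, 2.0, 3.0])
    worse = trajectory_return(reward, [0.0, 1.0, 0.0])
    print(preference_nll(better, worse))
```

A reward function that ranks the preferred trajectory higher receives a lower loss, so minimising this quantity over many labelled pairs pushes the learned reward toward agreement with the feedback. A ranking over more than two items could be handled by applying the same loss to each ordered pair, again as an assumption rather than the paper's stated method.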
