Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options

Joongkyu Lee; Seouh-won Yi; Min-hwan Oh

arXiv:2510.18713 · cs.LG · February 6, 2026

TL;DR

This paper introduces M-AUPO, an algorithm for online preference-based reinforcement learning (PbRL) that collects ranking feedback over subsets of multiple actions. Its theoretical guarantees show sample efficiency improving as the subset size grows.

Contribution

We propose M-AUPO, an algorithm that uses ranking feedback over action subsets. Our theoretical analysis shows that its performance improves with larger subsets, unlike prior guarantees, which fail to improve (and can even deteriorate) as the feedback length increases.

Findings

01. Larger subsets improve sample efficiency in PbRL.

02. M-AUPO achieves a suboptimality gap that decreases with subset size.

03. Theoretical bounds show benefits of multiple options in ranking feedback.

Abstract

We study online preference-based reinforcement learning (PbRL) with the goal of improving sample efficiency. While a growing body of theoretical work has emerged, motivated by PbRL's recent empirical success, particularly in aligning large language models (LLMs), most existing studies focus only on pairwise comparisons. A few recent works (Zhu et al., 2023; Mukherjee et al., 2024; Thekumparampil et al., 2024) have explored using multiple comparisons and ranking feedback, but their performance guarantees fail to improve, and can even deteriorate, as the feedback length increases, despite the richer information available. To address this gap, we adopt the Plackett-Luce (PL) model for ranking feedback over action subsets and propose M-AUPO, an algorithm that selects multiple actions by maximizing the average uncertainty within the offered subset. We prove that M-AUPO achieves a suboptimality…
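To make the Plackett-Luce (PL) ranking model mentioned above concrete, the following is a minimal sketch of the standard PL model, not the authors' implementation and not M-AUPO itself. It assumes each action in the offered subset has a scalar score (a stand-in for a latent reward); a ranking is drawn by repeatedly picking the next item with probability proportional to the exponential of its score among the items not yet ranked. The function names `pl_log_prob` and `pl_sample` and the example scores are hypothetical.

```python
import math
import random


def pl_log_prob(ranking, scores):
    """Log-probability of `ranking` (best first) under a Plackett-Luce model.

    `scores` maps each item to a real-valued score (assumed latent reward).
    """
    remaining = list(ranking)
    logp = 0.0
    for item in ranking:
        # Log-normalizer over the items not yet ranked (stable log-sum-exp).
        m = max(scores[a] for a in remaining)
        log_z = m + math.log(sum(math.exp(scores[a] - m) for a in remaining))
        logp += scores[item] - log_z
        remaining.remove(item)
    return logp


def pl_sample(subset, scores, rng):
    """Draw one full ranking of `subset` from the Plackett-Luce model."""
    remaining = list(subset)
    ranking = []
    while remaining:
        # Pick the next-best item with probability proportional to exp(score).
        weights = [math.exp(scores[a]) for a in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


if __name__ == "__main__":
    scores = {"a": 1.0, "b": 0.0, "c": -0.5}  # hypothetical scores
    rng = random.Random(0)
    ranking = pl_sample(["a", "b", "c"], scores, rng)
    print(ranking, pl_log_prob(ranking, scores))
```

With a subset of size two, this model reduces to the familiar pairwise (Bradley-Terry) comparison, so pairwise feedback is the special case that ranking feedback over larger subsets generalizes.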

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Multimodal Machine Learning Applications