Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe

Kiran Koshy Thekumparampil; Gaurush Hiranandani; Kousha Kalantari; Shoham Sabach; Branislav Kveton

arXiv:2412.19396 · cs.LG · December 30, 2024

Open Access

TL;DR

This paper introduces a randomized Frank-Wolfe algorithm for learning human preferences, modeled by a Plackett-Luce model, from limited $K$-way comparison feedback. It targets the cost of solving the D-optimal design, which would otherwise scale with the number of feasible $K$-subsets of $N$ choices.

Contribution

It proposes a novel randomized Frank-Wolfe approach to solve D-optimal design problems for preference learning, enabling scalable and efficient data collection.

Findings

01. The randomized Frank-Wolfe algorithm is designed to avoid the $O(\binom{N}{K})$ time complexity that even fast D-optimal design solvers would incur.

02. Empirical results demonstrate strong performance on NLP datasets.

03. The method outperforms traditional approaches in efficiency.

Abstract

We study learning of human preferences from limited comparison feedback. This task is ubiquitous in machine learning, and its applications, such as reinforcement learning from human feedback, have been transformational. We formulate this problem as learning a Plackett-Luce model over a universe of $N$ choices from $K$-way comparison feedback, where typically $K \ll N$. Our solution is the D-optimal design for the Plackett-Luce objective. The design defines a data logging policy that elicits comparison feedback for a small collection of optimally chosen points from all $\binom{N}{K}$ feasible subsets. The main algorithmic challenge in this work is that even fast methods for solving D-optimal designs would have $O(\binom{N}{K})$ time complexity. To address this issue, we propose a randomized Frank-Wolfe (FW) algorithm that solves the linear maximization sub-problems in the FW method on…
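The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each choice has a feature vector (rows of a hypothetical array `X`), the information matrix of a $K$-subset is taken to be the sum of outer products of pairwise feature differences, and the linear maximization step is approximated by scoring a small random sample of subsets instead of all $\binom{N}{K}$. Because the abstract is truncated above, the exact sampling rule used in the paper is not shown here. The names `subset_matrix`, `sample_subset`, `randomized_fw_d_optimal`, `n_samples`, and `reg` are all illustrative.

```python
import itertools
import numpy as np


def subset_matrix(X, subset):
    """Assumed information matrix of one K-subset:
    sum of outer products of pairwise feature differences."""
    d = X.shape[1]
    A = np.zeros((d, d))
    for i, j in itertools.combinations(subset, 2):
        z = X[i] - X[j]
        A += np.outer(z, z)
    return A


def sample_subset(rng, N, K):
    """Draw one K-subset of {0, ..., N-1} uniformly at random."""
    return tuple(sorted(int(i) for i in rng.choice(N, size=K, replace=False)))


def randomized_fw_d_optimal(X, K, n_iters=100, n_samples=32, reg=1e-3, seed=0):
    """Frank-Wolfe for max log det(sum_S w_S A_S + reg * I) over the simplex,
    with the linear maximization restricted to randomly sampled subsets."""
    rng = np.random.default_rng(seed)
    N, d = X.shape

    # Start from a design that puts all mass on one random subset.
    first = sample_subset(rng, N, K)
    weights = {first: 1.0}
    V = subset_matrix(X, first)

    for t in range(n_iters):
        V_inv = np.linalg.inv(V + reg * np.eye(d))

        # Randomized linear maximization: the partial derivative of the
        # log-determinant with respect to w_S is trace(V_inv @ A_S), so we
        # pick the best-scoring subset among n_samples random candidates.
        best, best_A, best_gain = None, None, -np.inf
        for _ in range(n_samples):
            S = sample_subset(rng, N, K)
            A = subset_matrix(X, S)
            gain = np.trace(V_inv @ A)
            if gain > best_gain:
                best, best_A, best_gain = S, A, gain

        # Standard diminishing Frank-Wolfe step toward the chosen vertex.
        step = 2.0 / (t + 3)
        weights = {S: (1.0 - step) * p for S, p in weights.items()}
        weights[best] = weights.get(best, 0.0) + step
        V = (1.0 - step) * V + step * best_A

    return weights, V
```

Calling `randomized_fw_d_optimal(X, K=3)` on an `(N, d)` feature array returns a sparse design (a dictionary from subsets to probabilities) and the matching design matrix. Each iteration costs `n_samples` subset evaluations instead of $\binom{N}{K}$, which is the kind of saving the abstract describes; the quality of the resulting design under this particular sampling rule is not something the sketch establishes.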


Taxonomy

Topics: Face and Expression Recognition · Multi-Criteria Decision Making · Text and Document Classification Technologies