VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback

Guoxi Zhang; Jiuding Duan

arXiv:2409.18417·cs.LG·December 13, 2024

TL;DR

This paper proposes an auction-based mechanism to improve the cost-efficiency of collecting human preference data for fine-tuning large language models, maintaining performance while reducing costs.

Contribution

The paper introduces an auction mechanism for collecting preference data in RLHF. It targets cost-efficiency in settings where preference relationships are complex (intransitive or cyclic).

Findings

01 Auction mechanism improves cost-efficiency in preference data collection.

02 Maintains satisfactory model performance with reduced costs.

03 Effective for high-quality feedback in LLM fine-tuning.

Abstract

This paper addresses the cost-efficiency aspect of Reinforcement Learning from Human Feedback (RLHF). RLHF leverages datasets of human preferences over the outputs of large language models (LLMs) to instill human expectations into LLMs. Although preference annotation comes with a monetary cost, the economic utility of a preference dataset has not been considered so far. The situation is exacerbated by the fact that, given complex intransitive or cyclic relationships in preference datasets, existing algorithms for fine-tuning LLMs are still far from capturing comprehensive preferences. This raises severe cost-efficiency concerns in production environments, where preference data accumulate over time. In this paper, we discuss the fine-tuning of LLMs as a monetized economy and introduce an auction mechanism to improve the efficiency of preference data collection in dollar terms. We show that…
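
The "Vickrey" in the title presumably alludes to the Vickrey (sealed-bid, second-price) auction, in which the highest bidder wins but pays the second-highest bid. The abstract does not describe how the paper applies an auction to preference data, so the following is only a minimal sketch of a generic Vickrey auction and not the authors' mechanism. The function name `vickrey_auction`, the bidder labels, and the bid values are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, Tuple


def vickrey_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Run a generic sealed-bid second-price auction.

    `bids` maps a bidder label to that bidder's sealed bid.
    Returns (winner, price): the highest bidder wins and pays
    the second-highest bid. This is an illustrative sketch only,
    not the mechanism defined in the paper.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")

    # Rank bidders from highest to lowest bid. Python's sort is stable,
    # so ties are broken by insertion order of the dictionary.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)

    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the runner-up's bid
    return winner, price


# Hypothetical bids, for illustration only.
example_bids = {"bidder_a": 0.9, "bidder_b": 0.6, "bidder_c": 0.4}
winner, price = vickrey_auction(example_bids)
print(winner, price)
```

Because the price paid does not depend on the winner's own bid, bidding one's true valuation is a dominant strategy in this auction format, which is the usual reason for choosing it.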

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications