VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback
Guoxi Zhang, Jiuding Duan

TL;DR
This paper proposes an auction-based mechanism to improve the cost-efficiency of collecting human preference data for fine-tuning large language models, maintaining performance while reducing costs.
Contribution
It introduces an auction mechanism for preference data collection in RLHF, addressing cost-efficiency and complex preference relationships.
Findings
Auction mechanism improves cost-efficiency in preference data collection.
Maintains satisfactory model performance with reduced costs.
Effective for high-quality feedback in LLM fine-tuning.
Abstract
This paper addresses the cost-efficiency aspect of Reinforcement Learning from Human Feedback (RLHF). RLHF leverages datasets of human preferences over outputs of large language models (LLMs) to instill human expectations into LLMs. Although preference annotation comes with a monetary cost, the economic utility of a preference dataset has not been considered so far. What exacerbates this situation is that, given complex intransitive or cyclic relationships in preference datasets, existing algorithms for fine-tuning LLMs are still far from capturing comprehensive preferences. This raises severe cost-efficiency concerns in production environments, where preference data accumulate over time. In this paper, we discuss the fine-tuning of LLMs as a monetized economy and introduce an auction mechanism to improve the efficiency of preference data collection in dollar terms. We show that…
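The title refers to the Vickrey auction, which in its textbook form is a sealed-bid second-price auction: the highest bidder wins but pays the second-highest bid. The abstract does not say how bids, bidders, or items are defined in the RLHF setting, so the following is only a minimal sketch of the generic auction rule, not the authors' mechanism. The function name `vickrey_auction`, the bidder labels, and the bid values are all hypothetical.

```python
from typing import Dict, Tuple


def vickrey_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Generic sealed-bid second-price auction (illustrative only).

    Returns the winning bidder and the price they pay, which is the
    second-highest bid rather than their own.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # Rank bidders from highest to lowest bid; ties keep insertion order.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the runner-up's bid
    return winner, price


# Hypothetical bids in dollars; these numbers are not from the paper.
bids = {"bidder_a": 0.40, "bidder_b": 0.25, "bidder_c": 0.10}
winner, price = vickrey_auction(bids)
print(winner, price)
```

Because the price does not depend on the winner's own bid, bidding one's true valuation is a dominant strategy in this auction format, which is the usual reason for choosing it.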
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
