AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto

TL;DR
AlpacaFarm is a low-cost simulation framework for developing and evaluating methods that learn from human feedback in training large language models, enabling efficient research and benchmarking.
Contribution
The paper introduces AlpacaFarm, a simulation environment that reduces data collection costs, provides trustworthy automatic evaluation, and offers reference implementations for feedback-based learning methods.
Findings
Methods that use a reward model improve over supervised fine-tuning.
The reference PPO implementation yields a +10% improvement in win-rate against Davinci003.
Rankings of models developed in AlpacaFarm align with rankings obtained from real human feedback.
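The win-rate figure above is a pairwise metric: a judge compares a candidate model's output against a baseline's output on the same instruction, and the win-rate is the share of comparisons the candidate wins. A minimal sketch of that arithmetic follows. The function name `win_rate`, the encoding of judgments as 1.0 / 0.0 / 0.5, and counting ties as half a win are assumptions made for illustration, not details taken from the paper or its code.

```python
from typing import Sequence


def win_rate(preferences: Sequence[float]) -> float:
    """Return the candidate's win-rate over pairwise comparisons.

    Each entry is one judgment (assumed encoding, not from the paper):
    1.0 = candidate preferred, 0.0 = baseline preferred, 0.5 = tie.
    """
    if not preferences:
        raise ValueError("need at least one comparison")
    return sum(preferences) / len(preferences)


# Hypothetical judgments for five instructions (illustrative data only).
judgments = [1.0, 0.0, 0.0, 1.0, 0.5]
print(f"win-rate: {win_rate(judgments):.2f}")
```

Under this encoding, a value of 0.5 means the candidate and the baseline are preferred equally often.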
Abstract
Large language models (LLMs) such as ChatGPT have seen widespread adoption due to their strong instruction-following abilities. Developing these LLMs involves a complex yet poorly understood workflow requiring training with human feedback. Replicating and understanding this instruction-following process requires tackling three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these challenges with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost. First, we design LLM prompts to simulate human feedback that are 50x cheaper than crowdworkers and display high agreement with humans. Second, we propose an automatic evaluation and validate it against human instructions obtained on real-world interactions. Third, we contribute reference…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
Methods: Entropy Regularization · Proximal Policy Optimization
