Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Qingwei Lin, Jianguang Lou, Shifeng Chen, Yansong Tang, Weizhu Chen

TL;DR
Arena Learning introduces an offline, AI-driven simulation method for evaluating and improving large language models through a data flywheel, reducing reliance on costly human annotations and enabling continuous post-training enhancements.
Contribution
The paper presents Arena Learning, a novel offline simulation approach that accurately predicts online battle outcomes and iteratively improves LLMs via a data flywheel mechanism.
Findings
WizardArena predictions align closely with online Arena results.
WizardLM-β shows significant performance improvements after applying Arena Learning.
Automated pipeline enables continuous LLM development without extensive human annotation.
Abstract
Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by the costs and time required for human annotation. In this paper, we introduce Arena Learning, an innovative offline strategy designed to simulate these arena battles using AI-driven annotations to evaluate battle outcomes, thus facilitating the continuous improvement of the target model through both supervised fine-tuning and reinforcement learning. Arena Learning comprises two key elements. First, it ensures precise evaluations and maintains consistency between offline simulations and online competitions via WizardArena, a pipeline developed to accurately predict the Elo rankings of various models using a meticulously designed offline test…
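The Elo rankings mentioned in the abstract are derived from pairwise battle outcomes. The following is a minimal sketch of how such ratings could be computed, assuming the standard sequential Elo update rule; it is not the paper's WizardArena pipeline, whose details are not given in this summary. The function names, the K-factor, the initial rating, and the model names in the usage note are all illustrative assumptions.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def compute_elo(battles, k: float = 4.0, initial: float = 1000.0) -> dict:
    """Sequentially update Elo ratings from a list of battles.

    Each battle is a tuple (model_a, model_b, outcome), where outcome is
    1.0 if model_a wins, 0.0 if model_b wins, and 0.5 for a tie.
    The k and initial values are assumptions chosen for illustration.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, outcome in battles:
        r_a, r_b = ratings[model_a], ratings[model_b]
        e_a = expected_score(r_a, r_b)
        # The winner gains exactly what the loser gives up,
        # so the sum of all ratings is conserved.
        ratings[model_a] = r_a + k * (outcome - e_a)
        ratings[model_b] = r_b - k * (outcome - e_a)
    return dict(ratings)
```

For example, `compute_elo([("model_x", "model_y", 1.0)], k=32)` moves two equally rated hypothetical models 16 points apart in each direction. In an offline simulation, the `outcome` value would come from an AI judge rather than a human annotator.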
Taxonomy
Topics: Data Stream Mining Techniques
Methods: ALIGN
