Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena

Haipeng Luo; Qingfeng Sun; Can Xu; Pu Zhao; Qingwei Lin; Jianguang Lou; Shifeng Chen; Yansong Tang; Weizhu Chen

arXiv:2407.10627·cs.CL·July 16, 2024·1 cites

Open Access

TL;DR

Arena Learning introduces an offline, AI-driven simulation method for evaluating and improving large language models through a data flywheel, reducing reliance on costly human annotations and enabling continuous post-training enhancements.

Contribution

The paper presents Arena Learning, a novel offline simulation approach that accurately predicts online battle outcomes and iteratively improves LLMs via a data flywheel mechanism.

Findings

01. WizardArena predictions align closely with online Arena results.

02. WizardLM-β shows significant performance improvements after applying Arena Learning.

03. The automated pipeline enables continuous LLM development without extensive human annotation.

Abstract

Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by the costs and time required for human annotation. In this paper, we introduce Arena Learning, an innovative offline strategy designed to simulate these arena battles using AI-driven annotations to evaluate battle outcomes, thus facilitating the continuous improvement of the target model through both supervised fine-tuning and reinforcement learning. Arena Learning comprises two key elements. First, it ensures precise evaluations and maintains consistency between offline simulations and online competitions via WizardArena, a pipeline developed to accurately predict the Elo rankings of various models using a meticulously designed offline test…
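The abstract refers to predicting Elo rankings from simulated battles. As a rough illustration only, the sketch below applies the textbook sequential Elo update to a list of judged battles. The function names, the K-factor of 32, the starting rating of 1000, and the model names are all assumptions made for this example; the excerpt here does not state how WizardArena actually computes its rankings.

```python
from typing import Dict, Iterable, Tuple

# One battle: (model_a, model_b, score_a), where score_a is 1.0 if the
# judge prefers model_a, 0.0 if it prefers model_b, and 0.5 for a tie.
Battle = Tuple[str, str, float]


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A beats B (logistic curve, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(
    ratings: Dict[str, float],
    battles: Iterable[Battle],
    k: float = 32.0,      # assumed K-factor, not taken from the paper
    base: float = 1000.0,  # assumed starting rating for unseen models
) -> Dict[str, float]:
    """Return new ratings after applying each battle in order."""
    ratings = dict(ratings)  # copy so the caller's dict is not mutated
    for a, b, score_a in battles:
        r_a = ratings.setdefault(a, base)
        r_b = ratings.setdefault(b, base)
        delta = k * (score_a - expected_score(r_a, r_b))
        ratings[a] = r_a + delta  # winner gains what the loser gives up
        ratings[b] = r_b - delta
    return ratings


if __name__ == "__main__":
    # Hypothetical judged battles between three placeholder models.
    battles = [
        ("model_a", "model_b", 1.0),
        ("model_b", "model_c", 0.5),
        ("model_a", "model_c", 0.0),
    ]
    for name, rating in sorted(update_elo({}, battles).items()):
        print(f"{name}: {rating:.1f}")
```

In a simulated arena, the `score_a` values would come from an AI judge rather than human votes. Sequential Elo depends on battle order, so a leaderboard built this way is usually averaged over shuffles or replaced by a fitted model; that choice is left open here.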

Taxonomy

Topics: Data Stream Mining Techniques

Methods: ALIGN