Evaluating Language Models as Synthetic Data Generators

Seungone Kim; Juyoung Suk; Xiang Yue; Vijay Viswanathan; Seongyun Lee; Yizhong Wang; Kiril Gashteovski; Carolin Lawrence; Sean Welleck; Graham Neubig

arXiv:2412.03679 · cs.CL · September 3, 2025 · 3 cites

TL;DR

This paper introduces AgoraBench, a standardized benchmark for evaluating language models' synthetic data generation abilities, revealing distinct strengths among models and key data quality indicators.

Contribution

AgoraBench provides standardized settings and metrics for systematically comparing LMs as data generators. Prior work focused on developing data generation methods and did not compare different LMs in a unified setting.

Findings

01. GPT-4o excels at generating new problems.

02. Claude-3.5-Sonnet performs better at enhancing existing problems.

03. Data generation ability is not directly correlated with problem-solving ability.
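
As a rough illustration of how the third finding could be checked, the sketch below computes a Pearson correlation between a problem-solving score and a data-generation score for a handful of generator LMs. The function name `pearson`, the variable names, and all numbers are hypothetical placeholders chosen for this example; they are not taken from the paper, and the paper's own metric and analysis may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder scores for illustration only; NOT results from the paper.
# Each position corresponds to one hypothetical generator LM.
problem_solving_score = [62.0, 71.5, 55.0, 80.0, 68.0, 74.0]
data_generation_score = [30.0, 12.0, 25.0, 18.0, 35.0, 10.0]

r = pearson(problem_solving_score, data_generation_score)
print(f"Pearson r = {r:.3f}")
```

A coefficient near zero, or a negative one, on real benchmark scores would be consistent with the finding. With only a few generator LMs the estimate is noisy, so it should be read as indicative rather than conclusive.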

Abstract

Given the increasing use of synthetic data in language model (LM) post-training, an LM's ability to generate high-quality data has become nearly as crucial as its ability to solve problems directly. While prior works have focused on developing effective data generation methods, they lack systematic comparison of different LMs as data generators in a unified setting. To address this gap, we propose AgoraBench, a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities. First, we observe that LMs exhibit distinct strengths. For instance, GPT-4o excels at generating new problems, while Claude-3.5-Sonnet performs better at enhancing existing ones. Furthermore, our analysis reveals that an…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling