Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs   with Nothing

Zhangchen Xu; Fengqing Jiang; Luyao Niu; Yuntian Deng; Radha; Poovendran; Yejin Choi; Bill Yuchen Lin

arXiv:2406.08464·cs.CL·October 8, 2024·6 cites

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha, Poovendran, Yejin Choi, Bill Yuchen Lin

PDF

Open Access 2 Repos 10 Models 5 Datasets

TL;DR

Magpie is a novel method that synthesizes high-quality instruction data from aligned LLMs like Llama-3-Instruct by prompting them to generate large-scale datasets, reducing reliance on human labor and enhancing dataset diversity.

Contribution

We introduce Magpie, a self-synthesis approach that extracts instruction-response pairs directly from aligned LLMs, enabling scalable and high-quality dataset creation without additional human annotation.

Findings

01

Magpie generated 4 million instruction-response pairs, with 300K high-quality instances.

02

Models fine-tuned with Magpie data perform comparably to those trained on official instruction datasets.

03

Using Magpie data alone can outperform previous public datasets in alignment benchmarks.

Abstract

High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Models

Datasets

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsImage Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Mathematics, Computing, and Information Processing

MethodsShrink and Fine-Tune