ZeroGen: Efficient Zero-shot Learning via Dataset Generation

Jiacheng Ye; Jiahui Gao; Qintong Li; Hang Xu; Jiangtao Feng; Zhiyong Wu; Tao Yu; Lingpeng Kong

arXiv:2202.07922 · cs.CL · October 25, 2022 · 6 cites

TL;DR

ZeroGen is an efficient zero-shot learning approach that generates datasets with large pre-trained language models, so that small task-specific models trained on the generated data reach competitive performance across various NLP tasks.

Contribution

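The paper proposes ZeroGen, a method that creates datasets from scratch using pre-trained language models (PLMs) for zero-shot learning. A tiny task model trained on the synthesized data then replaces the PLM at inference time, reducing model size and inference cost.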

Findings

01 Effective across multiple NLP tasks

02 Reduces inference complexity significantly

03 Provides insights into data-free knowledge distillation

Abstract

Interest in dataset generation has grown recently, driven by the superior generative capacity of large pre-trained language models (PLMs). In this paper, we study a flexible and efficient zero-shot learning method, ZeroGen. Given a zero-shot task, we first generate a dataset from scratch using PLMs in an unsupervised manner. Then, we train a tiny task model (e.g., LSTM) under the supervision of the synthesized dataset. This approach allows highly efficient inference, as the final task model has orders of magnitude fewer parameters than PLMs (e.g., GPT2-XL). Apart from being annotation-free and efficient, we argue that ZeroGen can also provide useful insights from the perspective of data-free model-agnostic knowledge distillation and unreferenced text generation evaluation. Experiments and analysis on different NLP tasks, namely, text classification,…
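The two stages described in the abstract (synthesize a labeled dataset, then train a small task model on it) can be sketched in a few lines. This is a toy illustration under loud assumptions, not the authors' implementation: `stub_plm_generate` is a hypothetical stand-in that samples from a fixed word list instead of calling a real PLM such as GPT2-XL, the prompt wording in `PROMPTS` is invented for illustration, and a bag-of-words Naive Bayes classifier is swapped in for the LSTM so the example runs with the standard library only.

```python
import math
import random
from collections import Counter

# Hypothetical label-conditioned prompts; the exact wording is an assumption.
PROMPTS = {
    "positive": 'The movie review in positive sentiment is: "',
    "negative": 'The movie review in negative sentiment is: "',
}

# Stand-in vocabulary so the sketch runs without a real PLM.
_STUB_WORDS = {
    "positive": ["great", "moving", "funny", "brilliant", "charming"],
    "negative": ["dull", "boring", "clumsy", "tedious", "flat"],
}
_FILLER = ["the", "film", "was", "plot", "acting", "really"]


def stub_plm_generate(prompt, label, rng):
    """Placeholder for sampling a continuation of `prompt` from a PLM.

    A real pipeline would feed `prompt` to the language model; this stub
    ignores it and uses `label` to pick words from a fixed list instead.
    """
    words = rng.sample(_FILLER, 3) + rng.sample(_STUB_WORDS[label], 2)
    rng.shuffle(words)
    return " ".join(words)


def synthesize_dataset(n_per_label, seed=0):
    """Stage 1: build (text, label) pairs from scratch, with no human labels."""
    rng = random.Random(seed)
    data = []
    for label, prompt in PROMPTS.items():
        for _ in range(n_per_label):
            data.append((stub_plm_generate(prompt, label, rng), label))
    return data


def train_tiny_model(dataset):
    """Stage 2: fit a small task model on the synthetic data.

    Naive Bayes over word counts is used here in place of the LSTM.
    """
    counts = {label: Counter() for label in PROMPTS}
    for text, label in dataset:
        counts[label].update(text.split())
    vocab = {word for counter in counts.values() for word in counter}
    return counts, vocab


def predict(model, text):
    """Classify `text` with the tiny model only; the generator is not needed."""
    counts, vocab = model
    scores = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        # Laplace-smoothed log-likelihood with a uniform prior over labels;
        # words never seen in the synthetic data are skipped.
        scores[label] = sum(
            math.log((counter[word] + 1) / (total + len(vocab)))
            for word in text.split()
            if word in vocab
        )
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train_tiny_model(synthesize_dataset(n_per_label=50))
    print(predict(model, "brilliant and charming"))
    print(predict(model, "a dull and tedious story"))
```

The point of the design is that `predict` touches only the small model, which is where the inference savings claimed in the abstract would come from once the generator is a multi-billion-parameter PLM.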

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications