Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han

TL;DR
This paper introduces a zero-shot learning approach for natural language understanding by generating training data with language models, achieving competitive results on the GLUE benchmark without task-specific data.
Contribution
The authors propose a novel method combining unidirectional and bidirectional PLMs to generate training data for zero-shot NLU, improving performance over existing prompting methods.
Findings
Achieves strong zero-shot performance on GLUE tasks
Outperforms zero-shot prompting methods significantly
Performs comparably to few-shot approaches that train on a small number of labeled examples
Abstract
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks. While both types of models have achieved promising few-shot learning performance, their potential for zero-shot learning has been underexplored. In this paper, we present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: A unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM. With quality training data selected based on the generation probability and regularization techniques (label smoothing and…
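The selection and regularization steps named in the abstract can be illustrated with a minimal sketch. It assumes each generated text arrives with its class label and per-token log-probabilities from the generator. The function names, the length-normalized score, and the smoothing value `epsilon=0.1` are illustrative assumptions, not details taken from the paper.

```python
def avg_log_prob(token_log_probs):
    """Length-normalized log generation probability of one text.

    Averaging over tokens is an assumption made here so that longer
    texts are not penalized simply for containing more tokens.
    """
    return sum(token_log_probs) / len(token_log_probs)


def select_top_k(samples, k):
    """Keep the k generated samples the generator scored as most probable.

    samples: list of (text, label, token_log_probs) tuples (hypothetical format).
    Returns a list of (text, label) pairs, best-scoring first.
    """
    ranked = sorted(samples, key=lambda s: avg_log_prob(s[2]), reverse=True)
    return [(text, label) for text, label, _ in ranked[:k]]


def smooth_label(label, num_classes, epsilon=0.1):
    """Label smoothing: blend the one-hot target with a uniform distribution."""
    target = [epsilon / num_classes] * num_classes
    target[label] += 1.0 - epsilon
    return target


# Toy generated data: (text, class label, per-token log-probabilities).
samples = [
    ("a great movie", 1, [-0.1, -0.2]),
    ("movie the of bad", 0, [-2.0, -3.0]),
    ("a dull film", 0, [-0.5]),
]
train_set = select_top_k(samples, k=2)
targets = [smooth_label(label, num_classes=2) for _, label in train_set]
```

The selected pairs and their smoothed targets would then serve as the training set for fine-tuning the bidirectional model. Smoothing keeps the classifier from becoming overconfident on generated texts whose labels may be noisy.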
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
