Not Enough Data? Deep Learning to the Rescue!
Ateret Anaby-Tavor, Boaz Carmeli, Esther Goldbraich, Amir Kantor, George Kour, Segev Shlomov, Naama Tepper, and Naama Zwerdling

TL;DR
This paper introduces LAMBADA, a data augmentation technique that fine-tunes a language model to generate synthetic labeled data, improving text classification performance, especially when labeled data is limited.
Contribution
The paper presents LAMBADA, a new method leveraging pre-trained language models for effective data augmentation in low-resource text classification tasks.
Findings
LAMBADA improves classifier accuracy across multiple datasets.
It outperforms existing data augmentation methods.
It significantly benefits scenarios with scarce labeled data.
Abstract
Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA…
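The generate-then-filter loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `stub_generator` is a hypothetical stand-in for the fine-tuned language generator, the tiny naive Bayes model is a toy substitute for whatever classifier is trained on the original data, and the confidence ranking with a `per_class` cutoff is an assumed filtering rule that the abstract does not specify. All function and parameter names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_classifier(examples):
    """Train a toy multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)

    def predict(text):
        """Return (predicted_label, probability) for one sentence."""
        tokens = text.lower().split()
        log_scores = {}
        for label, count in label_counts.items():
            total = sum(word_counts[label].values())
            score = math.log(count / len(examples))
            for tok in tokens:
                # Laplace smoothing, with one extra slot for unseen words.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total + len(vocab) + 1)
                )
            log_scores[label] = score
        # Normalize log scores into probabilities.
        top = max(log_scores.values())
        exp_scores = {l: math.exp(s - top) for l, s in log_scores.items()}
        norm = sum(exp_scores.values())
        best = max(exp_scores, key=exp_scores.get)
        return best, exp_scores[best] / norm

    return predict


def augment(examples, generate, per_class=2):
    """Generate candidates per label, keep those the classifier agrees with."""
    predict = train_classifier(examples)  # trained on original data only
    kept = []
    for label in sorted({lbl for _, lbl in examples}):
        scored = []
        for sentence in generate(label):
            predicted, confidence = predict(sentence)
            if predicted == label:  # drop candidates that drift off-class
                scored.append((confidence, sentence))
        scored.sort(reverse=True)  # assumed rule: most confident first
        kept.extend((sentence, label) for _, sentence in scored[:per_class])
    return kept


# Hypothetical toy data; a real run would use the task's labeled set.
original = [
    ("book a flight to paris", "travel"),
    ("find a hotel near the airport", "travel"),
    ("what is my account balance", "banking"),
    ("transfer money to my savings account", "banking"),
]


def stub_generator(label):
    """Stand-in for sampling from a label-conditioned fine-tuned generator."""
    candidates = {
        "travel": ["book a hotel in paris", "transfer money to my account"],
        "banking": ["what is my savings balance", "find a flight to the airport"],
    }
    return candidates[label]


synthetic = augment(original, stub_generator)
print(synthetic)
```

In this toy run, each class has one on-topic candidate and one that reads like the other class; the filter keeps only the former. The surviving pairs would then be appended to the original data before training the final classifier.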
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Anomaly Detection Techniques and Applications
