Low Resource Text Classification with ULMFit and Backtranslation
Sam Shleifer

TL;DR
This paper demonstrates that backtranslation significantly enhances low-resource text classification performance with ULMFit, while random token perturbations do not, and explores test-time augmentation and ensembling for further gains.
Contribution
The paper applies backtranslation as a data augmentation technique for low-resource text classification with ULMFit and shows that it improves accuracy where random token perturbations do not.
Findings
Backtranslation improves accuracy in low-resource settings.
Random token perturbations do not improve performance.
Ensembling and test-time augmentation yield small additional gains.
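This page does not describe how the paper implements test-time augmentation or ensembling. The following is a minimal Python sketch of one common approach, assuming class probabilities are simply averaged. The names `predict_with_tta`, `ensemble_proba`, `predict_proba`, and `augmenters` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical type aliases: a classifier maps text to class probabilities,
# an augmenter maps text to a paraphrased or perturbed version of it.
Classifier = Callable[[str], Sequence[float]]
Augmenter = Callable[[str], str]


def _mean_probs(rows: List[List[float]]) -> List[float]:
    """Element-wise mean over a non-empty list of probability vectors."""
    if not rows:
        raise ValueError("need at least one probability vector")
    n_classes = len(rows[0])
    return [sum(row[c] for row in rows) / len(rows) for c in range(n_classes)]


def predict_with_tta(
    text: str, predict_proba: Classifier, augmenters: Sequence[Augmenter]
) -> List[float]:
    """Average predictions over the original text and its augmented variants."""
    variants = [text] + [augment(text) for augment in augmenters]
    return _mean_probs([list(predict_proba(v)) for v in variants])


def ensemble_proba(text: str, models: Sequence[Classifier]) -> List[float]:
    """Average the predictions of several independently trained classifiers."""
    return _mean_probs([list(model(text)) for model in models])
```

In this sketch the two techniques compose: each model in the ensemble can itself be wrapped with `predict_with_tta` before averaging.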
Abstract
In computer vision, virtually every state-of-the-art deep learning system is trained with data augmentation. In text classification, however, data augmentation is less widely practiced because it must be performed before training and risks introducing label noise. We augment the IMDB movie reviews dataset with examples generated by two families of techniques: random token perturbations introduced by Wei and Zou [2019] and backtranslation -- translating to a second language then back to English. In low resource environments, backtranslation generates significant improvement on top of the state-of-the-art ULMFit model. A ULMFit model pretrained on wikitext103 and then fine-tuned on only 50 IMDB examples and 500 synthetic examples generated by backtranslation achieves 80.6% accuracy, an 8.1% improvement over the augmentation-free baseline with only 9 minutes of additional training time.…
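The backtranslation step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the translation system and pivot languages are not named here, so `translate` is a hypothetical callable and `toy_translate` is a tiny dictionary-based stand-in for a real machine translation model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: translate(text, source_language, target_language).
Translator = Callable[[str, str, str], str]
Example = Tuple[str, int]  # (review text, sentiment label)


def augment_with_backtranslation(
    examples: Sequence[Example],
    translate: Translator,
    pivots: Sequence[str],
    source_lang: str = "en",
) -> List[Example]:
    """Create synthetic examples by round-tripping each text through pivot languages."""
    synthetic: List[Example] = []
    for text, label in examples:
        seen = {text}  # skip round trips that reproduce an existing string
        for pivot in pivots:
            pivot_text = translate(text, source_lang, pivot)
            paraphrase = translate(pivot_text, pivot, source_lang)
            if paraphrase not in seen:
                seen.add(paraphrase)
                # The label is copied unchanged; this is where label noise
                # can enter if the round trip alters the sentiment.
                synthetic.append((paraphrase, label))
    return synthetic


# Toy stand-in for a translation model (assumption, for demonstration only).
TOY_LEXICON = {
    ("en", "fr"): {"great": "super"},
    ("fr", "en"): {"super": "terrific"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    table = TOY_LEXICON.get((src, tgt), {})
    return " ".join(table.get(word, word) for word in text.split())


train = [("a great movie", 1), ("dull plot", 0)]
synthetic = augment_with_backtranslation(train, toy_translate, pivots=["fr"])
# synthetic == [("a terrific movie", 1)]; "dull plot" round-trips unchanged and is skipped
```

The synthetic examples are generated once, before training, and appended to the real training set, which matches the abstract's point that text augmentation is performed ahead of time.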
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Dropout · Sigmoid Activation · Tanh Activation · Temporal Activation Regularization · DropConnect · Long Short-Term Memory · Activation Regularization · Discriminative Fine-Tuning · Embedding Dropout · Variational Dropout
