Exploring Data Augmentation Methods on Social Media Corpora
Isabel Garcia Pietri, Kineret Stanley

TL;DR
This paper evaluates various data augmentation techniques for NLP text classification on social media data, finding limited but promising improvements with certain methods and highlighting areas for future research.
Contribution
It systematically compares popular and novel data augmentation methods in NLP, including a new technique called Greyscaling, using social media datasets and BERT.
Findings
Synonym replacement shows some performance gains.
Greyscaling warrants further investigation.
Few-shot learning consistently improves results.
Abstract
Data augmentation has proven widely effective in computer vision. In Natural Language Processing (NLP) data augmentation remains an area of active research. There is no widely accepted augmentation technique that works well across tasks and model architectures. In this paper we explore data augmentation techniques in the context of text classification using two social media datasets. We explore popular varieties of data augmentation, starting with oversampling, Easy Data Augmentation (Wei and Zou, 2019) and Back-Translation (Sennrich et al., 2015). We also consider Greyscaling, a relatively unexplored data augmentation technique that seeks to mitigate the intensity of adjectives in examples. Finally, we consider a few-shot learning approach: Pattern-Exploiting Training (PET) (Schick et al., 2020). For the experiments we use a BERT transformer architecture. Results show that augmentation…
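To make two of the techniques named in the abstract concrete, here is a minimal Python sketch of random oversampling and of synonym replacement, one of the operations in Easy Data Augmentation. It is an illustration only, not the authors' implementation: the function names, the `TOY_SYNONYMS` table, and the `seed` parameter are assumptions introduced here, and EDA itself draws synonyms from WordNet rather than a hand-written dictionary.

```python
import random

# Hypothetical toy synonym table, for illustration only.
# EDA as published looks synonyms up in WordNet instead.
TOY_SYNONYMS = {
    "great": ["excellent", "fantastic"],
    "bad": ["poor", "awful"],
    "movie": ["film"],
}


def synonym_replacement(tokens, n, synonyms=TOY_SYNONYMS, seed=None):
    """Return a copy of `tokens` with up to `n` tokens swapped for a synonym."""
    rng = random.Random(seed)
    out = list(tokens)
    # Only tokens with an entry in the synonym table can be replaced.
    candidates = [i for i, tok in enumerate(out) if tok.lower() in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i].lower()])
    return out


def oversample(examples, labels, seed=None):
    """Random oversampling: duplicate examples of smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_label.values())
    balanced = []
    for y, xs in by_label.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    return balanced
```

For example, `synonym_replacement("what a great movie".split(), n=1)` changes exactly one of "great" or "movie" and leaves the other tokens untouched. Oversampling adds no new text, only repeated copies, which is why it is a natural baseline against which the other augmentation methods can be compared.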
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Multi-Head Attention · Linear Layer · WordPiece · Attention Is All You Need · Adam · Dropout · Softmax · Dense Connections · Weight Decay
