To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP
Gözde Gül Şahin

TL;DR
This study systematically compares various text augmentation techniques across multiple low-resource languages and NLP tasks, revealing their varying effectiveness and highlighting the importance of task, language, and model considerations.
Contribution
It provides a comprehensive analysis of syntax, token, and character-level augmentation methods for low-resource NLP tasks, filling a gap in systematic performance evaluation.
Findings
Character-level augmentation is most consistently effective.
Augmentation significantly improves dependency parsing performance.
Effectiveness varies by language, task, and model type.
Abstract
Data-hungry deep neural networks have established themselves as the standard for many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind their statistical counterparts in low-resource scenarios. One methodology to counter this problem is text augmentation, i.e., generating new synthetic training data points from existing data. Although NLP has recently witnessed a large number of textual augmentation techniques, the field still lacks a systematic performance analysis on a diverse set of languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methodologies which perform changes on the syntax (e.g., cropping sub-sentences), token (e.g., random word insertion) and character (e.g., character swapping) levels. We systematically compare…
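To make two of the categories named in the abstract concrete, the following is a minimal Python sketch of character swapping (character level) and random word insertion (token level). It is an illustration only, not the paper's implementation: the function names `swap_characters` and `insert_random_word`, the choice to keep a word's first and last characters fixed, and the toy vocabulary are all assumptions made for this example.

```python
import random


def swap_characters(word, rng):
    """Swap two adjacent inner characters, leaving the first and last in place.

    Hypothetical helper: the exact swapping rule is an assumption.
    """
    if len(word) < 4:
        return word  # too short to have two inner characters to swap
    i = rng.randrange(1, len(word) - 2)  # index of the first swapped character
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def insert_random_word(tokens, vocabulary, rng):
    """Return a copy of tokens with one vocabulary word inserted at a random position.

    Hypothetical helper: how the inserted word is chosen is an assumption.
    """
    position = rng.randrange(len(tokens) + 1)
    return tokens[:position] + [rng.choice(vocabulary)] + tokens[position:]


if __name__ == "__main__":
    rng = random.Random(13)  # fixed seed so runs are repeatable
    sentence = "the parser reads every sentence".split()
    print([swap_characters(token, rng) for token in sentence])
    print(insert_random_word(sentence, ["quickly", "often"], rng))
```

Note that character swapping leaves the number of tokens unchanged, so per-token labels in a sequence tagging dataset stay aligned. Token insertion changes the sentence length, so any per-token labels would also need an entry for the inserted word.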
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: mBERT
