Random Text Perturbations Work, but not Always
Zhengxiang Wang

TL;DR
This study evaluates the effectiveness of random text perturbations as data augmentation in NLP, revealing that their impact varies depending on training data size and task specifics, and they are not universally beneficial.
Contribution
The paper provides large-scale empirical evidence that random text perturbations have task-dependent effects and are not always effective for NLP classification tasks.
Findings
Perturbations can both improve and degrade performance.
Effectiveness depends on training data size.
Impact varies across different models and tasks.
Abstract
We present three large-scale experiments on a binary text matching classification task in both Chinese and English to evaluate the effectiveness and generalizability of random text perturbations as a data augmentation approach for NLP. We find that the augmentation can have both negative and positive effects on the test set performance of three neural classification models, depending on whether the models are trained on enough original training examples. This holds whether the five random text editing operations used to augment text are applied together or separately. Our study strongly suggests that the effectiveness of random text perturbations is task-specific and not generally positive.
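As a rough illustration of what random text perturbation means for a text matching pair, here is a minimal Python sketch. The abstract does not name the five editing operations, so the three shown here (random swap, random deletion, random insertion) are assumed stand-ins rather than the paper's method. The function names, the whitespace tokenization (which would not suit Chinese text without a word segmenter), and the choice to copy the label unchanged are also assumptions made for illustration.

```python
import random

def random_swap(tokens, rng):
    """Swap two randomly chosen positions (no-op for fewer than 2 tokens)."""
    if len(tokens) < 2:
        return list(tokens)
    out = list(tokens)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

def random_delete(tokens, rng):
    """Delete one randomly chosen token, always keeping at least one."""
    if len(tokens) < 2:
        return list(tokens)
    out = list(tokens)
    del out[rng.randrange(len(out))]
    return out

def random_insert(tokens, rng):
    """Insert a copy of a randomly chosen token at a random position."""
    if not tokens:
        return []
    out = list(tokens)
    out.insert(rng.randrange(len(out) + 1), rng.choice(tokens))
    return out

# Hypothetical operation set; the paper's five operations are not listed here.
OPERATIONS = (random_swap, random_delete, random_insert)

def augment_pair(text_a, text_b, label, n_aug=2, seed=0):
    """Return the original pair plus n_aug perturbed copies with the same label."""
    rng = random.Random(seed)
    examples = [(text_a, text_b, label)]
    for _ in range(n_aug):
        op = rng.choice(OPERATIONS)  # one operation per augmented copy
        new_a = " ".join(op(text_a.split(), rng))
        new_b = " ".join(op(text_b.split(), rng))
        examples.append((new_a, new_b, label))
    return examples
```

Calling `augment_pair("how do i reset my password", "how can i change my password", 1, n_aug=3)` would return four labeled pairs: the original followed by three perturbed copies. Applying the operations separately, as the abstract also describes, would amount to passing a single operation instead of sampling from `OPERATIONS`.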
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
Methods: Test
