Random Text Perturbations Work, but not Always

Zhengxiang Wang

arXiv:2209.00797 · cs.CL · October 4, 2022

TL;DR

This study evaluates the effectiveness of random text perturbations as a data augmentation method in NLP. It finds that their impact depends on the size of the training data and on the specific task, and that they are not universally beneficial.

Contribution

The paper provides large-scale empirical evidence that random text perturbations have task-dependent effects and are not always effective, using a binary text matching classification task in Chinese and English.

Findings

1. Perturbations can both improve and degrade performance.
2. Effectiveness depends on training data size, in particular on whether the models train on enough original training examples.
3. Impact varies across different models and tasks.

Abstract

We present three large-scale experiments on a binary text matching classification task in both Chinese and English to evaluate the effectiveness and generalizability of random text perturbations as a data augmentation approach for NLP. We find that the augmentation can have both negative and positive effects on the test set performance of three neural classification models, depending on whether the models train on enough original training examples. This holds regardless of whether the five random text editing operations used to augment the text are applied together or separately. Our study strongly suggests that the effectiveness of random text perturbations is task-specific and not generally positive.
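The abstract does not name the five random text editing operations. As a hedged illustration only, the sketch below implements four simple token-level edits (synonym replacement against a tiny hypothetical synonym table, random swap, random insertion of a duplicated token, and random deletion) and shows what applying operations "together" (each augmented copy samples from all operations) versus "separately" (one operation type per augmented set) might look like. The names `SYNONYMS`, `OPS` and `augment`, and the exact definition of each edit, are assumptions for illustration, not the paper's or the repository's code.

```python
import random
from typing import Callable, Dict, List

Tokens = List[str]
Op = Callable[[Tokens, random.Random], Tokens]

# Hypothetical toy synonym table, used only for illustration.
SYNONYMS: Dict[str, List[str]] = {"good": ["fine", "nice"], "movie": ["film"]}


def synonym_replace(tokens: Tokens, rng: random.Random) -> Tokens:
    """Replace one token that has an entry in SYNONYMS, if any exists."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in SYNONYMS]
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_swap(tokens: Tokens, rng: random.Random) -> Tokens:
    """Swap the positions of two distinct, randomly chosen tokens."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_insert(tokens: Tokens, rng: random.Random) -> Tokens:
    """Insert a copy of a random token at a random position (a simplification)."""
    out = list(tokens)
    if out:
        out.insert(rng.randint(0, len(out)), rng.choice(out))
    return out


def random_delete(tokens: Tokens, rng: random.Random) -> Tokens:
    """Delete one random token, always keeping at least one token."""
    out = list(tokens)
    if len(out) > 1:
        del out[rng.randrange(len(out))]
    return out


# Hypothetical operation registry; these are NOT the paper's five operations.
OPS: Dict[str, Op] = {
    "SR": synonym_replace,
    "RS": random_swap,
    "RI": random_insert,
    "RD": random_delete,
}


def augment(tokens: Tokens, ops: Dict[str, Op], n_aug: int, seed: int = 0) -> List[Tokens]:
    """Return n_aug perturbed copies; each copy applies one op sampled from `ops`."""
    rng = random.Random(seed)
    names = sorted(ops)
    return [ops[rng.choice(names)](tokens, rng) for _ in range(n_aug)]


if __name__ == "__main__":
    text = "this is a good movie".split()
    # "Together": every augmented copy may come from any of the operations.
    print(augment(text, OPS, n_aug=3))
    # "Separately": one operation type per augmented set.
    for name, op in OPS.items():
        print(name, augment(text, {name: op}, n_aug=2))
```

Each edit copies its input, so the original training example is never mutated; the augmented copies would simply be added to the original training set. Passing a single-entry dictionary to `augment` is one plausible way to mirror the "separately" condition under these assumptions.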

Code & Models

Repositories

jaaack-wang/reda (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies

Methods: Test