Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition

Shuguang Chen; Leonardo Neves; Thamar Solorio

arXiv:2210.07916 · cs.CL · October 17, 2022

TL;DR

This paper introduces a style transfer-based data augmentation method for named entity recognition, effectively increasing data diversity and improving performance in low-resource settings.

Contribution

It proposes a novel style transfer technique with constrained decoding and data selection to generate valid synthetic data for NER in low-resource domains.

Findings

01

Significant performance improvements over existing augmentation methods.

02

Effective across five domain pairs and several data regimes.

03

A practical approach that the authors expect to be applicable to other NLP tasks.

Abstract

In this work, we take the named entity recognition task in the English language as a case study and explore style transfer as a data augmentation method to increase the size and diversity of training data in low-resource scenarios. We propose a new method to effectively transform the text from a high-resource domain to a low-resource domain by changing its style-related attributes to generate synthetic data for training. Moreover, we design a constrained decoding algorithm along with a set of key ingredients for data selection to guarantee the generation of valid and coherent data. Experiments and analysis on five different domain pairs under different data regimes demonstrate that our approach can significantly improve results compared to current state-of-the-art data augmentation methods. Our approach is a practical solution to data scarcity, and we expect it to be applicable to other…
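
The abstract does not spell out the constrained decoding algorithm, so the following is only a rough illustration of the general idea, not the authors' implementation. It assumes the generated text carries inline entity markers and that decoding should never produce an unbalanced or empty entity span. The marker strings, `allowed_tokens`, `constrained_greedy_decode`, and the toy scoring function are all hypothetical names introduced for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical marker tokens; the paper's actual tag format may differ.
OPEN, CLOSE, EOS = "<ent>", "</ent>", "<eos>"


def span_state(prefix: List[str]) -> Tuple[bool, int]:
    """Return (inside_entity, words_inside_current_entity) for a prefix."""
    inside, words_inside = False, 0
    for tok in prefix:
        if tok == OPEN:
            inside, words_inside = True, 0
        elif tok == CLOSE:
            inside = False
        elif inside:
            words_inside += 1
    return inside, words_inside


def allowed_tokens(prefix: List[str], vocab: List[str]) -> List[str]:
    """Keep only tokens that leave the entity markers well-formed."""
    inside, words_inside = span_state(prefix)
    allowed = []
    for tok in vocab:
        if tok == OPEN and inside:
            continue  # no nested entities
        if tok == CLOSE and (not inside or words_inside == 0):
            continue  # nothing to close, or the span would be empty
        if tok == EOS and inside:
            continue  # cannot stop with an open entity
        allowed.append(tok)
    return allowed


def constrained_greedy_decode(
    score_fn: Callable[[List[str]], Dict[str, float]],
    vocab: List[str],
    max_len: int = 20,
) -> List[str]:
    """Greedy decoding restricted to the allowed tokens at every step."""
    prefix: List[str] = []
    for _ in range(max_len):
        scores = score_fn(prefix)
        candidates = allowed_tokens(prefix, vocab)
        best = max(candidates, key=lambda t: scores.get(t, float("-inf")))
        if best == EOS:
            break
        prefix.append(best)
    # If the length limit cut generation off mid-entity, repair the output.
    inside, words_inside = span_state(prefix)
    if inside:
        if words_inside > 0:
            prefix.append(CLOSE)
        else:
            prefix.pop()  # drop a dangling opening marker
    return prefix


VOCAB = [OPEN, CLOSE, "paris", "visit", EOS]


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model: it prefers an invalid token first."""
    scores = {CLOSE: 5.0, OPEN: 4.0, "paris": 3.0, EOS: 2.0, "visit": 1.0}
    if CLOSE in prefix:
        scores[EOS] = 10.0
    return scores


result = constrained_greedy_decode(toy_scores, VOCAB)
```

In this toy run the unconstrained top choice at the first step would be a closing marker with nothing to close; the mask removes it, so the decoder opens an entity, emits a word, and only then closes the span. A real system would apply the same kind of mask to model logits during beam search or sampling.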

Code & Models

Repositories

ritual-uh/da_ner
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies