Data Augmentation for Cross-Domain Named Entity Recognition
Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio

TL;DR
This paper introduces a neural architecture for cross-domain data augmentation in NER, transforming high-resource domain data to improve low-resource domain performance, leading to significant accuracy gains.
Contribution
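It proposes a neural architecture that projects high-resource domain data into a low-resource domain. The architecture learns the textual patterns that distinguish the two domains (e.g., style, noise, and abbreviations) along with a shared feature space in which they are aligned. The projected data then augments training for NER models in the low-resource domain.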
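The following minimal Python sketch makes the idea of projecting labeled data concrete. It rests on a loud assumption: the paper's learned neural transformation is replaced with a hand-written, rule-based one that maps formal, newswire-style sentences toward an informal, social-media-like style. The `ABBREVIATIONS` table, `project_to_target`, and `augment` are all hypothetical names, not the authors' code. The sketch only illustrates the augmentation interface. Each BIO-tagged sentence is rewritten token by token so its entity labels stay aligned, and the projected sentences are then appended to the low-resource training set.

```python
from typing import Dict, List, Tuple

# A sentence is a list of (token, BIO tag) pairs.
Sentence = List[Tuple[str, str]]

# Hypothetical, hand-written stand-in for domain patterns that the paper's
# model learns; NOT part of the original method.
ABBREVIATIONS: Dict[str, str] = {
    "you": "u",
    "are": "r",
    "tomorrow": "tmrw",
    "please": "pls",
}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def project_to_target(sentence: Sentence) -> Sentence:
    """Rewrite a formal-domain sentence in an informal, noisy style.

    Tokens are transformed one at a time, so every surviving token keeps
    its original BIO tag. Only punctuation tagged 'O' is ever dropped.
    """
    projected: Sentence = []
    for token, tag in sentence:
        if tag == "O" and token in PUNCTUATION:
            continue  # drop unlabeled punctuation to mimic noisy user text
        if tag == "O":
            # Lowercase non-entity tokens and apply the abbreviation table.
            token = ABBREVIATIONS.get(token.lower(), token.lower())
        else:
            # Keep entity text, but remove the capitalization cue.
            token = token.lower()
        projected.append((token, tag))
    return projected


def augment(target_train: List[Sentence], source_train: List[Sentence]) -> List[Sentence]:
    """Return the low-resource training set plus projected high-resource sentences."""
    return target_train + [project_to_target(s) for s in source_train]
```

For example, the tagged sentence "Are you visiting Paris tomorrow ?" becomes "r u visiting paris tmrw", and `paris` keeps its `B-LOC` tag. The paper's approach learns such domain patterns rather than relying on a fixed table like this one.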
Findings
Transforming high-resource data improves low-resource NER performance.
The approach achieves significant improvements over the baselines it is compared against.
Experiments across diverse datasets validate the effectiveness of the proposed method.
Abstract
Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models. However, most existing techniques focus on augmenting in-domain data in low-resource scenarios where annotated data is quite limited. In contrast, we study cross-domain data augmentation for the NER task. We investigate the possibility of leveraging data from high-resource domains by projecting it into the low-resource domains. Specifically, we propose a novel neural architecture to transform the data representation from a high-resource to a low-resource domain by learning the patterns (e.g., style, noise, and abbreviations) in the text that differentiate them and a shared feature space where both domains are aligned. We experiment with diverse datasets and show that transforming the data to the low-resource domain representation achieves significant improvements over…
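The abstract mentions a shared feature space in which both domains are aligned, but this excerpt does not say how the alignment is enforced. The sketch below is for illustration only and uses an assumed technique: a linear-kernel maximum mean discrepancy (MMD) penalty. This penalty is the squared distance between the mean feature vectors of a source-domain batch and a target-domain batch. `linear_mmd`, `combined_loss`, and its `weight` parameter are hypothetical names, not taken from the paper.

```python
from typing import List

# One feature vector per sentence (e.g., a pooled encoder output).
Vector = List[float]


def mean_vector(batch: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty batch of equal-length vectors."""
    dim = len(batch[0])
    return [sum(v[i] for v in batch) / len(batch) for i in range(dim)]


def linear_mmd(source: List[Vector], target: List[Vector]) -> float:
    """Squared Euclidean distance between batch means (MMD with a linear kernel).

    The value is 0 when both domains have the same mean feature vector.
    """
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def combined_loss(task_loss: float, align_loss: float, weight: float = 0.1) -> float:
    """Hypothetical training objective: NER loss plus a weighted alignment penalty."""
    return task_loss + weight * align_loss
```

A training loop minimizing `combined_loss` would push the encoder toward features whose batch means match across domains while still solving the NER task. Adversarial domain classifiers are another common way to encourage this kind of alignment.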
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
