Guidance-Based Prompt Data Augmentation in Specialized Domains for Named Entity Recognition

Hyeonseok Kang; Hyein Seo; Jeesu Jung; Sangkeun Jung; Du-Seong Chang; Riwoo Chung

arXiv:2407.18442 · cs.CL · February 6, 2026

TL;DR

This paper presents a guidance-based data augmentation method for named entity recognition in specialized domains, using abstracted context and sentence structures to generate diverse, high-quality training data that improves model performance.

Contribution

The study introduces a novel guidance technique leveraging abstracted context and sentence structures for effective data augmentation in NER tasks within specialized fields.

Findings

01 Enhanced diversity in entity vocabulary and sentence structures.

02 Improved training performance of NER models.

03 Addresses data scarcity in specialized domains.

Abstract

While the abundance of rich and vast datasets across numerous fields has facilitated the advancement of natural language processing, sectors that need specialized data types continue to struggle to find quality data. Our study introduces a novel guidance data augmentation technique that uses abstracted context and sentence structures to produce varied sentences while maintaining context-entity relationships, addressing data scarcity challenges. By fostering a closer relationship between context, sentence structure, and the role of entities, our method enhances the effectiveness of data augmentation. Consequently, we demonstrate diversification in both entity-related vocabulary and overall sentence structure, while simultaneously improving training performance on the named entity recognition task.
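To make the idea of "abstracted sentence structure" concrete, here is a minimal sketch of one possible reading, not the authors' implementation. It assumes entities are given as token spans with type labels, abstracts a sentence by replacing each span with a bracketed type placeholder, and then assembles a text prompt from a context description and that structure. The names `abstract_sentence`, `build_guidance_prompt`, the span format, the prompt wording, and the example sentence are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def abstract_sentence(tokens: List[str], spans: List[Span]) -> str:
    """Replace each entity span with a [TYPE] placeholder; keep other tokens."""
    by_start = {start: (end, etype) for start, end, etype in spans}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            end, etype = by_start[i]
            out.append(f"[{etype}]")
            i = end  # skip over the whole entity span
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def build_guidance_prompt(context: str, structure: str) -> str:
    """Assemble an illustrative prompt (wording is an assumption)."""
    return (
        f"Domain context: {context}\n"
        f"Sentence structure: {structure}\n"
        "Write a new sentence that follows this structure, filling each "
        "bracketed slot with a different entity of that type."
    )


# Illustrative example data, not taken from the paper.
tokens = "Aspirin reduces the risk of stroke".split()
spans = [(0, 1, "DRUG"), (5, 6, "DISEASE")]

structure = abstract_sentence(tokens, spans)
prompt = build_guidance_prompt("clinical pharmacology", structure)
print(prompt)
```

Because the placeholders keep the entity types and their positions, any sentence generated from such a prompt can in principle be re-labeled by mapping the filled slots back to their types, which is one way the context-entity relationship could be preserved.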

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Service-Oriented Architecture and Web Services