Simple Questions Generate Named Entity Recognition Datasets
Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang

TL;DR
This paper presents an automated method to generate NER datasets using simple questions and an open-domain QA system, significantly reducing reliance on human annotation and improving performance in low-resource settings.
Contribution
It introduces an ask-to-generate approach for creating NER datasets. Models trained on the generated data outperform strong low-resource models and are competitive with rich-resource models, without relying on expert-provided in-domain dictionaries.
Findings
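Models trained on generated datasets outperform strong low-resource baselines by an average of 19.4 F1 points across six NER benchmarks.
In few-shot NER, outperforms the previous best model by 5.2 F1 points on three benchmarks, achieving new state-of-the-art results.
Competitive with rich-resource models that additionally use in-domain dictionaries provided by domain experts, despite using fewer in-domain resources.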
Abstract
Recent named entity recognition (NER) models often rely on human-annotated datasets, which require substantial professional knowledge of the target domain and entities. This research introduces an ask-to-generate approach that automatically generates NER datasets by asking simple natural-language questions (e.g., "Which disease?") to an open-domain question answering system. Despite using fewer in-domain resources, our models, trained solely on the generated datasets, substantially outperform strong low-resource models by an average of 19.4 F1 points on six popular NER benchmarks. Furthermore, our models are competitive with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks and achieve new state-of-the-art…
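The ask-to-generate idea can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: `ask_qa_system` is a hypothetical stub that returns hard-coded (sentence, answer) pairs in place of a real open-domain QA system, and the labeling step assumes the answers are collected into a per-type entity dictionary and matched back onto the retrieved sentences as BIO tags. The question-to-type mapping and all example sentences are invented for illustration.

```python
from collections import defaultdict


def ask_qa_system(question):
    """Hypothetical stand-in for an open-domain QA system (assumption).

    Returns (sentence, answer) pairs; the data below is invented toy data.
    """
    toy_results = {
        "Which disease?": [
            ("Patients with type 2 diabetes often develop neuropathy .", "type 2 diabetes"),
            ("Early symptoms of influenza include fever .", "influenza"),
        ],
    }
    return toy_results.get(question, [])


def generate_ner_data(question_to_type):
    """Ask each question, gather answers into a per-type dictionary,
    then BIO-tag every retrieved sentence by exact token matching."""
    dictionary = defaultdict(set)  # entity type -> set of answer token tuples
    sentences = []
    for question, ent_type in question_to_type.items():
        for sentence, answer in ask_qa_system(question):
            dictionary[ent_type].add(tuple(answer.split()))
            sentences.append(sentence.split())

    dataset = []
    for tokens in sentences:
        tags = ["O"] * len(tokens)
        for ent_type, entries in dictionary.items():
            # Try longer entries first so multi-token names win over sub-spans.
            for entry in sorted(entries, key=len, reverse=True):
                n = len(entry)
                for i in range(len(tokens) - n + 1):
                    span_free = all(t == "O" for t in tags[i:i + n])
                    if tuple(tokens[i:i + n]) == entry and span_free:
                        tags[i] = f"B-{ent_type}"
                        for j in range(i + 1, i + n):
                            tags[j] = f"I-{ent_type}"
        dataset.append(list(zip(tokens, tags)))
    return dataset


if __name__ == "__main__":
    for example in generate_ner_data({"Which disease?": "Disease"}):
        print(example)
```

In this toy run, "neuropathy" stays `O` because it was never returned as an answer, which shows why labels produced by exact string matching alone can be incomplete and noisy.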
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Interpreting and Communication in Healthcare
