Learning A Unified Named Entity Tagger From Multiple Partially Annotated Corpora For Efficient Adaptation

Xiao Huang; Li Dong; Elizabeth Boschee; Nanyun Peng

arXiv:1909.11535 · cs.CL · October 8, 2019

TL;DR

This paper introduces a deep structured model that integrates multiple partially annotated datasets for named entity recognition, enabling robust multi-type entity identification and outperforming existing multi-task baselines.

Contribution

The paper presents a novel deep structured model that effectively combines partially annotated datasets for joint entity recognition, improving adaptation and accuracy.

Findings

01 Significant performance improvement over multi-task baselines.

02 Effective integration of diverse entity types from multiple datasets.

03 Robust input representations learned from combined data.

Abstract

Named entity recognition (NER) identifies typed entity mentions in raw text. While the task is well-established, there is no universally used tagset: often, datasets are annotated for use in downstream applications and accordingly only cover a small set of entity types relevant to a particular task. For instance, in the biomedical domain, one corpus might annotate genes, another chemicals, and another diseases, even though the texts in each corpus contain references to all three types of entities. In this paper, we propose a deep structured model to integrate these "partially annotated" datasets to jointly identify all entity types appearing in the training corpora. By leveraging multiple datasets, the model can learn robust input representations; by building a joint structured model, it avoids potential conflicts caused by combining several models' predictions at test time. Experiments…
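
The abstract does not spell out the training objective, so the following is only a minimal sketch of one common way to train a single structured tagger on partially annotated corpora: a linear-chain CRF whose likelihood is marginalized over every label sequence compatible with the partial annotation. It is not taken from the paper's code. The label set (no BIO prefixes), the function names, and the uniform scores in the usage example are all assumptions made for illustration. The idea it shows is that an "O" tag in a corpus annotating only genes rules out GENE but leaves CHEM and DISEASE possible.

```python
import math

# Hypothetical unified tagset covering three corpora (assumption; no BIO scheme).
LABELS = ["O", "GENE", "CHEM", "DISEASE"]


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def allowed_labels(observed_tag, annotated_types):
    """Labels compatible with one token's tag in a partially annotated corpus."""
    if observed_tag != "O":
        return {observed_tag}
    # An "O" only rules out the entity types this corpus actually annotates.
    return {lab for lab in LABELS if lab not in annotated_types}


def log_partition(emissions, transitions, allowed):
    """Forward algorithm restricted to the allowed labels at each position."""
    n = len(LABELS)
    neg_inf = float("-inf")
    alpha = [emissions[0][j] if LABELS[j] in allowed[0] else neg_inf
             for j in range(n)]
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(n)])
            if LABELS[j] in allowed[t] else neg_inf
            for j in range(n)
        ]
    return logsumexp(alpha)


def marginal_log_likelihood(emissions, transitions, observed_tags, annotated_types):
    """log P(any label sequence compatible with the partial annotation)."""
    allowed = [allowed_labels(tag, annotated_types) for tag in observed_tags]
    unconstrained = [set(LABELS)] * len(observed_tags)
    return (log_partition(emissions, transitions, allowed)
            - log_partition(emissions, transitions, unconstrained))


# Usage: a three-token sentence from a corpus that annotates only genes.
# Scores are all zero (placeholder values), so every sequence is equally likely.
emissions = [[0.0] * len(LABELS) for _ in range(3)]
transitions = [[0.0] * len(LABELS) for _ in range(len(LABELS))]
ll = marginal_log_likelihood(emissions, transitions, ["O", "GENE", "O"], {"GENE"})
print(ll)
```

With uniform scores, 9 of the 64 possible label sequences are compatible with the annotation (3 choices for each "O" token, 1 for the GENE token), so the printed value equals log(9/64). In a real model the emission scores would come from a neural encoder, and maximizing this quantity would push probability mass toward compatible sequences without forcing unannotated types to be "O".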

Code & Models

Repositories

xhuang28/NewBioNer (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies