Learning A Unified Named Entity Tagger From Multiple Partially Annotated Corpora For Efficient Adaptation
Xiao Huang, Li Dong, Elizabeth Boschee, Nanyun Peng

TL;DR
This paper introduces a deep structured model that integrates multiple partially annotated datasets for named entity recognition, enabling robust multi-type entity identification and outperforming existing multi-task baselines.
Contribution
The paper presents a novel deep structured model that effectively combines partially annotated datasets for joint entity recognition, improving adaptation and accuracy.
Findings
Significant performance improvement over multi-task baselines.
Effective integration of diverse entity types from multiple datasets.
Robust input representations learned from combined data.
Abstract
Named entity recognition (NER) identifies typed entity mentions in raw text. While the task is well-established, there is no universally used tagset: often, datasets are annotated for use in downstream applications and accordingly only cover a small set of entity types relevant to a particular task. For instance, in the biomedical domain, one corpus might annotate genes, another chemicals, and another diseases, despite the texts in each corpus containing references to all three types of entities. In this paper, we propose a deep structured model to integrate these "partially annotated" datasets to jointly identify all entity types appearing in the training corpora. By leveraging multiple datasets, the model can learn robust input representations; by building a joint structured model, it avoids potential conflicts caused by combining several models' predictions at test time. Experiments…
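The idea of a "partially annotated" corpus in the abstract can be made concrete with a small sketch. It is not taken from the paper: it assumes a linear-chain scoring model and a marginal-likelihood objective, which is one common way to train on partial labels, and the paper's actual objective may differ. The tagset, the function names, and the simplified IO tagging scheme are all hypothetical. The point it illustrates is that an "O" label in a corpus that only annotates genes rules out "Gene" for that token, but leaves "Chemical" and "Disease" possible.

```python
import math

# Hypothetical unified tagset: the union of entity types across corpora
# (IO scheme for brevity; a real tagger would likely use BIO or BIOES).
UNIFIED_TAGS = ["O", "Gene", "Chemical", "Disease"]


def allowed_tags(observed_tag, corpus_types):
    """Unified tags compatible with one observed tag from a partial corpus."""
    if observed_tag != "O":
        return {observed_tag}
    # "O" only rules out the types that this corpus actually annotates.
    return {t for t in UNIFIED_TAGS if t == "O" or t not in corpus_types}


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_score(emissions, transitions, allowed):
    """Forward algorithm summed only over tag sequences within `allowed`.

    emissions: list of {tag: score}, one dict per token.
    transitions: {(prev_tag, cur_tag): score}.
    allowed: list of sets of permitted tags, one set per token.
    """
    alpha = {t: emissions[0][t] for t in allowed[0]}
    for i in range(1, len(emissions)):
        alpha = {
            cur: emissions[i][cur]
            + log_sum_exp([alpha[prev] + transitions[(prev, cur)] for prev in alpha])
            for cur in allowed[i]
        }
    return log_sum_exp(list(alpha.values()))


def marginal_log_likelihood(emissions, transitions, observed, corpus_types):
    """Log-probability mass of all unified sequences consistent with `observed`."""
    constrained = [allowed_tags(t, corpus_types) for t in observed]
    unconstrained = [set(UNIFIED_TAGS) for _ in observed]
    return constrained_log_score(emissions, transitions, constrained) - (
        constrained_log_score(emissions, transitions, unconstrained)
    )
```

With scores supplied by any encoder, maximizing `marginal_log_likelihood` over sentences from every corpus would train a single tagger over the unified tagset, which is the kind of joint model the abstract describes, rather than merging the outputs of several per-corpus models at test time.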
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
