Learning Dictionaries for Named Entity Recognition using Minimal Supervision

Arvind Neelakantan; Michael Collins

arXiv:1504.06650·cs.CL·April 28, 2015

TL;DR

This paper presents a method for automatically building NER dictionaries from unlabeled data and a few seed examples. It uses Canonical Correlation Analysis (CCA) to embed candidate phrases and reports F-1 gains over co-training on disease and virus NER.

Contribution

It introduces an approach that combines CCA-based phrase embeddings with a small number of labeled seed examples to construct NER dictionaries.
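A minimal sketch of the two steps described in the abstract: CCA between two views of each candidate phrase, followed by labelling with a handful of seeds. Several choices here are assumptions, not details taken from the paper. The two views are bag-of-words counts of the phrase's own words and of its context words. CCA is computed in plain NumPy with a small ridge term `reg`. Labelling uses a nearest-centroid rule over seed embeddings, which stands in for whatever classifier the paper actually uses. The toy data generator and all function names are hypothetical.

```python
import numpy as np


def inv_sqrt(c):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def cca_embeddings(x, y, k, reg=1e-3):
    """Embed the rows of view x using its top-k canonical directions w.r.t. view y.

    Returns the (n, k) embedding and the k leading canonical correlations.
    `reg` is a small ridge term (an assumption) that keeps covariances invertible.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cxx = xc.T @ xc / n + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / n + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / n
    wx = inv_sqrt(cxx)
    # Left singular vectors of the whitened cross-covariance, mapped back
    # through wx, are the canonical directions for view x.
    u, s, _ = np.linalg.svd(wx @ cxy @ inv_sqrt(cyy))
    return xc @ (wx @ u[:, :k]), s[:k]


def classify_with_seeds(emb, seeds):
    """Nearest-centroid labelling from a few seed examples.

    `seeds` maps each label to a list of row indices. This classifier is a
    hypothetical stand-in, not necessarily the one used in the paper.
    """
    labels = sorted(seeds)
    centroids = np.stack([emb[seeds[lab]].mean(axis=0) for lab in labels])
    dists = ((emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.array(labels)[dists.argmin(axis=1)]


def make_toy_views(n=200, seed=0):
    """Hypothetical counts: a phrase-word view and a context-word view per candidate."""
    rng = np.random.default_rng(seed)
    half = n // 2
    gold = np.repeat([1, 0], half)  # toy labels: 1 = "disease", 0 = other
    phrase = rng.poisson(0.2, size=(n, 20)).astype(float)   # background noise
    context = rng.poisson(0.2, size=(n, 10)).astype(float)
    # Each class prefers its own phrase words and its own context words.
    phrase[:half, :5] += rng.poisson(2.0, size=(half, 5))
    phrase[half:, 5:10] += rng.poisson(2.0, size=(half, 5))
    context[:half, :3] += rng.poisson(2.0, size=(half, 3))
    context[half:, 3:6] += rng.poisson(2.0, size=(half, 3))
    return phrase, context, gold


if __name__ == "__main__":
    phrase, context, gold = make_toy_views()
    emb, corr = cca_embeddings(phrase, context, k=1)
    seeds = {1: [0, 1, 2, 3, 4], 0: [100, 101, 102, 103, 104]}
    pred = classify_with_seeds(emb, seeds)
    print("top canonical correlation:", corr[0])
    print("accuracy on toy data:", (pred == gold).mean())
```

In this toy setup the only signal shared by the two views is the class, so the top canonical direction separates the classes and `k=1` is enough. Real candidate-phrase data would call for a much larger `k`.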

Findings

01

Achieved F-1 score improvements of 16.5% and 11.3% over co-training on disease and virus NER, respectively.

02

Adding candidate phrase embeddings as features to a sequence tagger gives better performance than using word embeddings.

03

The method needs only a few seed examples plus unlabeled data, reducing annotation effort.

Abstract

This paper describes an approach for automatic construction of dictionaries for Named Entity Recognition (NER) using large amounts of unlabeled data and a few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower-dimensional embeddings (representations) for candidate phrases and classify these phrases using a small number of labeled examples. Our method achieves F-1 score improvements of 16.5% and 11.3% over co-training on disease and virus NER, respectively. We also show that adding candidate phrase embeddings as features in a sequence tagger gives better performance than using word embeddings.
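The abstract's last claim is that candidate phrase embeddings, used as tagger features, outperform word embeddings alone. The sketch below shows one plausible way to turn a phrase-embedding dictionary into per-token features for a sequence tagger such as a CRF. Everything in it is an assumption rather than the paper's exact feature design: the dictionary format (token tuple to vector), greedy longest-match lookup, the begin/inside flags, and the hypothetical functions `phrase_features` and `token_features`.

```python
import numpy as np


def phrase_features(tokens, phrase_emb, dim, max_len=4):
    """Per-token features from embeddings of candidate phrases covering each token.

    phrase_emb: hypothetical dict mapping a tuple of tokens to a vector of size dim.
    Matching is greedy, left to right, longest phrase first (an assumption).
    Returns an (n_tokens, dim + 2) array: the covering phrase's embedding, then
    flags for "begins a matched phrase" and "inside a matched phrase".
    Tokens not covered by any phrase get all-zero rows.
    """
    n = len(tokens)
    feats = np.zeros((n, dim + 2))
    i = 0
    while i < n:
        match = 0
        for length in range(min(max_len, n - i), 0, -1):
            if tuple(tokens[i:i + length]) in phrase_emb:
                match = length
                break
        if match == 0:
            i += 1
            continue
        vec = phrase_emb[tuple(tokens[i:i + match])]
        for j in range(i, i + match):
            feats[j, :dim] = vec
            feats[j, dim + (0 if j == i else 1)] = 1.0  # begin vs inside flag
        i += match  # skip past the matched phrase
    return feats


def token_features(tokens, word_emb, word_dim, phrase_emb, phrase_dim):
    """Concatenate word embeddings (zeros if unknown) with phrase features."""
    words = np.stack([word_emb.get(t, np.zeros(word_dim)) for t in tokens])
    return np.hstack([words, phrase_features(tokens, phrase_emb, phrase_dim)])
```

For example, with `tokens = ["patients", "with", "west", "nile", "virus", "recovered"]` and a dictionary containing both `("west", "nile", "virus")` and `("virus",)`, the three tokens of "west nile virus" all receive the three-word phrase's vector: "west" has the begin flag, and the other two have the inside flag. The single-word entry is never used here because the longer phrase wins. The resulting rows would then be fed to the tagger alongside its usual features.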