Learning Dictionaries for Named Entity Recognition using Minimal Supervision
Arvind Neelakantan, Michael Collins

TL;DR
This paper presents a method for automatically building NER dictionaries from large amounts of unlabeled data and a few seed examples. It embeds candidate phrases with Canonical Correlation Analysis (CCA) and classifies them using a small number of labeled examples.
Contribution
It combines CCA-based embeddings of candidate phrases with a small set of labeled seed examples to construct dictionaries for NER.
Findings
Achieves F-1 score improvements over co-training of 16.5% on disease NER and 11.3% on virus NER.
Adding candidate phrase embeddings as features in a sequence tagger gives better performance than using word embeddings.
The method needs only a few seed examples, which reduces annotation effort.
Abstract
This paper describes an approach for automatic construction of dictionaries for Named Entity Recognition (NER) using large amounts of unlabeled data and a few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower-dimensional embeddings (representations) for candidate phrases and classify these phrases using a small number of labeled examples. Our method achieves 16.5% and 11.3% F-1 score improvement over co-training on disease and virus NER respectively. We also show that adding candidate phrase embeddings as features in a sequence tagger gives better performance than using word embeddings.
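The pipeline in the abstract (embed candidate phrases with CCA, then classify them from a few labeled seeds) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two views are random toy matrices standing in for two feature sets per candidate phrase, the helper names (inv_sqrt, cca_fit, nearest_centroid) are hypothetical, the CCA is a plain regularized version built on NumPy's eigh and svd, and the nearest-centroid classifier is a stand-in for whatever classifier the authors actually use.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca_fit(X, Y, k, reg=1e-3):
    """Regularized CCA. Returns (mean of X, projection for X, top-k correlations)."""
    n = X.shape[0]
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - Y.mean(axis=0)
    # Regularized within-view covariances and the cross-view covariance.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten both views; the SVD of the whitened cross-covariance gives
    # the canonical directions (U) and canonical correlations (S).
    Wx = inv_sqrt(Cxx)
    U, S, _ = np.linalg.svd(Wx @ Cxy @ inv_sqrt(Cyy))
    return x_mean, Wx @ U[:, :k], S[:k]


def nearest_centroid(train_emb, train_labels, test_emb):
    """Stand-in classifier (an assumption): assign the label of the closest class mean."""
    labels = np.unique(train_labels)
    centroids = np.stack([train_emb[train_labels == c].mean(axis=0) for c in labels])
    dists = ((test_emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]


# Toy data (assumption): 200 candidate phrases, each with a hidden label
# (1 = entity, 0 = not) that influences both views.
rng = np.random.default_rng(0)
n = 200
is_entity = np.arange(n) % 2
latent = (is_entity * 2.0 - 1.0)[:, None]
X = latent @ rng.normal(size=(1, 6)) + 0.5 * rng.normal(size=(n, 6))  # view 1
Y = latent @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(n, 4))  # view 2

# CCA itself uses no labels; it only needs the two views.
x_mean, proj, corrs = cca_fit(X, Y, k=2)
emb = (X - x_mean) @ proj  # low-dimensional phrase embeddings

# A handful of labeled seeds (5 per class) is all the classifier sees.
seed_idx = np.concatenate([np.flatnonzero(is_entity == c)[:5] for c in (0, 1)])
pred = nearest_centroid(emb[seed_idx], is_entity[seed_idx], emb)
print("canonical correlations:", corrs)
print("toy accuracy:", (pred == is_entity).mean())
```

The design point the sketch tries to convey is that the embedding step is unsupervised, so it can draw on all the unlabeled data, while the labeled seeds are needed only for the final, low-dimensional classification step.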
