Robust Lexical Features for Improved Neural Network Named-Entity Recognition
Abbas Ghaddar, Philippe Langlais

TL;DR
This paper shows that lexical features embedded in a learned vector space substantially improve neural network-based named-entity recognition, setting a new state of the art on ONTONOTES 5.0 and matching it on CONLL-2003.
Contribution
The authors embed words and entity types in the same low-dimensional vector space, trained on Wikipedia data annotated by distant supervision, and precompute from it a lexical feature vector for each word that improves NER performance in neural networks.
Findings
Achieved a new state-of-the-art F1 score of 87.95 on ONTONOTES 5.0.
Matched state-of-the-art performance with an F1 score of 91.73 on CONLL-2003.
Showed that lexical features are more useful in neural NER models than previously assumed.
Abstract
Neural network approaches to Named-Entity Recognition reduce the need for carefully hand-crafted features. While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of gazetteers. In this work, we show that this is unfair: lexical features are actually quite useful. We propose to embed words and entity types into a low-dimensional vector space that we train on annotated data produced by distant supervision from Wikipedia. From this, we compute (offline) a feature vector representing each word. When used with a vanilla recurrent neural network model, this representation yields substantial improvements. We establish a new state-of-the-art F1 score of 87.95 on ONTONOTES 5.0, while matching state-of-the-art performance with an F1 score of 91.73 on the over-studied CONLL-2003 dataset.
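The offline feature computation described in the abstract can be illustrated with a small sketch. The abstract does not say how the per-word vector is derived from the joint space, so this sketch assumes one plausible reading: each dimension of the feature vector is the cosine similarity between the word's embedding and the embedding of one entity type. The function names, the three entity types, and the toy embedding values are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lexical_feature_vector(word_vec, type_vecs, type_names):
    """Assumed scheme: one similarity score per entity type, in a fixed type order."""
    return [cosine(word_vec, type_vecs[name]) for name in type_names]


# Toy embeddings living in the same 3-dimensional space (made-up values).
TYPE_NAMES = ["PERSON", "LOCATION", "ORGANIZATION"]
type_vecs = {
    "PERSON": [1.0, 0.0, 0.0],
    "LOCATION": [0.0, 1.0, 0.0],
    "ORGANIZATION": [0.0, 0.0, 1.0],
}
word_vecs = {
    "paris": [0.2, 0.9, 0.1],
    "obama": [0.95, 0.05, 0.1],
}

# Computed once, offline: a lookup table from word to lexical feature vector.
lexical_table = {
    word: lexical_feature_vector(vec, type_vecs, TYPE_NAMES)
    for word, vec in word_vecs.items()
}

for word, features in lexical_table.items():
    best_type = TYPE_NAMES[features.index(max(features))]
    print(word, [round(x, 3) for x in features], best_type)
```

Under this assumption, the table is built once before training, and a tagger would look up each token's vector and concatenate it with its other input features. Because the lookup is precomputed, it adds no embedding training cost to the NER model itself.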
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
