Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora

Sajawel Ahmed; Alexander Mehler

arXiv:1807.10675·cs.CL·July 30, 2018·1 cites

TL;DR

This paper improves neural named entity recognition for German, treated as an example of a low-resource language, by optimizing large corpora and applying morphological preprocessing (lemmatization and part-of-speech tagging). It reports gains of up to 11% in F-score and a new state of the art on each open-source dataset tested.

Contribution

It introduces a resource optimization approach with morphological preprocessing, outperforming existing models without designing deeper neural architectures.

Findings

01. Up to 11% F-score improvement on German NER
02. Optimized corpora significantly boost downstream task performance
03. Morphological processing enhances data quality for NER

Abstract

This study improves the performance of neural named entity recognition by a margin of up to 11% in F-score, using German as an example of a low-resource language, thereby outperforming existing baselines and establishing a new state of the art on every open-source dataset. Rather than designing deeper and wider hybrid neural architectures, we gather all available resources and perform a detailed optimization and grammar-dependent morphological processing, consisting of lemmatization and part-of-speech tagging, before exposing the raw data to any training process. We test our approach in a threefold monolingual experimental setup of a) single, b) joint, and c) optimized training, and shed light on how downstream tasks depend on the size of the corpora used to compute word embeddings.
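The morphological preprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the lookup table `TOY_LEXICON`, the function `preprocess`, and the `lemma_POS` output format are all hypothetical, chosen only to show what "lemmatization and part-of-speech tagging before training" might look like on a tokenized sentence. A real setup would presumably use a trained lemmatizer and tagger rather than a dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> (lemma, POS tag).
# The tags follow the STTS tagset commonly used for German; the entries
# themselves are made up for this illustration.
TOY_LEXICON: Dict[str, Tuple[str, str]] = {
    "Die": ("die", "ART"),
    "Häuser": ("Haus", "NN"),
    "stehen": ("stehen", "VVFIN"),
    "in": ("in", "APPR"),
    "Frankfurt": ("Frankfurt", "NE"),
}


def preprocess(tokens: List[str],
               lexicon: Dict[str, Tuple[str, str]] = TOY_LEXICON) -> List[str]:
    """Replace each token by 'lemma_POS' (an assumed format).

    Tokens missing from the lexicon keep their surface form and get
    the placeholder tag 'UNK'.
    """
    processed = []
    for token in tokens:
        lemma, pos = lexicon.get(token, (token, "UNK"))
        processed.append(f"{lemma}_{pos}")
    return processed


if __name__ == "__main__":
    sentence = ["Die", "Häuser", "stehen", "in", "Frankfurt"]
    print(preprocess(sentence))
```

The processed token sequences, rather than the raw text, would then serve as input for computing word embeddings. Whether lemma and tag are fused into one token or kept as separate features is a design choice this sketch does not settle.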

Code & Models

Repositories

FID-Biodiversity/GermanWordEmbeddings-NER

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification