Confidence penalty, annealing Gaussian noise and zoneout for biLSTM-CRF networks for named entity recognition

Antonio Jimeno Yepes

arXiv:1808.04029·cs.CL·August 15, 2018

Open Access

TL;DR

This paper analyzes regularization and optimization techniques (confidence penalty, annealed Gaussian noise, and zoneout) for biLSTM-CRF networks in named entity recognition, reporting a new state-of-the-art F1 of 87.18 on the CoNLL-2003 Spanish set.

Contribution

It evaluates several optimization and regularization methods on a biLSTM-CRF NER baseline and sets a new state of the art on the CoNLL-2003 Spanish set.

Findings

01. The optimization methods improve on the biLSTM-CRF NER baseline.

02. A new state-of-the-art F1 score of 87.18 is reported on the CoNLL-2003 Spanish set.

03. Some of the methods act as regularizers that discourage overfitting.

Abstract

Named entity recognition (NER) identifies relevant entities in text. A bidirectional LSTM (long short-term memory) encoder with a neural conditional random field (CRF) decoder (biLSTM-CRF) is the state-of-the-art methodology. In this work, we analyze several methods intended to optimize the performance of networks based on this architecture, some of which also help avoid overfitting. These methods target exploration of the parameter space, regularization of LSTMs, and penalization of confident output distributions. Results show that the optimization methods improve the performance of the biLSTM-CRF NER baseline system, setting a new state of the art for the CoNLL-2003 Spanish set with an F1 of 87.18.
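
The three ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the code below assumes commonly used formulations (an entropy bonus subtracted from the negative log-likelihood for the confidence penalty, Gaussian noise added to gradients with a standard deviation that decays over training steps, and zoneout as random preservation of the previous hidden units). All function names and the default values of `beta`, `eta`, and `gamma` are hypothetical, and the confidence penalty is shown for a single categorical distribution rather than for a CRF output.

```python
import math
import random


def confidence_penalty_loss(probs, gold_index, beta=0.1):
    """Negative log-likelihood minus beta times the entropy of `probs`.

    Assumed formulation: low-entropy (over-confident) output
    distributions receive a smaller entropy bonus, hence a higher loss.
    """
    nll = -math.log(probs[gold_index])
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return nll - beta * entropy


def annealed_noise_std(step, eta=0.3, gamma=0.55):
    """Noise standard deviation at a training step (assumed schedule).

    The variance eta / (1 + step) ** gamma shrinks as training proceeds,
    so early steps explore more of the parameter space.
    """
    return math.sqrt(eta / (1.0 + step) ** gamma)


def add_gradient_noise(grads, step, rng, eta=0.3, gamma=0.55):
    """Return a copy of `grads` with annealed Gaussian noise added."""
    std = annealed_noise_std(step, eta, gamma)
    return [g + rng.gauss(0.0, std) for g in grads]


def zoneout(h_prev, h_new, rate, rng, training=True):
    """Zoneout on a hidden state given as a list of floats.

    Training: each unit keeps its previous value with probability `rate`.
    Evaluation: each unit is the expected value of that random choice.
    """
    if training:
        return [hp if rng.random() < rate else hn
                for hp, hn in zip(h_prev, h_new)]
    return [rate * hp + (1.0 - rate) * hn for hp, hn in zip(h_prev, h_new)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    print(confidence_penalty_loss([0.97, 0.01, 0.01, 0.01], 0))
    print(add_gradient_noise([0.5, -0.2], step=10, rng=rng))
    print(zoneout([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], rate=0.5, rng=rng))
```

In a real biLSTM-CRF these operations would act on tensors inside the training loop of a deep learning framework; plain Python lists are used here only to keep the sketch self-contained.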

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory