Application of Pre-training Models in Named Entity Recognition

Yu Wang; Yining Sun; Zuchang Ma; Lisheng Gao; Yang Xu; Ting Sun

arXiv:2002.08902 · cs.CL · February 21, 2020 · 5 cites

Open Access

TL;DR

This paper reviews four pre-training models (BERT, ERNIE, ERNIE2.0-tiny, and RoBERTa) and evaluates them on Named Entity Recognition by fine-tuning. RoBERTa performs best, achieving state-of-the-art results on the MSRA-2006 dataset.

Contribution

The paper introduces the architectures and pre-training tasks of four models (BERT, ERNIE, ERNIE2.0-tiny, and RoBERTa) and compares their NER performance through fine-tuning experiments.

Findings

01

RoBERTa achieved state-of-the-art results on the MSRA-2006 dataset.

02

Pre-training models significantly improve NER performance.

03

Different model architectures and pre-training tasks impact NER effectiveness.

Abstract

Named Entity Recognition (NER) is a fundamental Natural Language Processing (NLP) task that extracts entities from unstructured data. Previous methods for NER were based on machine learning or deep learning. Recently, pre-training models have significantly improved performance on multiple NLP tasks. In this paper, we first introduce the architecture and pre-training tasks of four common pre-training models: BERT, ERNIE, ERNIE2.0-tiny, and RoBERTa. Then, we apply these pre-training models to an NER task by fine-tuning, and compare the effects of the different model architectures and pre-training tasks on the NER task. The experimental results show that RoBERTa achieved state-of-the-art results on the MSRA-2006 dataset.
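To make the task's output concrete, here is a minimal sketch of how per-token tag predictions from a fine-tuned model are commonly turned into entity spans. It assumes a BIO tagging scheme with tags such as `B-PER` and `I-LOC`; the abstract does not state which scheme or tag names the authors used, and the helper name `decode_bio` and the example sentence are hypothetical.

```python
from typing import List, Optional, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def decode_bio(tags: List[str]) -> List[Span]:
    """Collapse per-token BIO tags into entity spans.

    Assumption: tags look like "B-PER", "I-PER", or "O". An "I-" tag that
    does not continue an open entity of the same type is ignored, which is
    only one of several conventions in use.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    kind: Optional[str] = None
    # A trailing "O" sentinel closes an entity that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.append((kind, start, i))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


# Made-up example sentence and predictions, for illustration only.
tokens = ["Yu", "Wang", "visited", "Hefei", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
for kind, start, end in decode_bio(tags):
    print(kind, " ".join(tokens[start:end]))
```

Entity-level scores for NER are usually computed over spans like these rather than over individual tokens, so a decoding step of this kind typically sits between the model's predictions and the reported metric.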

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: ERNIE · Linear Layer · Adam · Softmax · Layer Normalization · Dropout · Attention Is All You Need · Multi-Head Attention · Residual Connection · Attention Dropout