TENER: Adapting Transformer Encoder for Named Entity Recognition

Hang Yan; Bocao Deng; Xiaonan Li; Xipeng Qiu

arXiv:1911.04474 · cs.CL · December 11, 2019 · 245 cites

TL;DR

This paper introduces TENER, an NER model built on an adapted Transformer encoder that models both character-level and word-level features. It shows that, with these adaptations, the Transformer is as effective for NER as it is for other NLP tasks.

Contribution

It proposes a Transformer-based architecture adapted specifically for NER, combining direction- and relative-distance-aware attention with un-scaled attention.
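Why direction awareness needs special handling can be checked numerically. The following is a minimal sketch in plain Python, assuming the standard interleaved sinusoidal encoding (the helper names `sinusoid` and `dot` are illustrative, not taken from the paper). With absolute encodings, the dot product between positions t and t+k equals that between t and t-k, so attention cannot tell left from right. Encoding the signed offset directly gives different vectors for +k and -k, because sine is an odd function.

```python
import math


def sinusoid(pos, d_model):
    """Interleaved sin/cos vector for a (possibly negative) position or offset.

    Frequencies follow the original Transformer: w_i = 1 / 10000^(2i / d_model).
    """
    vec = []
    for i in range(d_model // 2):
        w = 1.0 / (10000 ** (2 * i / d_model))
        vec.append(math.sin(pos * w))
        vec.append(math.cos(pos * w))
    return vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


d = 16
t, k = 10, 3

# Absolute encodings: PE_t . PE_{t+k} reduces to sum_i cos(k * w_i),
# which is the same for +k and -k, so direction is lost.
left = dot(sinusoid(t, d), sinusoid(t - k, d))
right = dot(sinusoid(t, d), sinusoid(t + k, d))
print(math.isclose(left, right))  # True

# Relative encoding of the signed offset: the sin components flip sign
# between +k and -k while the cos components stay equal.
r_pos, r_neg = sinusoid(k, d), sinusoid(-k, d)
print(r_pos == r_neg)  # False
```

The cos components alone still only carry distance; it is the sign of the sin components that lets an attention score built on these vectors distinguish preceding from following context.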

Findings

01

An adapted Transformer encoder is effective for NER.

02

Making attention aware of direction and relative distance improves performance.

03

TENER achieves competitive results on NER benchmarks.

Abstract

Bidirectional long short-term memory networks (BiLSTMs) have been widely used as encoders in models for the named entity recognition (NER) task. Recently, the Transformer has been broadly adopted in various Natural Language Processing (NLP) tasks owing to its parallelism and strong performance. Nevertheless, the Transformer performs worse on NER than it does on other NLP tasks. In this paper, we propose TENER, an NER architecture that uses an adapted Transformer encoder to model character-level and word-level features. By incorporating direction- and relative-distance-aware attention and un-scaled attention, we show that a Transformer-like encoder is just as effective for NER as it is for other NLP tasks.
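The abstract names the two attention changes without spelling them out. As a rough illustration only, the sketch below implements single-head attention with an assumed Transformer-XL-style score, A[t][j] = Q_t·K_j + Q_t·R_{t-j} + u·K_j + v·R_{t-j}, where R_{t-j} is a sinusoid of the signed offset t - j and `u`, `v` are random stand-ins for learned bias vectors. "Un-scaled" is taken to mean dropping the usual 1/sqrt(d) factor. Learned projections, multiple heads, and masking are omitted, Q/K/V are passed in directly, and the exact parametrization in the paper may differ; all function names here are hypothetical.

```python
import math
import random


def sinusoid(pos, d):
    # Interleaved sin/cos of a signed offset; d is assumed to be even.
    vec = []
    for i in range(d // 2):
        w = 1.0 / (10000 ** (2 * i / d))
        vec += [math.sin(pos * w), math.cos(pos * w)]
    return vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rel_attention(Q, K, V, u, v, scaled=False):
    """Single-head attention with relative, direction-aware position terms.

    scaled=False omits the 1/sqrt(d) factor (assumed "un-scaled" variant);
    scaled=True restores vanilla scaling for comparison.
    """
    n, d = len(Q), len(Q[0])
    outputs, weights = [], []
    for t in range(n):
        scores = []
        for j in range(n):
            r = sinusoid(t - j, d)  # encodes signed offset t - j
            s = dot(Q[t], K[j]) + dot(Q[t], r) + dot(u, K[j]) + dot(v, r)
            scores.append(s / math.sqrt(d) if scaled else s)
        a = softmax(scores)
        weights.append(a)
        # Output row t is the attention-weighted sum of value rows.
        outputs.append([sum(a[j] * V[j][c] for j in range(n)) for c in range(len(V[0]))])
    return outputs, weights


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d = 5, 8
Q, K, V = (random_matrix(n, d, rng) for _ in range(3))
u, v = random_matrix(1, d, rng)[0], random_matrix(1, d, rng)[0]

out_unscaled, w_unscaled = rel_attention(Q, K, V, u, v)
out_scaled, w_scaled = rel_attention(Q, K, V, u, v, scaled=True)
for t in range(n):
    print(t, round(max(w_scaled[t]), 3), round(max(w_unscaled[t]), 3))
```

Because the un-scaled logits are sqrt(d) times larger than the scaled ones, each softmax row becomes sharper: its largest weight is never smaller than in the scaled version, so attention concentrates on fewer positions.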

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax