TENER: Adapting Transformer Encoder for Named Entity Recognition
Hang Yan, Bocao Deng, Xiaonan Li, Xipeng Qiu

TL;DR
This paper introduces TENER, an adapted Transformer encoder for NER that models both character-level and word-level features, and shows that with these adaptations a Transformer-like encoder is as effective for NER as it is for other NLP tasks.
Contribution
The paper proposes a Transformer-based architecture adapted specifically for NER, incorporating direction- and relative-distance-aware attention together with un-scaled attention.
Findings
An adapted Transformer encoder is effective for NER tasks.
Making attention aware of direction and relative distance improves performance.
TENER achieves competitive results on NER benchmarks.
Abstract
Bidirectional long short-term memory networks (BiLSTM) have been widely used as encoders in models solving the named entity recognition (NER) task. Recently, the Transformer has been broadly adopted in various Natural Language Processing (NLP) tasks owing to its parallelism and advantageous performance. Nevertheless, the performance of the Transformer in NER is not as good as it is in other NLP tasks. In this paper, we propose TENER, a NER architecture adopting an adapted Transformer encoder to model the character-level features and word-level features. By incorporating the direction- and relative-distance-aware attention and the un-scaled attention, we prove the Transformer-like encoder is just as effective for NER as for other NLP tasks.
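The two attention changes named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a sinusoidal embedding of the signed offset t - j (so that the sine components change sign with direction), and it omits the learned projections and bias terms a full model would have. The function names `relative_embedding` and `attention_weights` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relative_embedding(offset, dim):
    """Sinusoidal embedding of a signed offset (assumed form, dim must be even).

    sin is odd and cos is even, so +offset and -offset map to different
    vectors: the embedding carries direction as well as distance.
    """
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.extend([math.sin(offset * freq), math.cos(offset * freq)])
    return emb


def attention_weights(queries, keys, scale=False):
    """Attention weights with a relative-position term added to each score.

    scale=False gives the un-scaled variant; scale=True divides the scores
    by sqrt(dim) as in the standard Transformer.
    """
    dim = len(queries[0])
    weights = []
    for t, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            score = dot(q, k) + dot(q, relative_embedding(t - j, dim))
            if scale:
                score /= math.sqrt(dim)
            row.append(score)
        weights.append(softmax(row))
    return weights


if __name__ == "__main__":
    tokens = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
    for row in attention_weights(tokens, tokens):
        print([round(w, 3) for w in row])
```

Dropping the 1/sqrt(dim) factor leaves the softmax inputs larger in magnitude, so in this sketch the un-scaled weights are never flatter than the scaled ones for the same inputs.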
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
