Deep Span Representations for Named Entity Recognition
Enwei Zhu, Yiyang Liu, Jinpeng Li

TL;DR
This paper introduces DSpERT, a deep span encoder that strengthens span representations for NER, especially for long and nested entities, by stacking span-Transformer layers that refine each span representation from bottom to top, yielding deeper semantics and better-separated span features.
Contribution
The paper proposes DSpERT, a novel span Transformer architecture that produces deep semantic span representations and outperforms existing span-based NER models that only shallowly aggregate token representations into span representations.
Findings
DSpERT achieves state-of-the-art or competitive results on eight NER benchmarks.
Deep span representations improve performance on long-span and nested entities.
Deep span features are well-structured and easily separable.
Abstract
Span-based models are one of the most straightforward methods for named entity recognition (NER). Existing span-based NER systems shallowly aggregate the token representations to span representations. However, this typically results in significant ineffectiveness for long-span entities, a coupling between the representations of overlapping spans, and ultimately a performance degradation. In this study, we propose DSpERT (Deep Span Encoder Representations from Transformers), which comprises a standard Transformer and a span Transformer. The latter uses low-layered span representations as queries, and aggregates the token representations as keys and values, layer by layer from bottom to top. Thus, DSpERT produces span representations of deep semantics. With weight initialization from pretrained language models, DSpERT achieves performance higher than or competitive with recent…
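The query/key/value scheme in the abstract can be illustrated with a small plain-Python sketch. It is a hypothetical simplification, not the authors' implementation: it assumes single-head attention with identity projections (no learned query, key, or value matrices), no feed-forward sublayer or layer normalization, keys and values restricted to the tokens inside each span, and a bottom-layer span query obtained by max-pooling the span's tokens. All function names and the toy `token_layers` values are invented for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def max_pool(vectors: List[Vector]) -> Vector:
    """Element-wise max over token vectors: a 'shallow' span aggregation."""
    return [max(column) for column in zip(*vectors)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def span_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one span query.

    Identity projections are assumed (no learned W_Q, W_K, W_V).
    """
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]


def deep_span_representation(token_layers: List[List[Vector]], start: int, end: int) -> Vector:
    """Deep representation of the span covering tokens start..end-1.

    token_layers[l][t] stands in for the hidden state of token t at layer l
    of the standard (token) Transformer.
    """
    # Assumption: the bottom-layer span query is a max-pool of its tokens.
    span = max_pool(token_layers[0][start:end])
    # Bottom to top: the span query attends to that layer's in-span tokens,
    # which serve as both keys and values; the result is added residually.
    for layer in token_layers[1:]:
        inside = layer[start:end]
        span = [s + a for s, a in zip(span, span_attention(span, inside, inside))]
    return span


def enumerate_spans(n_tokens: int, max_len: int) -> List[Tuple[int, int]]:
    """All (start, end) spans up to max_len tokens, including nested and overlapping ones."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start + 1, min(n_tokens, start + max_len) + 1)
    ]


# Toy hidden states: 3 layers x 4 tokens x 2 features (made-up numbers).
token_layers = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]],
    [[0.2, 0.1], [0.1, 0.3], [0.4, 0.0], [0.0, 0.2]],
    [[0.0, 0.5], [0.3, 0.3], [0.1, 0.1], [0.2, 0.0]],
]
span_reps = {
    span: deep_span_representation(token_layers, *span)
    for span in enumerate_spans(len(token_layers[0]), max_len=3)
}
```

In this sketch every candidate span, including nested ones such as (0, 2) inside (0, 3), gets its own query that is updated at every layer, rather than a single fixed pooling of shared token vectors. A full model would typically add learned projections, multiple heads, a position-wise feed-forward sublayer, layer normalization, dropout, and a linear classifier over the span representations.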
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Softmax · Label Smoothing · Multi-Head Attention · Adam · Dense Connections
