DaN+: Danish Nested Named Entities and Lexical Normalization
Barbara Plank, Kristian Nørgaard Jensen, Rob van der Goot

TL;DR
This paper presents DaN+, a Danish corpus with nested named entities and lexical normalization, evaluating cross-lingual transfer, domain adaptation, and the impact of lexical normalization on NER performance.
Contribution
It introduces a new Danish multi-domain corpus and guidelines, and empirically compares transfer strategies, BERT models, and normalization effects for nested NER in a low-resource language.
Findings
Multi-task learning is the most robust NER strategy.
BERT models are sensitive to domain shifts.
Lexical normalization benefits low-canonical data.
Abstract
This paper introduces DaN+, a new multi-domain corpus and annotation guidelines for Danish nested named entities (NEs) and lexical normalization to support research on cross-lingual cross-domain learning for a less-resourced language. We empirically assess three strategies to model the two-layer Named Entity Recognition (NER) task. We compare transfer capabilities from German versus in-language annotation from scratch. We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER. Our results show that 1) the most robust strategy is multi-task learning which is rivaled by multi-label decoding, 2) BERT-based NER models are sensitive to domain shifts, and 3) in-language BERT and lexical normalization are the most beneficial on the least canonical data. Our results also show that an out-of-domain setup remains challenging, while performance on…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Multi-Head Attention · Linear Layer · Linear Warmup With Linear Decay · WordPiece · Layer Normalization · Attention Dropout · Softmax · Dropout · Attention Is All You Need
