NorNE: Annotating Named Entities for Norwegian
Fredrik Jørgensen, Tobias Aasmoe, Anne-Stine Ruud Husevåg, Lilja Øvrelid, Erik Velldal

TL;DR
This paper introduces NorNE, a manually annotated corpus of Norwegian named entities covering both Bokmål and Nynorsk, and reports on the annotation guidelines, inter-annotator agreement, and the performance of a neural sequence labeling model on the corpus.
Contribution
The paper presents NorNE, the first extensive Norwegian named entity corpus with detailed annotations for multiple entity types and an evaluation of neural sequence labeling methods.
Findings
High inter-annotator agreement achieved
Effective neural models demonstrated on the corpus
Rich set of entity annotations enhances Norwegian NLP resources
Abstract
This paper presents NorNE, a manually annotated corpus of named entities which extends the annotation of the existing Norwegian Dependency Treebank. Comprising both of the official standards of written Norwegian (Bokmål and Nynorsk), the corpus contains around 600,000 tokens and annotates a rich set of entity types including persons, organizations, locations, geo-political entities, products, and events, in addition to a class corresponding to nominals derived from names. We here present details on the annotation effort, guidelines, inter-annotator agreement and an experimental analysis of the corpus using a neural sequence labeling architecture.
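As a rough illustration of how entity annotations of this kind are typically consumed by a sequence labeler, the sketch below converts token-level BIO tags into entity spans and computes a strict span-level F1 score. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the BIO encoding, the tag names (PER, GPE, LOC), the example sentence, and the helper names `bio_to_spans` and `span_f1` are all hypothetical and may differ from what the corpus and paper actually use.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into labeled spans.

    Assumes tags of the form "B-X", "I-X", or "O" (an assumption;
    the corpus may use a different encoding or label inventory).
    """
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span unless this tag continues it.
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Stray I- tag with no open span: treat it as a span start.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Strict span-level F1: a span counts only if boundaries and label match."""
    g, p = set(bio_to_spans(gold)), set(bio_to_spans(pred))
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Made-up example sentence and tags, for illustration only.
tokens = ["Erna", "Solberg", "besøkte", "Oslo", "."]
gold = ["B-PER", "I-PER", "O", "B-GPE", "O"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "O"]

gold_spans = bio_to_spans(gold)
score = span_f1(gold, pred)
```

In this made-up example the person span matches but the place name is labeled LOC instead of GPE, so one of two spans is correct on each side. Strict matching of this kind penalizes confusions between closely related types such as locations and geo-political entities.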
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
