MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain
Leonhard Hennig, Phuc Tran Truong, Aleksandra Gabryszak

TL;DR
MobIE is a German dataset for named entity recognition, entity linking, and relation extraction in the mobility domain, annotated so that the three tasks can be learned jointly.
Contribution
To the authors' knowledge, the first German dataset combining annotations for NER, EL, and RE. Entities are human-annotated; relations are human-annotated on a subset and weakly supervised (Snorkel) on the remaining documents.
Findings
Contains 20.5K annotated entities across 20 entity types, 13.1K of which are linked to a knowledge base
Includes a subset human-annotated with seven mobility-related, n-ary relation types
Supports joint and multi-task learning approaches
Abstract
We present MobIE, a German-language dataset, which is human-annotated with 20 coarse- and fine-grained entity types and entity linking information for geographically linkable entities. The dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities, 13.1K of which are linked to a knowledge base. A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types, while the remaining documents are annotated using a weakly-supervised labeling approach implemented with the Snorkel framework. To the best of our knowledge, this is the first German-language dataset that combines annotations for NER, EL and RE, and thus can be used for joint and multi-task learning of these fundamental information extraction tasks. We make MobIE public at https://github.com/dfki-nlp/mobie.
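To make the combination of the three annotation layers concrete, here is a minimal sketch of how one document carrying entity spans, knowledge-base links, and an n-ary relation could be represented in memory. It is an illustration only and not the released MobIE file format: the class names, field names, label strings, knowledge-base identifiers, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    start: int                   # token offset, inclusive
    end: int                     # token offset, exclusive
    type: str                    # entity type label (hypothetical names)
    kb_id: Optional[str] = None  # knowledge-base ID, None if not linked


@dataclass
class Relation:
    type: str
    # role name -> index into Document.entities; "n-ary" means the
    # number of roles is not fixed at two
    args: Dict[str, int] = field(default_factory=dict)


@dataclass
class Document:
    tokens: List[str]
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def linked_fraction(self) -> float:
        """Share of entities that carry a knowledge-base link."""
        if not self.entities:
            return 0.0
        linked = sum(1 for e in self.entities if e.kb_id is not None)
        return linked / len(self.entities)


# Made-up example sentence; labels and IDs are placeholders.
doc = Document(
    tokens=["Stau", "auf", "der", "A100", "Richtung", "Wedding",
            ",", "10", "Minuten", "Verzögerung"],
    entities=[
        Entity(0, 1, "trigger"),
        Entity(3, 4, "location", kb_id="kb:example-1"),
        Entity(5, 6, "location", kb_id="kb:example-2"),
        Entity(7, 9, "duration"),
    ],
    relations=[
        Relation("TrafficJam",
                 {"trigger": 0, "location": 1, "direction": 2, "delay": 3}),
    ],
)
```

Because the relation arguments point at entities, and entities optionally point at knowledge-base entries, a single record of this shape can supply training signal for NER, EL, and RE at once, which is what makes joint or multi-task setups possible. For scale, the figures in the abstract imply that roughly 64% of all annotated entities are linked (13.1K of 20.5K).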
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
