A Dataset of German Legal Documents for Named Entity Recognition

Elena Leitner; Georg Rehm; Julián Moreno-Schneider

arXiv:2003.13016 · cs.CL · March 31, 2020 · 27 citations


TL;DR

This paper introduces a dataset of German federal court decisions with manual named entity annotations and automatic time-expression annotations, designed to support Named Entity Recognition (NER) in legal texts.

Contribution

It provides a large, richly annotated dataset specifically tailored for NER in German legal documents, filling a gap in resources for this domain.

Findings

01

The dataset contains approximately 67,000 sentences and over 2 million tokens.

02

It includes 54,000 manually annotated entities across 19 fine-grained semantic classes.

03

It was also automatically annotated with more than 35,000 TimeML-based time expressions.

Abstract

We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
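Because the data ships in the CoNLL-2002 format (one token per line with its tag, blank lines between sentences), a few lines of code are enough to load it and recompute statistics such as sentence, token and entity counts. The following is a minimal sketch, not the authors' tooling. It assumes whitespace-separated columns with the token first and a BIO tag last. The class labels in the sample (COURT, JUDGE) are hypothetical placeholders, not the dataset's actual tag names.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token ... tag' lines into sentences; blank lines end a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: first column is the token, last column is the BIO tag.
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences


def count_entities(sentences: List[Sentence]) -> Counter:
    """Count entity spans per class.

    A span starts at a B- tag, or at an I- tag whose class differs from the
    previous tag's class (this also handles IOB1-style tagging).
    """
    counts: Counter = Counter()
    for sentence in sentences:
        prev = "O"
        for _, tag in sentence:
            if tag.startswith("B-") or (tag.startswith("I-") and prev[2:] != tag[2:]):
                counts[tag[2:]] += 1
            prev = tag
    return counts


# Illustrative sample with hypothetical labels.
sample = """Das O
Bundesverfassungsgericht B-COURT
entschied O
. O

Richterin O
Anna B-JUDGE
Müller I-JUDGE
stimmte O
zu O
. O
"""

sentences = read_conll(sample.splitlines())
print(len(sentences), sum(len(s) for s in sentences))  # 2 10
print(count_entities(sentences))  # Counter({'COURT': 1, 'JUDGE': 1})
```

Counting spans rather than tagged tokens matters here: "Anna Müller" carries two tags but is one entity, which is how figures such as "54,000 entities" are normally stated.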


Code & Models

Repositories

elenanereiss/Legal-Entity-Recognition (official; TensorFlow)


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Law