Segmentation and Processing of German Court Decisions from Open Legal Data
Harshil Darji, Martin Heckelmann, Christina Kratsch, Gerard de Melo

TL;DR
This paper presents a cleaned and sectioned dataset of 251,038 German court decisions derived from Open Legal Data, with key sections reliably extracted to support NLP research and legal analysis.
Contribution
It introduces a systematically processed dataset with reliably separated legal decision sections, verified through statistical sampling and manual validation.
Findings
Successfully extracted and verified key decision sections in a large dataset
Created a publicly available JSONL corpus for NLP and legal research
Enhanced data quality for downstream legal NLP tasks
Abstract
The availability of structured legal data is important for advancing Natural Language Processing (NLP) techniques for the German legal system. One of the most widely used datasets, Open Legal Data, provides a large-scale collection of German court decisions. While the metadata in this raw dataset is consistently structured, the decision texts themselves are inconsistently formatted and often lack clearly marked sections. Reliable separation of these sections is important not only for rhetorical role classification but also for downstream tasks such as retrieval and citation analysis. In this work, we introduce a cleaned and sectioned dataset of 251,038 German court decisions derived from the official Open Legal Data dataset. We systematically separated three important sections in German court decisions, namely Tenor (operative part of the decision), Tatbestand (facts of the case), and…
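As a rough illustration of what sectioning a decision text can involve, the sketch below splits a text at standalone heading lines and serializes the result as one JSONL record. It is not the authors' pipeline: the heading pattern, the function names `split_sections` and `to_jsonl_line`, and the record fields (`id`, `tenor`, `tatbestand`) are assumptions made for this example, and only the two section names given in full above are used.

```python
import json
import re

# Assumption: each section heading sits on its own line, optionally followed
# by a colon. Real decisions are formatted far less consistently.
HEADING_RE = re.compile(r"^[ \t]*(Tenor|Tatbestand)[ \t]*:?[ \t]*$", re.MULTILINE)


def split_sections(text):
    """Return a dict mapping lower-cased heading names to section bodies."""
    sections = {}
    matches = list(HEADING_RE.finditer(text))
    for i, match in enumerate(matches):
        # A section runs until the next recognized heading or the end of text,
        # so the last section also absorbs anything that follows it.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = match.group(1).lower()
        body = text[match.end():end].strip()
        sections.setdefault(name, body)  # keep the first occurrence only
    return sections


def to_jsonl_line(decision_id, text):
    """Serialize one decision as a single JSONL line (hypothetical schema)."""
    record = {"id": decision_id}
    record.update(split_sections(text))
    # ensure_ascii=False keeps German umlauts readable in the output file.
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    sample = (
        "Tenor\nDie Klage wird abgewiesen.\n\n"
        "Tatbestand\nDie Parteien streiten um Schadensersatz.\n"
    )
    print(to_jsonl_line("example-1", sample))
```

A purely heading-based split like this fails on exactly the inconsistently formatted texts the abstract describes, which is why a sampled manual check of the output is a sensible complement.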
Taxonomy
Topics: Artificial Intelligence in Law · Comparative and International Law Studies · Legal Language and Interpretation
