Chunking German Legal Code

Max Prior; Natalia Milanova; Andreas Schultz

arXiv:2605.19806·cs.CL·May 20, 2026

TL;DR

This study evaluates chunking strategies for retrieval over German statutory law, using the German Civil Code as the corpus, and finds that structure-aligned methods outperform complex semantic approaches in recall and efficiency.

Contribution

The paper systematically compares multiple chunking approaches for legal retrieval and emphasizes the importance of preserving domain-specific structure.

Findings

01

Structure-aligned chunking achieves the highest recall.

02

Simpler methods are more computationally efficient.

03

Complex semantic methods that override the legal structure achieve lower recall than structural approaches.
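
To make the recall comparison concrete, here is a minimal sketch of how recall at k could be computed against section-level gold labels. The exact metric definition used in the paper is not given on this page, so the function name `section_recall_at_k`, the chunk-to-section mapping, and the chunk identifiers are all illustrative assumptions.

```python
from typing import Iterable, Mapping, Sequence


def section_recall_at_k(
    retrieved_chunks: Sequence[str],
    chunk_to_section: Mapping[str, str],
    gold_sections: Iterable[str],
    k: int,
) -> float:
    """Fraction of gold sections covered by the top-k retrieved chunks.

    Assumption: every chunk belongs to exactly one section, so chunks of
    any granularity (subsection, sentence, window) can be scored against
    the same section-level labels.
    """
    gold = set(gold_sections)
    if not gold:
        return 0.0
    # Map each of the top-k chunks back to its parent section.
    hit_sections = {
        chunk_to_section[c] for c in retrieved_chunks[:k] if c in chunk_to_section
    }
    return len(hit_sections & gold) / len(gold)


# Hypothetical chunk ids of the form "<section>#<subsection>".
chunk_to_section = {
    "433#1": "§ 433",
    "433#2": "§ 433",
    "434#1": "§ 434",
    "280#1": "§ 280",
}
ranking = ["433#1", "433#2", "434#1", "280#1"]
gold = ["§ 433", "§ 280"]

recall_at_3 = section_recall_at_k(ranking, chunk_to_section, gold, k=3)
recall_at_4 = section_recall_at_k(ranking, chunk_to_section, gold, k=4)
```

Scoring at the section level means two chunks from the same section count only once, which keeps fine-grained chunkers from being rewarded for returning several pieces of one provision.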

Abstract

This paper investigates chunking strategies for retrieval-augmented generation on German statutory law, using the German Civil Code as a structured benchmark corpus. We implement and compare a range of segmentation approaches, including structural units (sections, subsections, sentences, propositions), fixed-size windows, contextual chunking, semantic clustering, Lumber-style chunking, and RAPTOR-based hierarchical retrieval. All methods are evaluated on a legal question-answering dataset with section-level gold labels, measuring recall, query latency, index build time, and storage requirements. Results show that chunking strategies aligned with the inherent legal structure, particularly section- and subsection-based retrieval, achieve the highest recall, while more complex approaches that override this structure perform worse. These simpler methods also offer favorable computational…
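
As an illustration of the structural units named in the abstract, the following is a minimal sketch of section- and subsection-aligned chunking. It is not the authors' implementation: it assumes plain text in which each section starts on a line beginning with `§ <number>` and each subsection on a line beginning with `(<number>)`, and the names `Chunk` and `chunk_by_subsection` are hypothetical. The sample text is a placeholder, not statutory wording.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed layout: "§ 12 Heading" opens a section, "(1) ..." opens a subsection.
SECTION_RE = re.compile(r"^§[ \t]*(\d+[a-z]?)[ \t]*(.*)$", re.MULTILINE)
SUBSECTION_RE = re.compile(r"^\((\d+[a-z]?)\)[ \t]*", re.MULTILINE)


@dataclass
class Chunk:
    section: str               # e.g. "§ 1"
    title: str                 # section heading, possibly empty
    subsection: Optional[str]  # e.g. "2", or None if the section is unnumbered
    text: str


def chunk_by_subsection(text: str) -> List[Chunk]:
    """Split text into one chunk per subsection, never crossing a section."""
    chunks: List[Chunk] = []
    sections = list(SECTION_RE.finditer(text))
    for i, sec in enumerate(sections):
        end = sections[i + 1].start() if i + 1 < len(sections) else len(text)
        body = text[sec.end():end].strip()
        label, title = f"§ {sec.group(1)}", sec.group(2).strip()
        subs = list(SUBSECTION_RE.finditer(body))
        if not subs:
            # A section without numbered subsections stays a single chunk.
            chunks.append(Chunk(label, title, None, body))
            continue
        for j, sub in enumerate(subs):
            sub_end = subs[j + 1].start() if j + 1 < len(subs) else len(body)
            chunks.append(
                Chunk(label, title, sub.group(1), body[sub.end():sub_end].strip())
            )
    return chunks


sample = (
    "§ 1 Heading A\n"
    "(1) First rule.\n"
    "(2) Second rule.\n"
    "§ 2 Heading B\n"
    "Only one rule.\n"
)
chunks = chunk_by_subsection(sample)
```

Because each chunk carries its parent section label, retrieval results at subsection granularity can still be scored against section-level gold labels.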
