Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

Li Zhang; Jaromir Savelka; Kevin Ashley

arXiv:2605.16767 · cs.CL · May 19, 2026

TL;DR

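This paper casts multi-label legal annotation as retrieval: documents and label descriptions are embedded with a frozen retrieval model, and labels are predicted by k-nearest neighbors. The approach is data-efficient, hallucination-free, and adapts to evolving taxonomies by re-indexing rather than retraining; on Eurlex it outperforms a zero-shot generative baseline (GPT-5.2) on Macro-F1.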

Contribution

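The authors propose a retrieval-based method for legal annotation that handles label-set changes by re-embedding and re-indexing instead of retraining, avoids hallucinated labels, and achieves competitive accuracy with strong data efficiency compared with prompting generative models.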

Findings

01

Retrieval achieves competitive accuracy across three legal datasets (ECtHR-A, ECtHR-B, and Eurlex with 100 labels).

02

On Eurlex, Qwen-8B retrieval improves Macro-F1 from 40.41 (GPT-5.2, zero-shot) to 49.12.

03

Retrieval nearly doubles Micro-F1 with only 100 training samples.
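
The findings report both Macro-F1 and Micro-F1, which can diverge sharply when labels are imbalanced. The sketch below uses the standard definitions to show the difference on toy data. The function names, the toy labels, and the convention of scoring an empty label as 0.0 are illustrative assumptions, not taken from the paper.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when undefined (an assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred, labels):
    """Return (micro_f1, macro_f1) for multi-label predictions.

    gold, pred: lists of label sets, one set per document.
    labels: the full label inventory to score over.
    """
    per_label_f1 = []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        per_label_f1.append(f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    micro = f1(total_tp, total_fp, total_fn)        # pools counts over all labels
    macro = sum(per_label_f1) / len(per_label_f1)   # averages per-label scores
    return micro, macro


# Toy data: label "A" is frequent and always right, label "B" is rare and missed.
gold = [{"A"}, {"A"}, {"A", "B"}, {"A"}]
pred = [{"A"}, {"A"}, {"A"}, {"A"}]
micro, macro = micro_macro_f1(gold, pred, labels=["A", "B"])
```

Here Micro-F1 stays high (8/9) because the frequent label dominates the pooled counts, while Macro-F1 drops to 0.5 because the missed rare label counts as much as the frequent one.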

Abstract

Multi-label legal annotation requires assigning multiple labels from large, evolving taxonomies to long, fact-intensive documents, often under limited supervision. Parametric encoders typically require task-specific training and retraining when the label set changes, while prompting generative large language models becomes costly and degrades as the label space grows. We cast legal annotation as retrieval: we embed documents and label descriptions with a frozen retrieval model and predict labels via k-nearest neighbors in the embedding space, enabling updates by re-embedding and re-indexing rather than gradient-based backpropagation. Across three legal datasets (ECtHR-A, ECtHR-B, and Eurlex with 100 labels), retrieval achieves competitive accuracy and strong data efficiency; on Eurlex, Qwen-8B retrieval improves Macro-F1 from 40.41 (GPT-5.2, zero-shot) to 49.12 while reducing estimated…
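
The retrieval formulation in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `KnnLabelIndex` class, the vote-share threshold `min_share`, and the hand-written toy vectors are all assumptions. In practice the vectors would come from a frozen retrieval model, and the index could also hold embedded label descriptions; the paper's actual aggregation rule is not stated in the text above.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


class KnnLabelIndex:
    """Hypothetical in-memory index mapping embeddings to label sets."""

    def __init__(self):
        self.vectors = None      # (n, d) array of unit-length embeddings
        self.label_sets = []     # label set for each indexed row

    def add(self, embeddings, label_sets):
        """Extend the index; no gradient updates are involved."""
        emb = l2_normalize(np.asarray(embeddings, dtype=float))
        self.vectors = emb if self.vectors is None else np.vstack([self.vectors, emb])
        self.label_sets.extend(set(s) for s in label_sets)

    def predict(self, query, k=3, min_share=0.5):
        """Return labels carried by at least min_share of the k nearest entries."""
        q = l2_normalize(np.asarray(query, dtype=float).reshape(1, -1))[0]
        sims = self.vectors @ q
        top = np.argsort(-sims)[:k]
        votes = {}
        for i in top:
            for label in self.label_sets[i]:
                votes[label] = votes.get(label, 0) + 1
        return sorted(l for l, c in votes.items() if c / len(top) >= min_share)


# Toy 3-dimensional "embeddings" standing in for frozen-model outputs.
index = KnnLabelIndex()
index.add(
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
    [{"A"}, {"A", "B"}, {"C"}, {"C"}],
)
before = index.predict([1.0, 0.05, 0.0], k=2)

# Taxonomy update: a new label "D" is supported by indexing one more example.
index.add([[0.0, 0.0, 1.0]], [{"D"}])
after = index.predict([0.0, 0.1, 1.0], k=1)
```

Because predictions are assembled only from label sets stored in the index, the output cannot contain a label outside the taxonomy, and adding a label amounts to calling `add` again rather than retraining.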
