Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents
Yaocong Li, Qiang Lan, Leihan Zhang, Le Zhang

TL;DR
This paper introduces Legal-DC, a specialized benchmark dataset and framework for retrieval-augmented generation in Chinese legal documents, addressing evaluation gaps and improving answer accuracy in legal AI systems.
Contribution
The paper constructs the Legal-DC benchmark dataset and proposes the LegRAG framework, which combines legal adaptive indexing with self-reflection mechanisms to improve retrieval and generation over legal documents.
Findings
LegRAG outperforms existing methods by 1.3% to 5.6% on key metrics.
Legal-DC provides 480 legal documents and 2,475 question-answer pairs with clause-level references.
Automated evaluation methods enhance the reliability of legal retrieval systems.
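Because every Legal-DC question is annotated with clause-level references, the retriever's output can be scored automatically against those gold clauses. The summary does not say which retrieval metrics the paper uses, so the sketch below uses a generic recall@k. The `QAItem` record, its field names, and the clause-ID strings are hypothetical placeholders, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    # Hypothetical record layout; the real Legal-DC schema is not described here.
    question: str
    gold_clauses: set[str]  # clause-level references, e.g. {"LawA-Art10"}


def recall_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    """Fraction of gold clauses that appear among the top-k retrieved clause IDs."""
    if not gold:
        return 0.0
    return len(set(retrieved[:k]) & gold) / len(gold)


def mean_recall_at_k(items: list[QAItem], retrievals: list[list[str]], k: int) -> float:
    """Average recall@k over a set of questions (one ranked list per question)."""
    scores = [recall_at_k(r, item.gold_clauses, k) for item, r in zip(items, retrievals)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    items = [QAItem("q1", {"A-1", "A-2"}), QAItem("q2", {"B-5"})]
    retrievals = [["A-1", "C-3", "A-2"], ["B-1", "B-2"]]
    print(mean_recall_at_k(items, retrievals, 2))  # 0.25
    print(mean_recall_at_k(items, retrievals, 3))  # 0.5
```

Scoring at the clause level rather than the document level penalizes a retriever that finds the right statute but the wrong article, which is the distinction clause-level annotations make measurable.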
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a promising technology for legal document consultation, yet its application in Chinese legal scenarios faces two key limitations: existing benchmarks lack specialized support for joint retriever-generator evaluation, and mainstream RAG systems often fail to accommodate the structured nature of legal provisions. To address these gaps, this study advances two core contributions: First, we constructed the Legal-DC benchmark dataset, comprising 480 legal documents (covering areas such as market regulation and contract management) and 2,475 refined question-answer pairs, each annotated with clause-level references, filling the gap for specialized evaluation resources in Chinese legal RAG. Second, we propose the LegRAG framework, which integrates legal adaptive indexing (clause-boundary segmentation) with a dual-path self-reflection…
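The abstract describes LegRAG's legal adaptive indexing as clause-boundary segmentation, i.e. chunking statutes at provision boundaries instead of at fixed token counts. The snippet below is a minimal sketch of that idea under one stated assumption: that Chinese statutes mark each article with a line-initial header such as "第一条" or "第十条". The regex, the function name `segment_by_clause`, and the sample text are illustrative and not taken from the paper.

```python
import re

# Assumption: an article header ("第一条", "第十二条", "第3条") begins its own line.
# Anchoring at line start avoids splitting on in-text cross-references like "本法第一条".
CLAUSE_HEADER = re.compile(r"^[ \t\u3000]*第[零〇一二三四五六七八九十百千\d]+条", re.MULTILINE)


def segment_by_clause(text: str) -> list[str]:
    """Split a statute into chunks that each start at a clause (article) boundary.

    Any text before the first header (title, preamble) becomes its own chunk.
    """
    starts = [m.start() for m in CLAUSE_HEADER.finditer(text)]
    if not starts:
        return [text.strip()] if text.strip() else []
    chunks = []
    preamble = text[: starts[0]].strip()
    if preamble:
        chunks.append(preamble)
    bounds = starts + [len(text)]
    for begin, end in zip(bounds, bounds[1:]):
        chunk = text[begin:end].strip()
        if chunk:
            chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    sample = "示例法\n第一条 为了规范市场秩序，制定本法。\n第二条 经营者应当遵守本法第一条的规定。\n第十条 本法自公布之日起施行。"
    for chunk in segment_by_clause(sample):
        print(chunk)
```

Keeping each article intact means a retrieved chunk maps one-to-one onto a citable provision, which is what lets answers carry clause-level references.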
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Law · Natural Language Processing Techniques
