De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory Rules
Keerat Guliani, Deepkamal Gill, David Landsman, Nima Eshraghi, Krishna Kumar, Lovedeep Gondara

TL;DR
De Jure is an automated, domain-agnostic pipeline that extracts structured regulatory rules from legal documents using iterative LLM self-refinement, eliminating the need for human annotation.
Contribution
The paper introduces a fully automated, multi-stage process for extracting and refining legal rules from raw documents without requiring annotated data or domain-specific prompts.
Findings
Iterative refinement improves the quality of De Jure's extractions.
It generalizes well across finance, healthcare, and AI governance domains.
Grounded rule extraction enhances downstream question-answering performance.
Abstract
Regulatory documents encode legally binding obligations that LLM-based systems must respect. Yet converting dense, hierarchically structured legal text into machine-readable rules remains a costly, expert-intensive process. We present De Jure, a fully automated, domain-agnostic pipeline for extracting structured regulatory rules from raw documents, requiring no human annotation, domain-specific prompting, or annotated gold data. De Jure operates through four sequential stages: normalization of source documents into structured Markdown; LLM-driven semantic decomposition into structured rule units; multi-criteria LLM-as-a-judge evaluation across 19 dimensions spanning metadata, definitions, and rule semantics; and iterative repair of low-scoring extractions within a bounded regeneration budget, where upstream components are repaired before rule units are evaluated. We evaluate De Jure…
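The judge-and-repair loop described in the abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the authors' implementation: `refine`, `run_pipeline`, the `threshold` and `budget` values, the rule that every criterion must pass, and the toy `toy_judge` and `toy_repair` functions that replace real LLM calls. The sketch only shows two ideas stated in the abstract: regeneration is bounded, and upstream components (metadata, definitions) are handled before rule units.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: in a real system both would wrap LLM calls.
Judge = Callable[[str], Dict[str, float]]  # criterion name -> score in [0, 1]
Repair = Callable[[str, List[str]], str]   # (draft, failing criteria) -> new draft


def refine(draft: str, judge: Judge, repair: Repair,
           threshold: float = 0.8, budget: int = 3) -> Tuple[str, int]:
    """Repair a draft until all criteria pass or the budget is spent.

    Returns the final draft and the number of repairs performed.
    """
    used = 0
    while True:
        scores = judge(draft)
        failing = [name for name, s in scores.items() if s < threshold]
        if not failing or used >= budget:
            return draft, used
        draft = repair(draft, failing)
        used += 1


def run_pipeline(components: Dict[str, str], judge: Judge, repair: Repair,
                 order: Tuple[str, ...] = ("metadata", "definitions", "rules"),
                 ) -> Dict[str, str]:
    """Refine components in order, so upstream parts are fixed first."""
    result: Dict[str, str] = {}
    for name in order:
        result[name], _ = refine(components[name], judge, repair)
    return result


def toy_judge(draft: str) -> Dict[str, float]:
    """Toy stand-in for an LLM judge: two made-up criteria."""
    return {
        "non_empty": 1.0 if draft.strip() else 0.0,
        "terminated": 1.0 if draft.endswith(".") else 0.0,
    }


def toy_repair(draft: str, failing: List[str]) -> str:
    """Toy stand-in for an LLM repair step: only fixes missing periods."""
    return draft + "." if "terminated" in failing else draft


if __name__ == "__main__":
    extracted = {
        "metadata": "Title: Example Act",
        "definitions": "Bank means a licensed deposit-taker.",
        "rules": "Banks must report breaches",
    }
    print(run_pipeline(extracted, toy_judge, toy_repair))
```

In this sketch a draft that never passes is still returned once the budget is spent, so the loop always terminates. How De Jure actually treats extractions that remain low-scoring is not stated in the excerpt above.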
