Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases
Alex Dantart

TL;DR
This paper evaluates three AI architectures for legal applications, introduces two reliability metrics, and shows that advanced retrieval-augmented models reduce hallucinations to negligible levels, making them more trustworthy for high-stakes legal work.
Contribution
The paper introduces two reliability metrics, False Citation Rate (FCR) and Fabricated Fact Rate (FFR), and provides an expert-reviewed evaluation of 12 LLMs on 75 legal tasks, showing that advanced retrieval-augmented systems are the most effective at reducing fabrication errors.
Findings
Standalone models have high error rates (FCR > 30%).
Basic RAG reduces errors but still has notable misgrounding.
Advanced RAG reduces fabrication to negligible rates (below 0.2%).
Abstract
This paper examines how to make large language models reliable for high-stakes legal work by reducing hallucinations. It distinguishes three AI paradigms: (1) standalone generative models ("creative oracle"), (2) basic retrieval-augmented systems ("expert archivist"), and (3) an advanced, end-to-end optimized RAG system ("rigorous archivist"). The paper introduces two reliability metrics, False Citation Rate (FCR) and Fabricated Fact Rate (FFR), and evaluates 2,700 judicial-style answers from 12 LLMs across 75 legal tasks using expert, double-blind review. Results show that standalone models are unsuitable for professional use (FCR above 30%), while basic RAG greatly reduces errors but still leaves notable misgrounding. Advanced RAG, using techniques such as embedding fine-tuning, re-ranking, and self-correction, reduces fabrication to negligible levels (below 0.2%). The study concludes…
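The abstract names FCR and FFR but this summary does not give their formulas. As a rough sketch only, the Python below assumes each metric is the share of reviewed answers flagged by the expert reviewers as containing at least one false citation or at least one fabricated fact. The `ReviewedAnswer` type, the `reliability_rates` function, and the toy records are hypothetical illustrations, not the paper's definitions or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedAnswer:
    """One expert-reviewed answer (hypothetical record layout)."""
    system: str             # e.g. "standalone", "basic_rag", "advanced_rag"
    false_citation: bool    # reviewer flagged at least one false citation
    fabricated_fact: bool   # reviewer flagged at least one fabricated fact


def reliability_rates(answers):
    """Return per-system FCR and FFR as fractions of reviewed answers.

    Assumption: both metrics are computed per answer, not per citation or
    per factual claim; the source summary does not specify the denominator.
    """
    # counts[system] = [total answers, false-citation flags, fabricated-fact flags]
    counts = defaultdict(lambda: [0, 0, 0])
    for a in answers:
        c = counts[a.system]
        c[0] += 1
        c[1] += int(a.false_citation)
        c[2] += int(a.fabricated_fact)
    return {
        system: {"FCR": fc / n, "FFR": ff / n}
        for system, (n, fc, ff) in counts.items()
    }


# Toy records, invented purely to exercise the function.
answers = [
    ReviewedAnswer("standalone", True, False),
    ReviewedAnswer("standalone", False, False),
    ReviewedAnswer("standalone", False, True),
    ReviewedAnswer("standalone", False, False),
    ReviewedAnswer("advanced_rag", False, False),
    ReviewedAnswer("advanced_rag", False, False),
]
rates = reliability_rates(answers)
```

If the paper instead counts individual citations or individual factual claims, only the denominator changes: the record would carry counts rather than boolean flags.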
Taxonomy
Topics: Artificial Intelligence in Law · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
