Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools

Varun Magesh; Faiz Surani; Matthew Dahl; Mirac Suzgun; Christopher D. Manning; Daniel E. Ho

arXiv:2405.20362·cs.CL·June 3, 2024·32 cites

TL;DR

This study empirically evaluates the reliability of leading AI legal research tools, revealing that despite claims of hallucination mitigation, these systems still produce false information at significant rates, raising concerns about their use in legal practice.

Contribution

The paper is the first to systematically assess proprietary AI legal research tools. It introduces a dataset, a typology of hallucinations, and an analysis of the tools' vulnerabilities and reliability.

Findings

01

LexisNexis and Thomson Reuters AI tools hallucinate 17-33% of the time.

02

Their hallucination rates are lower than those of general-purpose chatbots such as GPT-4.

03

Substantial differences exist in responsiveness and accuracy among systems.

Abstract

Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to "hallucinate," or make up false information, making their use risky in high-stakes domains. Recently, certain legal research providers have touted methods such as retrieval-augmented generation (RAG) as "eliminating" (Casetext, 2023) or "avoid[ing]" hallucinations (Thomson Reuters, 2023), or guaranteeing "hallucination-free" legal citations (LexisNexis, 2023). Because of the closed nature of these systems, systematically assessing these claims is challenging. In this article, we design and report on the first preregistered empirical evaluation of AI-driven legal research tools. We demonstrate…

Code & Models

Datasets

reglab/legal_rag_hallucinations
dataset· 147 dl

Videos

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained)· youtube

Taxonomy

Topics: Law, AI, and Intellectual Property · Ethics and Social Impacts of AI · Artificial Intelligence in Law