LLMs Provide Unstable Answers to Legal Questions

Andrew Blair-Stanek; Benjamin Van Durme

arXiv:2502.05196 · cs.CL · February 11, 2025 · 2 cites

TL;DR

This paper shows that leading large language models are unstable on hard legal questions: asked the identical question multiple times, even at temperature 0, they sometimes reach opposite conclusions about which party should prevail.

Contribution

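The authors release a novel dataset of 500 legal questions distilled from real cases and use it to measure whether gpt-4o, claude-3.5, and gemini-1.5 give the same answer when asked the same question repeatedly. The results show that these models cannot be relied on for consistent conclusions in legal contexts.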

Findings

01

LLMs sometimes name different winning parties when given the identical legal question, even at temperature 0

02

Answer stability varies across different LLMs and questions

03

This instability matters for legal AI products, legal processes, and lawyers that rely on these LLMs

Abstract

An LLM is stable if it reaches the same conclusion when asked the identical question multiple times. We find that leading LLMs like gpt-4o, claude-3.5, and gemini-1.5 are unstable when providing answers to hard legal questions, even when made as deterministic as possible by setting temperature to 0. We curate and release a novel dataset of 500 legal questions distilled from real cases. Each question involves two parties and includes facts, competing legal arguments, and the question of which party should prevail. When provided the exact same question, we observe that LLMs sometimes say one party should win and other times say the other party should win. This instability has implications for the increasing number of legal AI products, legal processes, and lawyers relying on these LLMs.
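The abstract's definition of stability can be made concrete with a short sketch. It assumes each repeated run of the same question has already been reduced to the name of the party the model says should prevail. The function names and the "minority share" measure are hypothetical illustrations, not the paper's published metric or code.

```python
from collections import Counter


def is_stable(conclusions: list[str]) -> bool:
    """True if every repeated run of one question named the same winning party.

    Hypothetical helper: follows the abstract's definition of stability.
    """
    if not conclusions:
        raise ValueError("need at least one answer")
    return len(set(conclusions)) == 1


def minority_share(conclusions: list[str]) -> float:
    """Fraction of runs that disagree with the most common conclusion.

    Hypothetical measure of how unstable a single question is (0.0 = stable).
    """
    if not conclusions:
        raise ValueError("need at least one answer")
    most_common_count = Counter(conclusions).most_common(1)[0][1]
    return 1 - most_common_count / len(conclusions)


def unstable_fraction(runs: dict[str, list[str]]) -> float:
    """Share of questions whose repeated answers did not all agree.

    `runs` maps a (hypothetical) question id to the conclusions from each run.
    """
    if not runs:
        raise ValueError("need at least one question")
    unstable = sum(1 for answers in runs.values() if not is_stable(answers))
    return unstable / len(runs)
```

For example, with `runs = {"q1": ["plaintiff"] * 3, "q2": ["plaintiff", "defendant", "plaintiff"]}`, `unstable_fraction(runs)` returns 0.5, because only q2 flips between parties. A binary stable/unstable flag captures the abstract's definition directly, while a graded measure such as the minority share distinguishes a question that flips once from one that splits evenly.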

Taxonomy

Topics: Dispute Resolution and Class Actions · Law, AI, and Intellectual Property · Legal Education and Practice Innovations