Measuring the Groundedness of Legal Question-Answering Systems
Dietrich Trautmann, Natalia Ostapuk, Quentin Grail, Adrian Alan Pol, Guglielmo Bonifazi, Shang Gao, Martin Gajek

TL;DR
This paper develops and evaluates methods to assess whether AI-generated legal responses are properly grounded in source material, aiming to improve trustworthiness in high-stakes legal applications.
Contribution
It introduces a comprehensive benchmark and novel evaluation techniques for groundedness in legal question-answering systems, including a new grounding classification corpus.
Findings
The best method achieved a macro-F1 score of 0.8
Evaluation of latency for real-world applicability
Demonstrated potential for improving AI trustworthiness in legal settings
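For readers unfamiliar with the macro-F1 metric reported above, the following is a minimal sketch of how it is computed for a grounded/ungrounded classification task. The function name `macro_f1` and the label strings are illustrative assumptions, not taken from the paper; the paper's own evaluation code is not shown here.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels, invented for illustration only
gold = ["grounded", "grounded", "ungrounded", "ungrounded"]
pred = ["grounded", "ungrounded", "ungrounded", "ungrounded"]
score = macro_f1(gold, pred)
```

Because each class contributes equally to the average, macro-F1 penalizes a detector that does well on the majority class but misses ungrounded responses.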
Abstract
In high-stakes domains like legal question-answering, the accuracy and trustworthiness of generative AI systems are of paramount importance. This work presents a comprehensive benchmark of various methods to assess the groundedness of AI-generated responses, aiming to significantly enhance their reliability. Our experiments include similarity-based metrics and natural language inference models to evaluate whether responses are well-founded in the given contexts. We also explore different prompting strategies for large language models to improve the detection of ungrounded responses. We validated the effectiveness of these methods using a newly created grounding classification corpus, designed specifically for legal queries and corresponding responses from retrieval-augmented prompting, focusing on their alignment with source material. Our results indicate potential in groundedness…
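As a rough illustration of the similarity-based family of checks mentioned in the abstract, the sketch below scores a response by the fraction of its tokens that also appear in the retrieved context. This is a simple stand-in chosen for clarity, not the metric used in the paper; the function names, the tokenizer, and the 0.7 threshold are all assumptions.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (simplistic assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(response: str, context: str) -> float:
    """Fraction of response tokens that also occur in the context."""
    response_tokens = tokenize(response)
    if not response_tokens:
        return 0.0
    context_vocab = set(tokenize(context))
    hits = sum(1 for tok in response_tokens if tok in context_vocab)
    return hits / len(response_tokens)


def is_grounded(response: str, context: str, threshold: float = 0.7) -> bool:
    """Flag a response as grounded when overlap meets a hypothetical threshold."""
    return overlap_score(response, context) >= threshold


# Toy example, invented for illustration only
context = "The tenant must give 30 days notice before ending the lease."
supported = "The tenant must give 30 days notice."
unsupported = "The landlord may evict immediately."
```

A lexical check like this is cheap but cannot tell a paraphrase from a contradiction, which is one reason the abstract also considers natural language inference models and prompted large language models.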
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Law
