LLMs as Judges: Toward The Automatic Review of GSN-compliant Assurance Cases
Gerhard Yu, Mithila Sivakumar, Alvine B. Belle, Soude Ghari, Song Wang, Timothy C. Lethbridge

TL;DR
This paper explores using large language models as automated judges for reviewing assurance cases, aiming to improve efficiency and consistency in verifying critical system requirements.
Contribution
It introduces an LLM-as-a-judge approach to assurance case review that combines predicate-based review rules with prompts tailored to the review task, and reports promising results with state-of-the-art LLMs.
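To make the idea of a "predicate rule" more concrete, here is a minimal sketch. It assumes a hypothetical, simplified GSN model in which each element has an id, a type (Goal, Strategy, Solution, ...) and a list of SupportedBy children. It also assumes a hypothetical rule, "every Goal is SupportedBy at least one Goal, Strategy or Solution". The rule, the `Node` class, `unsupported_goals` and `build_review_prompt` are illustrative only and are not the paper's actual rules or prompts. The sketch shows a rule checked mechanically and the same rule embedded in a prompt for an LLM judge.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical minimal GSN element (not the paper's data model).
    node_id: str
    kind: str                 # e.g. "Goal", "Strategy", "Solution", "Context"
    text: str
    supported_by: list = field(default_factory=list)  # ids of SupportedBy children

# Element types assumed here to count as valid support for a goal.
SUPPORT_KINDS = {"Goal", "Strategy", "Solution"}

def unsupported_goals(nodes):
    """Hypothetical predicate: return ids of goals that are not SupportedBy
    at least one Goal, Strategy or Solution."""
    by_id = {n.node_id: n for n in nodes}
    violations = []
    for n in nodes:
        if n.kind != "Goal":
            continue
        children = [by_id[c] for c in n.supported_by if c in by_id]
        if not any(c.kind in SUPPORT_KINDS for c in children):
            violations.append(n.node_id)
    return violations

def build_review_prompt(node, rule):
    """Hypothetical prompt template asking an LLM judge to assess one element."""
    return (
        "You are reviewing a GSN assurance case.\n"
        f"Rule: {rule}\n"
        f"Element {node.node_id} ({node.kind}): {node.text}\n"
        "Does this element satisfy the rule? Answer YES or NO and explain why."
    )

# Toy assurance case fragment (invented for illustration).
case = [
    Node("G1", "Goal", "The system is acceptably safe", ["S1"]),
    Node("S1", "Strategy", "Argue over each identified hazard", ["G2", "G3"]),
    Node("G2", "Goal", "Hazard H1 is mitigated", ["Sn1"]),
    Node("G3", "Goal", "Hazard H2 is mitigated"),
    Node("Sn1", "Solution", "Fault tree analysis report"),
]

if __name__ == "__main__":
    print(unsupported_goals(case))  # ['G3']
    print(build_review_prompt(case[3], "Every Goal is SupportedBy a Goal, Strategy or Solution."))
```

In this sketch the structural predicate catches the undeveloped goal G3 without any model call. The prompt template covers checks that need judgement about the element's text, which is where an LLM judge would plausibly add value.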
Findings
DeepSeek-R1 outperforms GPT-4.1 in review tasks
LLMs show promising review capabilities, but human reviewers are still needed to refine their reviews
GPT-4.1 and DeepSeek-R1 perform best among tested models
Abstract
Assurance cases make it possible to verify the correct implementation of certain non-functional requirements of mission-critical systems, including their safety, security, and reliability. They can be used in the specification of autonomous driving, avionics, air traffic control, and similar systems. They aim to reduce the risk of harm of all kinds, including human mortality, environmental damage, and financial loss. However, assurance cases often tend to be organized as extensive documents spanning hundreds of pages, making their creation, review, and maintenance error-prone, time-consuming, and tedious. Therefore, there is a growing need to leverage (semi-)automated techniques, such as those powered by generative AI and large language models (LLMs), to enhance efficiency, consistency, and accuracy across the entire assurance-case lifecycle. In this paper, we focus on assurance case review, a…
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Adversarial Robustness in Machine Learning · Autonomous Vehicle Technology and Safety
