Automated Consistency Analysis of LLMs

Aditya Patwardhan; Vivek Vaidya; Ashish Kundu

arXiv:2502.07036 · cs.CR · March 12, 2025

TL;DR

This paper defines and evaluates the consistency of responses from large language models in cybersecurity, revealing that current models often produce inconsistent answers, which impacts their trustworthiness in critical applications.

Contribution

It introduces a formal definition of LLM response consistency and develops a framework for evaluating it through self-validation and cross-model validation.

Findings

01 LLMs often produce inconsistent responses in cybersecurity tasks.

02 The proposed framework effectively measures LLM response consistency.

03 Experiments show inconsistency issues across multiple popular LLMs.

Abstract

Generative AI (Gen AI) with large language models (LLMs) is being widely adopted across industry, academia, and government. Cybersecurity is one of the key sectors where LLMs can be used or are already in use. A number of problems inhibit the adoption of trustworthy Gen AI and LLMs in cybersecurity and other critical areas. One of the key challenges to the trustworthiness and reliability of LLMs is how consistent an LLM is in its responses. In this paper, we analyze and formally define the consistency of LLM responses, and then develop a framework for consistency evaluation. The paper proposes two approaches to validate consistency: self-validation and validation across multiple LLMs. We have carried out extensive experiments for several LLMs such as GPT4oMini, GPT3.5, Gemini,…
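The two validation approaches named in the abstract can be illustrated with a minimal sketch. This is not the paper's formal definition, which is not reproduced on this page. It assumes that consistency can be approximated by the mean pairwise similarity of responses, and it uses token-level Jaccard overlap as a crude stand-in metric. The function names `jaccard`, `self_consistency`, and `cross_consistency` are hypothetical, and the responses are assumed to have been collected already, so no LLM API is called.

```python
from itertools import combinations
from statistics import mean


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (assumed stand-in metric, not the paper's)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty responses are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def self_consistency(responses: list[str]) -> float:
    """Self-validation sketch: mean pairwise similarity of repeated
    responses from ONE model to the same prompt."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # fewer than two responses: nothing to disagree with
    return mean(jaccard(a, b) for a, b in pairs)


def cross_consistency(responses_by_model: dict[str, list[str]]) -> float:
    """Cross-model sketch: mean similarity between responses that come
    from DIFFERENT models answering the same prompt."""
    scores = []
    for (_, left), (_, right) in combinations(responses_by_model.items(), 2):
        scores.extend(jaccard(a, b) for a in left for b in right)
    return mean(scores) if scores else 1.0
```

Calling `self_consistency` on several answers sampled from one model for the same cybersecurity question gives a score in [0, 1], where 1 means the answers share exactly the same tokens. `cross_consistency` does the same across a mapping from model name to that model's answers. A semantic metric such as embedding similarity would likely be a better fit in practice; Jaccard is used here only to keep the sketch dependency-free.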

Taxonomy

Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · Service-Oriented Architecture and Web Services