LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop
Maryam Amirizaniani, Jihan Yao, Adrian Lavergne, Elizabeth Snell Okada, Aman Chadha, Tanya Roosta, Chirag Shah

TL;DR
The paper introduces LLMAuditor, an automatic, human-in-the-loop framework for auditing large language models to detect bias, hallucination, and inconsistencies, improving reliability and transparency in model evaluation.
Contribution
It presents a novel, scalable framework that combines human verification with structured prompts and uses one LLM to generate the probes that audit a different LLM, with the aim of making audits more rigorous.
Findings
Probes generated by one LLM can reliably audit another.
Structured prompts with HIL improve audit reliability.
Auditing reduces hallucinations in LLM responses.
Abstract
As Large Language Models (LLMs) become more pervasive across various users and scenarios, identifying potential issues when using these models becomes essential. Examples of such issues include bias, inconsistencies, and hallucination. Although auditing the LLM for these problems is often warranted, such a process is neither easy nor accessible for most. An effective method is to probe the LLM using different versions of the same question. This could expose inconsistencies in its knowledge or operation, indicating potential for bias or hallucination. However, to operationalize this auditing method at scale, we need an approach to create those probes reliably and automatically. In this paper, we propose the LLMAuditor framework, an automatic and scalable solution in which one uses a different LLM along with human-in-the-loop (HIL). This approach offers verifiability and…
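The probing idea in the abstract (ask the same question in several forms and look for disagreement) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `consistency_score` and `audit`, the `ask` callable standing in for the audited model, the 0.6 threshold, and the use of plain string similarity as the consistency measure. The summary does not state which evaluation criteria the paper uses, so this should not be read as the authors' implementation.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List


def consistency_score(answers: List[str]) -> float:
    """Mean pairwise string similarity of the answers (1.0 means identical).

    String similarity is a stand-in metric chosen for this sketch only.
    """
    pairs = list(combinations(answers, 2))
    if not pairs:
        # Zero or one answer: nothing to disagree with.
        return 1.0
    total = sum(
        SequenceMatcher(None, a.lower(), b.lower()).ratio() for a, b in pairs
    )
    return total / len(pairs)


def audit(
    probes: List[str],
    ask: Callable[[str], str],
    threshold: float = 0.6,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, object]:
    """Send each probe (a rephrasing of one question) to the audited model.

    `ask` is a hypothetical callable wrapping whatever model is being audited.
    Low agreement across probes marks the question for human review.
    """
    answers = [ask(p) for p in probes]
    score = consistency_score(answers)
    return {
        "answers": answers,
        "score": score,
        "flag_for_review": score < threshold,
    }
```

In a setup like the one the abstract describes, the probes would come from a second LLM and a human would check both the probes and any flagged answers. Here `ask` can be any function from a prompt string to a response string, including a stub for testing.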
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
