LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop

Maryam Amirizaniani; Jihan Yao; Adrian Lavergne; Elizabeth Snell; Okada; Aman Chadha; Tanya Roosta; Chirag Shah

arXiv:2402.09346·cs.AI·May 24, 2024·3 cites

Open Access

TL;DR

The paper introduces LLMAuditor, an automatic, human-in-the-loop framework for auditing large language models to detect bias, hallucination, and inconsistencies, improving reliability and transparency in model evaluation.

Contribution

It presents a scalable framework in which a second LLM, guided by structured prompts and checked by human verification, generates the probes used to audit a target LLM, so the audit does not rely on the model under test.

Findings

01 Generated reliable probes from one LLM to audit another.

02 Structured prompts with HIL improve audit reliability.

03 Auditing reduces hallucinations in LLM responses.

Abstract

As Large Language Models (LLMs) become more pervasive across various users and scenarios, identifying potential issues when using these models becomes essential. Examples of such issues include bias, inconsistencies, and hallucination. Although auditing the LLM for these problems is often warranted, such a process is neither easy nor accessible for most. An effective method is to probe the LLM using different versions of the same question. This could expose inconsistencies in its knowledge or operation, indicating potential for bias or hallucination. However, to operationalize this auditing method at scale, we need an approach to create those probes reliably and automatically. In this paper we propose the LLMAuditor framework, an automatic and scalable solution in which one uses a different LLM along with human-in-the-loop (HIL). This approach offers verifiability and…
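The probing idea described in the abstract (ask several versions of the same question and look for disagreement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the token-level Jaccard measure is a crude stand-in for whatever similarity measure an audit would actually use, and `toy_model`, `jaccard`, and `consistency_score` are hypothetical names introduced only for illustration.

```python
from itertools import combinations
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (assumed stand-in for semantic similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def consistency_score(probes: List[str], ask: Callable[[str], str]) -> float:
    """Mean pairwise similarity of the answers returned for a set of probes.

    `ask` is any callable that sends a prompt to the audited model and
    returns its answer as a string.
    """
    answers = [ask(p) for p in probes]
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


def toy_model(prompt: str) -> str:
    """Hypothetical audited model: answers differently when the wording changes."""
    return "paris" if "capital" in prompt.lower() else "lyon"


# Three versions of the same question (hand-written here; in the framework
# described above, a different LLM would generate them and a human would verify them).
probes = [
    "What is the capital of France?",
    "Which city is France's capital?",
    "Name the French seat of government.",
]

score = consistency_score(probes, toy_model)
print(f"consistency: {score:.2f}")
```

A low score flags a question whose answer depends on phrasing, which is the kind of inconsistency the abstract treats as a signal of possible bias or hallucination. A score near 1 means only that the answers agree with each other, not that they are correct.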

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training