GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

Kundan Krishna; Sanjana Ramprasad; Prakhar Gupta; Byron C. Wallace; Zachary C. Lipton; Jeffrey P. Bigham

arXiv:2402.12566 · cs.CL · January 22, 2025 · 1 citation

Open Access

TL;DR

GenAudit is a tool that helps identify and correct factual errors in language model outputs by suggesting edits and providing supporting evidence from reference documents, improving accuracy in high-stakes applications.

Contribution

This paper introduces GenAudit, a fact-checking tool that detects unsupported claims in LLM responses and suggests corrections, backed by evidence from the reference document. The tool comes with an interactive interface and is evaluated by human raters and in user studies.

Findings

1. GenAudit detects errors in the outputs of 8 different LLMs when they summarize documents from diverse domains.

2. Using GenAudit substantially improves people's ability to find errors in LLM-generated summaries.

3. The tool and models are publicly released for broader use.

Abstract

LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that are not supported by the reference document, and also presents evidence from the reference for facts that do appear to have support. We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users. Comprehensive evaluation by human raters shows that GenAudit can detect errors in 8 different LLM outputs when summarizing documents from diverse domains. User studies demonstrate that using GenAudit can substantially improve the performance…
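The workflow the abstract describes (revise or remove unsupported claims, and attach reference evidence to supported ones) can be pictured with a small data structure. The following is a minimal sketch, not GenAudit's actual implementation or API: the names `AuditedClaim`, `apply_suggestions`, and `evidence_for`, the sentence-level granularity, and the example texts are all assumptions made for illustration, and the audit results are written by hand rather than produced by a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuditedClaim:
    """One claim from an LLM response plus a hypothetical audit result."""
    original: str
    # Suggested replacement text; None means "remove this claim".
    revised: Optional[str]
    # Indices of reference sentences offered as evidence (empty if none).
    evidence: List[int] = field(default_factory=list)


def apply_suggestions(claims: List[AuditedClaim]) -> str:
    """Build the edited response: keep revised claims, drop removed ones."""
    return " ".join(c.revised for c in claims if c.revised is not None)


def evidence_for(claim: AuditedClaim, reference: List[str]) -> List[str]:
    """Look up the reference sentences cited as support for a claim."""
    return [reference[i] for i in claim.evidence]


# Illustrative reference document, split into sentences (made-up content).
reference = [
    "The patient was prescribed 5 mg of drug A.",
    "Follow-up is in two weeks.",
]

# Hand-written audit results standing in for a model's output.
claims = [
    # Unsupported dosage: revised to match the reference.
    AuditedClaim(
        original="The patient was prescribed 10 mg of drug A.",
        revised="The patient was prescribed 5 mg of drug A.",
        evidence=[0],
    ),
    # No support in the reference: suggested for removal.
    AuditedClaim(original="The patient has diabetes.", revised=None),
    # Supported as written: kept unchanged, with evidence attached.
    AuditedClaim(
        original="Follow-up is in two weeks.",
        revised="Follow-up is in two weeks.",
        evidence=[1],
    ),
]

edited = apply_suggestions(claims)
print(edited)
```

In this sketch the suggestions are only proposals: a user reviewing them could accept or reject each one before `apply_suggestions` is called, which mirrors the interactive, human-in-the-loop use the abstract describes.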

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling