Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors

Alex Chandler; Devesh Surve; Hui Su

arXiv:2406.13009 · cs.CL · June 21, 2024

TL;DR

DEEP is an end-to-end LLM framework that effectively detects factual errors in text summaries by ensembling diverse prompts and calibrating outputs, achieving state-of-the-art accuracy without model fine-tuning.

Contribution

The paper introduces DEEP, a novel prompt ensembling approach that improves factual error detection in summaries without requiring fine-tuning or complex thresholding.
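The central mechanism is to turn each prompt's verdict into a binary feature and then combine those features with an ensembling model. The small sketch below illustrates that idea. It is not the authors' code. The verdict strings ("error" / "consistent"), the helper names `to_binary_features`, `majority_vote`, `fit_logistic` and `predict_proba`, and the choice of logistic regression trained by plain gradient descent as the ensembler are all illustrative assumptions. The toy data is invented.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only: helper names and the verdict format are hypothetical,
# not taken from the DEEP paper or the achandlr/DEEP repository.


def to_binary_features(verdicts: Sequence[str]) -> List[int]:
    """Encode each prompt's verdict as 1 (flags a factual error) or 0 (no error)."""
    # Assumes every prompt answers "error" or "consistent" (case-insensitive).
    return [1 if v.strip().lower() == "error" else 0 for v in verdicts]


def majority_vote(features: Sequence[int]) -> int:
    """Untrained baseline: predict an error only if a strict majority of prompts flag one."""
    return int(2 * sum(features) > len(features))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[int]) -> float:
    """Ensemble output: estimated probability that the summary contains an error."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(
    X: Sequence[Sequence[int]], y: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Learn one weight per prompt plus a bias by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy data: each row holds three prompts' verdicts; label 1 = summary has an error.
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)

x_new = to_binary_features(["error", "consistent", "Error"])
print(x_new, majority_vote(x_new), round(predict_proba(w, b, x_new), 3))
```

A majority vote needs no labeled data, but it weights every prompt equally. A learned ensembler gives each prompt its own weight, so prompts that are less predictive of the labels can receive smaller weights.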

Findings

1. Achieves state-of-the-art accuracy on multiple summarization benchmarks.
2. Significantly outperforms prior models while requiring no fine-tuning.
3. Provides a practical, threshold-free error detection method.
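The threshold-free claim connects to the abstract's observation that prior detectors perform much worse unless their decision threshold is optimized on a subset of the evaluated dataset. A small sketch can make that mechanism concrete. The scores below are invented, balanced accuracy is an assumed metric, and `balanced_accuracy` / `tune_threshold` are hypothetical helpers. The comparison illustrates the effect only and does not reproduce any reported number.

```python
from typing import Sequence, Tuple

# Hypothetical helpers for illustration; balanced accuracy is an assumed metric.


def balanced_accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Average of recall on erroneous (label 1) and consistent (label 0) summaries.

    A summary is flagged as erroneous when its score is >= threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return 0.5 * (tp / pos + tn / neg)


def tune_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return the threshold (and its balanced accuracy) that scores best on these data.

    Trying every distinct score, plus +inf (flag nothing), covers every
    prediction pattern a ">= threshold" rule can produce on this data.
    """
    best_t = float("inf")
    best_acc = balanced_accuracy(scores, labels, best_t)
    for t in sorted(set(scores)):
        acc = balanced_accuracy(scores, labels, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Invented scores from a hypothetical detector that rates every summary above 0.5.
scores = [0.62, 0.70, 0.55, 0.81, 0.58, 0.90, 0.66, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

fixed = balanced_accuracy(scores, labels, 0.5)
t, tuned = tune_threshold(scores, labels)
print(f"fixed threshold 0.5: {fixed:.3f}; tuned threshold {t}: {tuned:.3f}")
```

The tuned threshold is chosen on the same examples it is scored on, so its accuracy is an optimistic estimate. A detector that outputs calibrated probabilities can instead be read at a fixed cutoff such as 0.5, with no tuning step.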

Abstract

Accurate text summarization is one of the most common and important tasks performed by Large Language Models, where the costs of human review for an entire document may be high, but the costs of errors in summarization may be even greater. We propose Detecting Errors through Ensembling Prompts (DEEP), an end-to-end large language model framework for detecting factual errors in text summarization. Our framework uses a diverse set of LLM prompts to identify factual inconsistencies, treating their outputs as binary features, which are then fed into ensembling models. We then calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination. We demonstrate that prior models for detecting factual errors in summaries perform significantly worse without optimizing the thresholds on subsets of the evaluated dataset. Our…
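This excerpt of the abstract does not name a calibration method. One standard option is isotonic regression, fit with the pool-adjacent-violators algorithm. It maps raw ensemble scores to a non-decreasing step function of observed error frequencies. The sketch below assumes that choice and uses a hypothetical set of scores and labels (1 = summary contains an error). It calibrates P(error); the probability of factual consistency is its complement. Expected calibration error (ECE) is included as one way to check whether the resulting probabilities are empirically accurate. All function names are illustrative.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

# Sketch under assumptions: isotonic regression is one possible calibrator, not
# necessarily the one used by DEEP; the function names are hypothetical.


def fit_isotonic(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators: return block upper bounds and calibrated P(error) values."""
    # Group tied scores so each distinct score starts as a single block.
    totals: Dict[float, Tuple[float, int]] = {}
    for s, y in zip(scores, labels):
        y_sum, n = totals.get(s, (0.0, 0))
        totals[s] = (y_sum + y, n + 1)

    blocks: List[List[float]] = []  # each block: [label_sum, count, largest_score]
    for s in sorted(totals):
        y_sum, n = totals[s]
        blocks.append([y_sum, n, s])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            y_sum, n, upper = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += n
            blocks[-1][2] = upper
    return [blk[2] for blk in blocks], [blk[0] / blk[1] for blk in blocks]


def calibrate(score: float, uppers: Sequence[float], values: Sequence[float]) -> float:
    """Step lookup: a score takes the value of the first block whose upper bound is >= it."""
    idx = bisect.bisect_left(uppers, score)
    return values[min(idx, len(values) - 1)]


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Bin-weighted gap between mean predicted P(error) and the observed error rate."""
    total, ece = len(probs), 0.0
    for k in range(n_bins):
        lo, hi = k / n_bins, (k + 1) / n_bins
        # Bins are [lo, hi); the last bin also includes p == 1.0.
        members = [i for i, p in enumerate(probs) if lo <= p < hi or (k == n_bins - 1 and p == hi)]
        if members:
            mean_p = sum(probs[i] for i in members) / len(members)
            rate = sum(labels[i] for i in members) / len(members)
            ece += len(members) / total * abs(rate - mean_p)
    return ece


# Invented raw ensemble scores and labels (1 = summary contains an error).
raw = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 1, 0, 1, 0, 1, 1]
uppers, values = fit_isotonic(raw, y)
probs = [calibrate(s, uppers, values) for s in raw]
print(values, expected_calibration_error(probs, y))
```

On the data it was fit to, isotonic calibration gives zero ECE by construction, because each block's value is its own observed error rate. A meaningful check therefore evaluates ECE on held-out summaries.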

Code & Models

Repositories

achandlr/DEEP (official)

Videos

Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors (Underline)

Taxonomy

Topics: Software Engineering Research · Risk and Safety Analysis

Methods: Sparse Evolutionary Training