Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends

Sanjana Ramprasad; Elisa Ferracane; Zachary C. Lipton

arXiv:2406.03487 · cs.CL · June 6, 2024

Open Access

TL;DR

This paper benchmarks the faithfulness of GPT-4 and Alpaca-13B in dialogue summarization, revealing their tendency to generate plausible but unsupported inferences, and introduces a new taxonomy and detection methods for these errors.

Contribution

It introduces a refined taxonomy of hallucination errors, focusing on circumstantial inference, and proposes prompt-based detection methods that outperform existing metrics.

Findings

01. LLMs often generate plausible inferences that are supported by circumstantial evidence in the conversation but lack direct evidence.

02. The refined taxonomy introduces a 'Circumstantial Inference' error category for dialogue summarization.

03. Prompt-based detection approaches outperform existing metrics at detecting these nuanced errors.

Abstract

Recent advances in large language models (LLMs) have considerably improved the capabilities of summarization systems. However, these systems continue to face concerns about hallucinations. While prior work has evaluated LLMs extensively in news domains, most evaluation of dialogue summarization has focused on BART-based models, leaving a gap in our understanding of how faithful LLMs are in this setting. Our work benchmarks the faithfulness of LLMs for dialogue summarization, using human annotations and focusing on identifying and categorizing span-level inconsistencies. Specifically, we focus on two prominent LLMs: GPT-4 and Alpaca-13B. Our evaluation reveals subtleties as to what constitutes a hallucination: LLMs often generate plausible inferences that are supported by circumstantial evidence in the conversation but lack direct evidence, a pattern that is less prevalent in older models. We propose a refined…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Attention Is All You Need · Softmax · Focus · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer