Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends
Sanjana Ramprasad, Elisa Ferracane, Zachary C. Lipton

TL;DR
This paper benchmarks the faithfulness of GPT-4 and Alpaca-13B in dialogue summarization, revealing their tendency to generate plausible but unsupported inferences, and introduces a new taxonomy and detection methods for these errors.
Contribution
It introduces a refined taxonomy of hallucination errors, focusing on circumstantial inference, and proposes prompt-based detection methods that outperform existing metrics.
Findings
LLMs often generate plausible inferences that are supported by circumstantial evidence in the conversation but lack direct evidence.
New taxonomy categorizes 'Circumstantial Inference' errors in dialogue summarization.
Prompt-based detection approaches outperform existing metrics for nuanced error detection.
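As a rough illustration of what a prompt-based, span-level detector could look like, the sketch below builds a detection prompt and parses a model's reply into labeled spans. It is not the authors' prompt or pipeline: the label names, the `<span> || <label>` reply format, and the helpers `build_detection_prompt` and `parse_judgments` are all hypothetical, and no model is actually called.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set for illustration only; the paper's full taxonomy
# is not reproduced here.
LABELS = ("circumstantial_inference", "other_unsupported")


@dataclass
class SpanJudgment:
    span: str   # text copied from the summary
    label: str  # one of LABELS


def build_detection_prompt(dialogue: str, summary: str) -> str:
    """Assemble a plain-text prompt asking a model to flag unsupported spans."""
    return (
        "Read the dialogue and the summary.\n"
        "List each summary span that is not directly stated in the dialogue, "
        "one per line, as: <span> || <label>\n"
        f"Allowed labels: {', '.join(LABELS)}\n\n"
        f"Dialogue:\n{dialogue}\n\n"
        f"Summary:\n{summary}\n"
    )


def parse_judgments(response: str) -> List[SpanJudgment]:
    """Parse '<span> || <label>' lines, skipping malformed or unknown-label lines."""
    judgments = []
    for line in response.splitlines():
        if "||" not in line:
            continue
        span, label = (part.strip() for part in line.split("||", 1))
        if span and label in LABELS:
            judgments.append(SpanJudgment(span, label))
    return judgments
```

In use, the prompt string would be sent to whichever model is being tested, and the reply passed to `parse_judgments`. Restricting the reply to a fixed label set keeps the output easy to compare against human span annotations.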
Abstract
Recent advances in large language models (LLMs) have considerably improved the capabilities of summarization systems. However, these systems continue to face concerns about hallucinations. While prior work has evaluated LLMs extensively in news domains, most evaluation of dialogue summarization has focused on BART-based models, leaving a gap in our understanding of LLM faithfulness in this setting. Our work benchmarks the faithfulness of LLMs for dialogue summarization, using human annotations and focusing on identifying and categorizing span-level inconsistencies. Specifically, we focus on two prominent LLMs: GPT-4 and Alpaca-13B. Our evaluation reveals subtleties as to what constitutes a hallucination: LLMs often generate plausible inferences, supported by circumstantial evidence in the conversation, that lack direct evidence, a pattern that is less prevalent in older models. We propose a refined…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Attention Is All You Need · Softmax · Focus · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer
