DIAL-SUMMER: A Structured Evaluation Framework of Hierarchical Errors in Dialogue Summaries

Sahana Ramnath; Nima Chitsazan; Mingyang Zhou; Chia-Hsuan Lee; Shi-Xiong Zhang; Stephen Rawls; Sambit Sahu; Sangwoo Cho; Xiang Ren; Genta Indra Winata; Akshaj Kumar Veldanda

arXiv:2602.08149 · cs.CL · February 10, 2026

Open Access

TL;DR

This paper introduces DIAL-SUMMER, a comprehensive framework and dataset for evaluating hierarchical errors in dialogue summaries, addressing unique structural and viewpoint shifts in dialogue summarization tasks.

Contribution

It proposes a hierarchical error taxonomy and annotated dataset specifically designed for dialogue summaries, enabling detailed error analysis and improving evaluation methods.

Findings

01

Turns in the middle of dialogues are most often missed in summaries.

02

Extrinsic hallucinations mainly occur at the end of summaries.

03

LLM-Judges face challenges in accurately detecting dialogue summary errors.
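Findings 01 and 02 are positional claims: they describe where in a dialogue omissions cluster and where in a summary extrinsic hallucinations cluster. The following is a minimal, hypothetical sketch of how such a positional profile could be tabulated. It is not the authors' analysis. The split into equal thirds, the labels "beginning"/"middle"/"end", and the input format of (number of turns, missed turn indices) pairs are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

BUCKETS = ("beginning", "middle", "end")  # hypothetical: equal thirds of the dialogue


def position_bucket(index: int, n_items: int) -> str:
    """Map a 0-based index into one of three equal-width position buckets.

    Integer arithmetic avoids floating-point edge cases at the 1/3 and 2/3 boundaries.
    """
    if not 0 <= index < n_items:
        raise ValueError(f"index {index} out of range for {n_items} items")
    return BUCKETS[3 * index // n_items]


def missed_turn_profile(dialogues: Iterable[Tuple[int, List[int]]]) -> Counter:
    """Count missed turns per position bucket.

    Each element is (n_turns, missed_turn_indices), an assumed annotation format.
    """
    counts: Counter = Counter()
    for n_turns, missed in dialogues:
        for i in missed:
            counts[position_bucket(i, n_turns)] += 1
    return counts


if __name__ == "__main__":
    # Toy input, not data from the paper.
    toy = [(9, [4, 5]), (6, [0, 3])]
    print(missed_turn_profile(toy))
```

The same bucketing could be applied to summary sentence indices instead of dialogue turn indices to ask where hallucinated sentences fall within a summary.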

Abstract

Dialogues are a predominant mode of communication for humans, and it is immensely helpful to have automatically generated summaries of them (e.g., to revise key points discussed in a meeting, to review conversations between customer agents and product users). Prior works on dialogue summary evaluation largely ignore the complexities specific to this task: (i) a shift in structure, from multiple speakers discussing information in a scattered fashion across several turns, to a summary's sentences, and (ii) a shift in narration viewpoint, from the speakers' first/second-person narration to standardized third-person narration in the summary. In this work, we introduce our framework DIAL-SUMMER to address the above. We propose DIAL-SUMMER's taxonomy of errors to comprehensively evaluate dialogue summaries at two hierarchical levels: DIALOGUE-LEVEL, which focuses on the broader speakers/turns, and…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques