Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization

Keyan Jin, Yapeng Wang, Leonel Santos, Tao Fang, Xu Yang, Sio Kei Im, Hugo Gonçalo Oliveira

arXiv:2507.02145·cs.CL·July 4, 2025

TL;DR

This paper systematically evaluates reasoning large language models against non-reasoning models for dialogue summarization, revealing that explicit reasoning often does not enhance and may even impair summary quality in complex dialogues.

Contribution

It provides the first comprehensive comparison of reasoning and non-reasoning LLMs across multiple dialogue summarization paradigms, languages, and benchmarks, highlighting the limitations of reasoning LLMs for this task.

Findings

01

Reasoning LLMs often produce more verbose summaries.

02

Explicit reasoning does not consistently improve summarization quality.

03

Reasoning models may generate less concise and sometimes less accurate summaries.

Abstract

Dialogue summarization is a challenging task with significant practical value in customer service, meeting analysis, and conversational AI. Although large language models (LLMs) have achieved substantial progress in summarization tasks, the performance of step-by-step reasoning architectures, specifically Long Chain-of-Thought (CoT) implementations such as OpenAI-o1 and DeepSeek-R1, remains unexplored for dialogue scenarios that require both abstraction and conciseness. In this work, we present the first comprehensive and systematic evaluation of state-of-the-art reasoning and non-reasoning LLMs across three major paradigms: generic, role-oriented, and query-oriented dialogue summarization. Our study spans diverse languages, domains, and summary lengths, leveraging strong benchmarks (SAMSum, DialogSum, CSDS, and QMSum) and advanced evaluation protocols that include both LLM-based…
