Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation

Mahnaz Koupaee; Jake W. Vincent; Saab Mansour; Igor Shalyminov; Han He; Hwanjun Song; Raphael Shu; Jianfeng He; Yi Nian; Amy Wing-mei Wong; Kyu J. Han; Hang Su

arXiv:2502.08514·cs.CL·February 14, 2025

TL;DR

This paper introduces a multi-agent debate framework with initial stances for improved summary faithfulness evaluation, addressing issues of ambiguity and enhancing error detection in summaries.

Contribution

It proposes a novel multi-agent debate approach with initial stances to improve faithfulness evaluation and introduces a taxonomy for ambiguity in summaries.

Findings

01. Better identification of errors in summaries.

02. Enhanced detection of ambiguous summaries.

03. Stronger performance on non-ambiguous summaries.

Abstract

Faithfulness evaluators based on large language models (LLMs) are often fooled by the fluency of the text and struggle with identifying errors in the summaries. We propose an approach to summary faithfulness evaluation in which multiple LLM-based agents are assigned initial stances (regardless of what their belief might be) and forced to come up with a reason to justify the imposed belief, thus engaging in a multi-round debate to reach an agreement. The uniformly distributed initial assignments result in a greater diversity of stances leading to more meaningful debates and ultimately more errors identified. Furthermore, by analyzing the recent faithfulness evaluation datasets, we observe that naturally, it is not always the case for a summary to be either faithful to the source document or not. We therefore introduce a new dimension, ambiguity, and a detailed taxonomy to identify such…
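The debate procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-label stance set, the toy `majority_follower` agent (a stand-in for an LLM call that would read the document and summary and argue for its stance), the round limit, and the majority-vote fallback when no agreement is reached are all assumptions made for illustration. The ambiguity dimension is not modeled here.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Assumed label set; the paper also introduces an ambiguity dimension, not modeled here.
STANCES = ("faithful", "unfaithful")

# Hypothetical agent interface: (own stance, all stances from the last round) -> new stance.
Agent = Callable[[str, List[str]], str]


def assign_initial_stances(n_agents: int, rng: random.Random) -> List[str]:
    """Spread the stances as evenly as possible over the agents, then shuffle.

    The stance is imposed on each agent regardless of what it would believe.
    """
    stances = [STANCES[i % len(STANCES)] for i in range(n_agents)]
    rng.shuffle(stances)
    return stances


def majority_follower(own: str, all_stances: List[str]) -> str:
    """Toy stand-in for an LLM agent: adopt the majority stance, keep own on a tie."""
    ranked = Counter(all_stances).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return own
    return ranked[0][0]


def debate(agents: List[Agent], initial: List[str],
           max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Run rounds until all agents agree or max_rounds is reached.

    Falls back to a majority vote if no agreement is reached (an assumption).
    """
    stances = list(initial)
    for _ in range(max_rounds):
        if len(set(stances)) == 1:
            break  # agreement reached
        # Every agent sees the previous round's stances and may revise its own.
        stances = [agent(own, stances) for agent, own in zip(agents, stances)]
    label = Counter(stances).most_common(1)[0][0]
    return label, stances


if __name__ == "__main__":
    rng = random.Random(0)
    initial = assign_initial_stances(3, rng)
    label, final = debate([majority_follower] * 3, initial)
    print(initial, "->", final, "label:", label)
```

In a real evaluator each agent would be an LLM prompted with the source document, the summary, its imposed stance, and the other agents' arguments; the toy agent only shows the control flow of stance assignment, revision across rounds, and the stopping condition.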

Code & Models

Repositories

amazon-science/madisse (Official)

Videos

Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation · Underline

Taxonomy

Topics: Complex Systems and Decision Making · Experimental Behavioral Economics Studies