Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator

Frederic Kirstein; Terry Ruas; Bela Gipp

arXiv:2411.18444·cs.CL·February 19, 2025

Open Access

TL;DR

This paper introduces MESA, a multi-LLM framework that improves the automatic evaluation of meeting summaries by better detecting errors and aligning with human judgments, reducing reliance on costly human assessments.

Contribution

MESA is a novel multi-LLM framework that employs error-specific assessment, multi-agent discussion, and self-training to enhance summary quality evaluation accuracy.
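The flow named above (per-error-type assessment followed by a multi-judge decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Judge` signature, the 0 to 5 severity scale, the error-type names, and the stub judges are all hypothetical, a simple median vote stands in for the multi-agent discussion, and the self-training step is omitted.

```python
from dataclasses import dataclass
from statistics import median_low
from typing import Callable, Dict, List

# Hypothetical judge interface: (error_type, definition, transcript, summary)
# -> severity from 0 (no error) to 5 (severe). A real judge would call an LLM.
Judge = Callable[[str, str, str, str], int]


@dataclass
class ErrorAssessment:
    error_type: str
    votes: List[int]   # one severity vote per judge
    severity: int      # consolidated decision


def assess_summary(
    transcript: str,
    summary: str,
    error_definitions: Dict[str, str],
    judges: List[Judge],
) -> List[ErrorAssessment]:
    """Score each error type separately, then consolidate the judges' votes."""
    results = []
    for error_type, definition in error_definitions.items():
        votes = [j(error_type, definition, transcript, summary) for j in judges]
        # Assumption: the median vote replaces the discussion step here.
        results.append(ErrorAssessment(error_type, votes, median_low(votes)))
    return results


# Stub judges with fixed opinions, used only to make the sketch runnable.
stub_judges: List[Judge] = [
    lambda et, d, t, s: 1,
    lambda et, d, t, s: 3,
    lambda et, d, t, s: 4,
]

# Illustrative error definitions; the real guidelines are not given here.
definitions = {
    "omission": "Relevant content from the meeting is missing.",
    "hallucination": "The summary states content not in the meeting.",
}

report = assess_summary("transcript text", "summary text", definitions, stub_judges)
for item in report:
    print(item.error_type, item.votes, item.severity)
```

Scoring each error type in isolation keeps one error from hiding another inside a single overall rating, which is the masking risk the abstract attributes to current LLM-based evaluators.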

Findings

01

MESA achieves higher correlation with human judgment than previous methods.

02

The framework effectively detects nuanced errors in meeting summaries.

03

MESA adapts well to custom error guidelines across different tasks.
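To make "correlation with human judgment" concrete, the sketch below computes a Spearman rank correlation between evaluator scores and human ratings. The choice of Spearman is an assumption (this page does not state which coefficient the paper reports), and the scores are made-up illustrative values, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only: human ratings vs. an automatic evaluator's scores.
human_ratings = [1, 2, 3, 4, 5]
evaluator_scores = [2, 1, 4, 3, 5]
rho = spearman(human_ratings, evaluator_scores)
print(round(rho, 3))
```

A value near 1 means the evaluator orders summaries the same way human annotators do, while a value near 0 means its ordering is unrelated to theirs.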

Abstract

The quality of meeting summaries generated by natural language generation (NLG) systems is hard to measure automatically. Established metrics such as ROUGE and BERTScore have a relatively low correlation with human judgments and fail to capture nuanced errors. Recent studies suggest using large language models (LLMs), which offer better context understanding and can adapt to error definitions without training on a large number of human preference judgments. However, current LLM-based evaluators risk masking errors and can only serve as a weak proxy, leaving human evaluation as the gold standard despite being costly and hard to compare across studies. In this work, we present MESA, an LLM-based framework employing a three-step assessment of individual error types, multi-agent discussion for decision refinement, and feedback-based self-training to refine error definition…

Taxonomy

Topics: Healthcare Systems and Technology · Library Science and Information Systems