UniSumEval: Towards Unified, Fine-Grained, Multi-Dimensional Summarization Evaluation for LLMs

Yuho Lee; Taewon Yun; Jason Cai; Hang Su; Hwanjun Song

arXiv:2409.19898 · cs.CL · October 2, 2024


TL;DR

UniSumEval introduces a comprehensive benchmark for summarization evaluation that covers diverse input scenarios and multiple evaluation dimensions, utilizing AI assistance for data creation and annotation.

Contribution

It presents a new benchmark with fine-grained, multi-dimensional annotations across varied input contexts, and benchmarks recent language models and evaluation methods.

Findings

1. Insights into model performance across different contexts and dimensions
2. Comparison of state-of-the-art automated evaluators
3. Enhanced annotation quality with AI assistance

Abstract

Existing benchmarks for summarization quality evaluation often lack diverse input scenarios, focus on narrowly defined dimensions (e.g., faithfulness), and struggle with subjective and coarse-grained annotation schemes. To address these shortcomings, we create the UniSumEval benchmark, which extends the range of input contexts (e.g., domain, length) and provides fine-grained, multi-dimensional annotations. We use AI assistance in data creation, both to identify potentially hallucinogenic input texts and to help human annotators reduce the difficulty of fine-grained annotation tasks. With UniSumEval, we benchmark nine of the latest language models as summarizers, offering insights into their performance across varying input contexts and evaluation dimensions. Furthermore, we conduct a thorough comparison of SOTA automated summary evaluators. Our benchmark data will be available at…
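The abstract does not say how the automated evaluators are compared. A common approach in summarization meta-evaluation is to correlate each evaluator's per-summary scores with human annotation scores, for example using Spearman rank correlation. The sketch below shows only that generic technique and is not necessarily the protocol used in the paper. The function names and the example scores are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(human: Sequence[float], evaluator: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    if len(human) != len(evaluator) or len(human) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(human), average_ranks(evaluator))


# Hypothetical per-summary scores for a single evaluation dimension.
human_scores = [0.9, 0.4, 0.7, 1.0]
evaluator_scores = [0.8, 0.5, 0.6, 0.95]
print(spearman(human_scores, evaluator_scores))
```

Here the evaluator ranks the four summaries in the same order as the human annotators, so the correlation is 1.0. Under this setup, a higher correlation on a given dimension would mean the evaluator agrees more closely with human judgments on that dimension.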


Code & Models

Repositories

disl-lab/unisumeval-v1.0 (official)

Videos

UniSumEval: Towards Unified, Fine-grained, Multi-dimensional Summarization Evaluation for LLMs (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Mathematics, Computing, and Information Processing

Methods: Focus