# AllSummedUp: An Open-Source Framework for Comparing Summarization Evaluation Metrics

**Authors:** Tanguy Herserant, Vincent Guigue

arXiv: 2508.21389 · 2025-09-01

## TL;DR

This paper presents AllSummedUp, an open-source framework for comparing evaluation metrics in automatic text summarization, highlighting discrepancies, trade-offs, and reproducibility issues among various metrics including LLM-based methods.

## Contribution

The paper introduces a unified, open-source framework for fair comparison of summarization evaluation metrics, and analyzes their reproducibility and alignment with human judgments.

## Key findings

- Metrics that align best with human judgments tend to be more computationally intensive and less stable across runs.
- LLM-based evaluation methods show high variability and limited reproducibility.
- Metric performance reported in the literature differs significantly from the performance observed in the authors' experimental setting.
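
These findings rest on two measurements: how well a metric's scores track human judgments, and how much that agreement changes when the metric is run again. The sketch below shows one way to compute both, assuming Spearman rank correlation as the alignment measure (this summary does not state which correlation the paper uses). The function names `average_ranks`, `spearman`, and `alignment_and_stability` are hypothetical and are not part of the AllSummedUp codebase; the scores in the usage lines are made-up illustrative values, not results from the paper.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def alignment_and_stability(human_scores, metric_runs):
    """Return (mean, population std) of per-run Spearman correlations.

    human_scores: one human rating per summary.
    metric_runs: one list of metric scores per repeated run, same order.
    """
    per_run = [spearman(human_scores, run) for run in metric_runs]
    return mean(per_run), pstdev(per_run)


# Illustrative values only: five summaries, two runs of a stochastic metric.
human = [1, 2, 3, 4, 5]
runs = [[0.1, 0.2, 0.3, 0.4, 0.5], [0.2, 0.1, 0.3, 0.4, 0.5]]
alignment, spread = alignment_and_stability(human, runs)
print(f"mean correlation={alignment:.3f}, std across runs={spread:.3f}")
```

A deterministic metric such as ROUGE yields identical scores on every run, so its spread is zero by construction. A sampled LLM-based evaluator can yield a different correlation each time, which is why reporting the spread alongside the mean matters.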

## Abstract

This paper investigates reproducibility challenges in automatic text summarization evaluation. Based on experiments conducted across six representative metrics ranging from classical approaches like ROUGE to recent LLM-based methods (G-Eval, SEval-Ex), we highlight significant discrepancies between reported performances in the literature and those observed in our experimental setting. We introduce a unified, open-source framework, applied to the SummEval dataset and designed to support fair and transparent comparison of evaluation metrics. Our results reveal a structural trade-off: metrics with the highest alignment with human judgments tend to be computationally intensive and less stable across runs. Beyond comparative analysis, this study highlights key concerns about relying on LLMs for evaluation, stressing their randomness, technical dependencies, and limited reproducibility. We advocate for more robust evaluation protocols including exhaustive documentation and methodological standardization to ensure greater reliability in automatic summarization assessment.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21389/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/2508.21389/full.md

---
Source: https://tomesphere.com/paper/2508.21389