SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
Daniel Deutsch, Dan Roth

TL;DR
SacreROUGE is an open-source Python library that simplifies the use and development of summarization evaluation metrics, facilitating easier benchmarking against human judgments and dataset integration.
Contribution
It provides a unified interface for existing metrics, tools for evaluating metric correlation with human judgments, and scripts for dataset loading, streamlining summarization evaluation research.
Findings
Unified Python interface for multiple metrics
Tools for correlating metrics with human judgments
Scripts for dataset loading and formatting
Abstract
We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics. SacreROUGE removes many obstacles that researchers face when using or developing metrics: (1) The library provides Python wrappers around the official implementations of existing evaluation metrics so they share a common, easy-to-use interface; (2) it provides functionality to evaluate how well any metric implemented in the library correlates to human-annotated judgments, so no additional code needs to be written for a new evaluation metric; and (3) it includes scripts for loading datasets that contain human judgments so they can easily be used for evaluation. This work describes the design of the library, including the core Metric interface, the command-line API for evaluating summarization models and metrics, and the scripts to load and reformat publicly available datasets. The…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
