OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization

Yuchen Shen; Xiaojun Wan

arXiv:2310.18122 · cs.CL · November 14, 2023 · 2 cites

TL;DR

OpinSummEval introduces a dataset of human judgments and model outputs for opinion summarization. Its analysis shows that neural metrics generally outperform non-neural ones, yet no current automatic metric correlates consistently well with human ratings across all evaluated dimensions.

Contribution

This paper presents OpinSummEval, a dataset of human judgments on the outputs of 14 opinion summarization models, together with an analysis of how well 24 automatic metrics, both neural and non-neural, correlate with those judgments across four dimensions.

Findings

01 Metrics based on neural networks generally outperform non-neural ones.

02 Even metrics built on powerful backbones, such as BART and GPT-3/3.5, do not consistently correlate well with human ratings across all dimensions.

03 Automated evaluation methods for opinion summarization still need to advance.

Abstract

Opinion summarization sets itself apart from other types of summarization tasks due to its distinctive focus on aspects and sentiments. Although certain automated evaluation methods like ROUGE have gained popularity, we have found them to be unreliable measures for assessing the quality of opinion summaries. In this paper, we present OpinSummEval, a dataset comprising human judgments and outputs from 14 opinion summarization models. We further explore the correlation between 24 automatic metrics and human ratings across four dimensions. Our findings indicate that metrics based on neural networks generally outperform non-neural ones. However, even metrics built on powerful backbones, such as BART and GPT-3/3.5, do not consistently correlate well across all dimensions, highlighting the need for advancements in automated evaluation methods for opinion summarization. The code and data are…
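The correlation analysis described in the abstract can be illustrated with a small sketch. The abstract does not say which correlation coefficient is used, so this example assumes Kendall's tau-b, a common rank correlation for metric evaluation. The function name `kendall_tau_b`, the metric names, and all scores are hypothetical and are not taken from the OpinSummEval data.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation between two equal-length score lists."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pair tied on both variables: excluded from both terms
        if dx == 0:
            ties_x += 1  # tied only on x
        elif dy == 0:
            ties_y += 1  # tied only on y
        elif dx * dy > 0:
            concordant += 1  # both lists order the pair the same way
        else:
            discordant += 1  # the lists order the pair in opposite ways
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical human ratings for five summaries on one dimension.
human = [4, 2, 5, 1, 3]

# Hypothetical scores from two automatic metrics for the same summaries.
metric_scores = {
    "metric_a": [0.8, 0.4, 0.9, 0.1, 0.5],
    "metric_b": [0.5, 0.6, 0.4, 0.2, 0.3],
}

for name, scores in metric_scores.items():
    print(name, round(kendall_tau_b(human, scores), 3))
```

A metric that ranks summaries in the same order as the human raters scores close to 1, while a metric whose ranking is unrelated to theirs scores close to 0. Repeating this computation for each metric and each rated dimension gives a metric-by-dimension table of the kind such a study would compare.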

Code & Models

Repositories

a-chicharito-s/opinsummeval (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Residual Connection · Byte Pair Encoding · Softmax · Dense Connections · Dropout · Focus