Evaluating the Efficacy of Summarization Evaluation across Languages

Fajri Koto; Jey Han Lau; Timothy Baldwin

arXiv:2106.01478·cs.CL·June 4, 2021

TL;DR

This paper systematically assesses how well automatic summarization evaluation metrics work across eight languages, and finds that BERTScore with multilingual BERT performs well in all of them, at a level above that for English.

Contribution

It manually annotates generated summaries in eight languages for focus (precision) and coverage (recall), uses those annotations to evaluate 19 summarization evaluation metrics, and shows that multilingual BERT within BERTScore is effective across all of the languages.

Findings

01

Multilingual BERT within BERTScore performs well across all eight tested languages.

02

Across the languages studied, BERTScore with multilingual BERT performs at a level above that for English.

03

Generated summaries in all eight languages are manually annotated for focus (precision) and coverage (recall).
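
To make the precision/recall framing behind BERTScore concrete, here is a minimal sketch of its greedy matching step: precision averages, over summary tokens, the best cosine similarity to any reference token, and recall does the same in the other direction. The function names and the 2-dimensional toy vectors are hypothetical and stand in for contextual embeddings from a model such as multilingual BERT; this is not the authors' implementation, and it omits optional refinements such as idf weighting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) using greedy token matching.

    cand_vecs: one embedding per token of the generated summary.
    ref_vecs:  one embedding per token of the reference summary.
    """
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in ref_vecs) for c in cand_vecs
    ) / len(cand_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in cand_vecs) for r in ref_vecs
    ) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy vectors (made up for illustration, not real model embeddings).
candidate = [[1.0, 0.0], [0.0, 1.0]]  # two summary tokens
reference = [[1.0, 0.0]]              # one reference token
precision, recall, f1 = greedy_match_scores(candidate, reference)
print(precision, recall, f1)
```

In this toy case the second summary token has no counterpart in the reference, so precision drops while recall stays at its maximum, which is the asymmetry that the focus and coverage annotations are meant to capture.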

Abstract

While automatic summarization evaluation methods developed for English are routinely applied to other languages, this is the first attempt to systematically quantify their panlinguistic efficacy. We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall). Based on this, we evaluate 19 summarization evaluation metrics, and find that using multilingual BERT within BERTScore performs well across all languages, at a level above that for English.
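
One common way to evaluate an evaluation metric is to correlate its scores with human judgments over the same set of summaries. The sketch below assumes Pearson correlation as the agreement statistic; the abstract does not name the statistic used, so treat that choice, the function name `pearson`, and all of the toy scores as illustrative assumptions rather than the paper's protocol or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for four summaries (not taken from the paper).
human_focus = [1.0, 2.0, 3.0, 4.0]   # hypothetical human focus ratings
metric_a = [2.0, 4.0, 6.0, 8.0]      # hypothetical metric that tracks the ratings
metric_b = [4.0, 3.0, 2.0, 1.0]      # hypothetical metric that inverts them

r_a = pearson(human_focus, metric_a)
r_b = pearson(human_focus, metric_b)
print(r_a, r_b)
```

Running the same comparison separately against focus and coverage annotations, and separately per language, gives one agreement score per metric, annotation type, and language.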

Code & Models

Repositories

fajri91/Multi_SummEval
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Linear Warmup With Linear Decay · Layer Normalization · Residual Connection · WordPiece · Attention Dropout