BERTScore: Evaluating Text Generation with BERT

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi

arXiv:1904.09675 · cs.CL · February 25, 2020 · 2.0k cites

TL;DR

BERTScore is an automatic evaluation metric for text generation that computes token similarity from contextual embeddings rather than exact matches. It correlates better with human judgments and is more robust than existing metrics.

Contribution

The paper introduces BERTScore, an evaluation metric that uses BERT's contextual embeddings to assess the quality of generated text.

Findings

01 BERTScore correlates better with human judgments than existing metrics.

02 It provides stronger model selection performance than existing metrics.

03 On an adversarial paraphrase detection task, it is more robust to challenging examples than existing metrics.

Abstract

We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task to show that BERTScore is more robust to challenging examples when compared to existing metrics.
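
The abstract describes comparing every candidate token with every reference token through embedding similarity. A minimal sketch of that idea follows, assuming cosine similarity and greedy matching (each token is paired with its most similar counterpart on the other side) to obtain precision, recall, and F1. The function names `cosine` and `greedy_match_score` are hypothetical, the 2-dimensional vectors are toy stand-ins for real contextual embeddings, and no token weighting is applied; this is an illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_score(candidate_vecs, reference_vecs):
    """Return (precision, recall, f1) from pairwise token similarities.

    Assumption: each token is greedily matched to its most similar
    token in the other sentence, and the matches are averaged.
    """
    # sim[i][j] = similarity of candidate token i and reference token j
    sim = [[cosine(c, r) for r in reference_vecs] for c in candidate_vecs]

    # Precision: average best match for each candidate token.
    precision = sum(max(row) for row in sim) / len(candidate_vecs)

    # Recall: average best match for each reference token.
    recall = sum(
        max(sim[i][j] for i in range(len(candidate_vecs)))
        for j in range(len(reference_vecs))
    ) / len(reference_vecs)

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy vectors standing in for contextual token embeddings (hypothetical data).
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

p, r, f = greedy_match_score(candidate, reference)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

In this toy case every candidate token has an exact counterpart in the reference, so precision is perfect, while the third reference token is only partially covered, which lowers recall. With a real model, the vectors would come from the hidden states of a pretrained encoder run over each full sentence.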

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications