DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
Wei Zhao, Michael Strube, Steffen Eger

TL;DR
DiscoScore is a new BERT-based metric designed to evaluate discourse coherence in text generation, outperforming existing metrics like BARTScore in correlating with human judgments at the system level.
Contribution
The paper introduces DiscoScore, a parametrized discourse coherence metric based on BERT and Centering theory, improving system-level evaluation of coherence and factual consistency.
Findings
Most BERT-based metrics correlate poorly with human coherence ratings.
BARTScore performs weakly in system-level evaluation.
DiscoScore surpasses BARTScore by over 10 correlation points on average.
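The findings above compare metrics by how well they correlate with human ratings at the system level. The sketch below illustrates that general protocol under stated assumptions: each system's per-document scores are averaged, and the resulting per-system means are correlated with the per-system human means. Several details are our assumptions and are not taken from the paper: the correlation coefficient (Pearson or Kendall tau-a are shown), reading "correlation points" as correlation × 100, and the helper names `pearson`, `kendall_tau` and `system_level_correlation`. The system names and scores in the usage block are hypothetical toy data.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (vx * vy) ** 0.5


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs, no tie correction."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def system_level_correlation(metric_scores, human_scores, corr=pearson):
    """Correlate per-system mean metric scores with per-system mean human ratings.

    Both arguments map a system name to its list of per-document scores.
    """
    systems = sorted(metric_scores)
    metric_means = [mean(metric_scores[s]) for s in systems]
    human_means = [mean(human_scores[s]) for s in systems]
    return corr(metric_means, human_means)


if __name__ == "__main__":
    # Hypothetical toy data: three systems, two documents each.
    metric = {"sysA": [0.2, 0.4], "sysB": [0.5, 0.7], "sysC": [0.8, 1.0]}
    human = {"sysA": [2, 2], "sysB": [3, 3], "sysC": [4, 4]}
    r = system_level_correlation(metric, human)
    tau = system_level_correlation(metric, human, corr=kendall_tau)
    # Scaling by 100 is our reading of "correlation points".
    print(f"Pearson: {100 * r:.1f} points, Kendall: {100 * tau:.1f} points")
```

Averaging before correlating is what makes this a system-level rather than a document-level comparison. With only a handful of systems, a single outlier system can move the result by many points, which is worth keeping in mind when comparing gaps such as "over 10 correlation points".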
Abstract
Recently, there has been growing interest in designing text generation systems from a discourse coherence perspective, e.g., modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics are weak at recognizing coherence and are therefore unreliable for spotting the discourse-level improvements of those text generation systems. In this work, we introduce DiscoScore, a parametrized discourse metric, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human rated coherence than early discourse metrics, invented a decade ago; (ii) the recent…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Weight Decay · Softmax · WordPiece · Dense Connections
