Evaluating and Improving Factuality in Multimodal Abstractive Summarization

David Wan; Mohit Bansal

arXiv:2211.02580 · cs.CL · November 7, 2022

TL;DR

This paper introduces CLIPBERTScore, a new multimodal factuality metric combining CLIPScore and BERTScore, which improves evaluation accuracy for vision-and-language summarization and supports downstream tasks like image selection and reinforcement learning.

Contribution

The paper proposes CLIPBERTScore, a multimodal factuality metric that outperforms existing metrics, and demonstrates the metric's effectiveness both in evaluation and in downstream applications.

Findings

01 CLIPBERTScore achieves higher correlation with human judgments than existing factuality metrics for document summarization.

02 It outperforms an existing multimodal summarization metric and is competitive with fine-tuned metrics.

03 The metric is robust across multiple evaluation benchmarks.
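To make the first finding concrete, here is a minimal sketch of how agreement between a metric and human judgments might be measured. It assumes Pearson correlation as the agreement measure; this page does not say which coefficient the paper reports. The function name `pearson` and the sample scores are hypothetical, made up for illustration, and are not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalised (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one metric score and one human factuality rating
# per summary. These numbers are illustrative only.
metric_scores = [0.62, 0.71, 0.40, 0.88]
human_ratings = [3.0, 4.0, 2.0, 5.0]
correlation = pearson(metric_scores, human_ratings)
```

A metric whose scores rise and fall with the human ratings yields a value near 1; a metric that ignores factual errors would sit closer to 0.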

Abstract

Current metrics for evaluating factuality for abstractive document summarization have achieved high correlations with human judgment, but they do not account for the vision modality and thus are not adequate for vision-and-language summarization. We propose CLIPBERTScore, a simple weighted combination of CLIPScore and BERTScore to leverage the robustness and strong factuality detection performance between image-summary and document-summary, respectively. Next, due to the lack of meta-evaluation benchmarks to evaluate the quality of multimodal factuality metrics, we collect human judgments of factuality with respect to documents and images. We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization, outperforms an existing multimodal summarization metric, and performs competitively…
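The abstract describes CLIPBERTScore as a simple weighted combination of CLIPScore (image-summary) and BERTScore (document-summary). A minimal sketch of such a combination follows, assuming a convex weighting with a single weight `alpha`. The function name `combine_scores`, the default weight, and the requirement that `alpha` lie in [0, 1] are assumptions for illustration; the two input scores are taken as already computed by whatever CLIPScore and BERTScore implementations are in use.

```python
def combine_scores(clip_score, bert_score, alpha=0.5):
    """Weighted combination of an image-summary and a document-summary score.

    clip_score: precomputed CLIPScore between the image and the summary.
    bert_score: precomputed BERTScore between the document and the summary.
    alpha:      hypothetical weight on the image-summary term (assumed in [0, 1]).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * clip_score + (1.0 - alpha) * bert_score


# Illustrative inputs only; not values from the paper.
combined = combine_scores(clip_score=0.8, bert_score=0.6, alpha=0.25)
```

With `alpha=0` the result reduces to the document-summary score alone, and with `alpha=1` to the image-summary score alone, so the weight controls how much the vision modality contributes.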

Code & Models

Repositories

meetdavidwan/faithful-multimodal-summ
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies