Evaluating and Improving Factuality in Multimodal Abstractive Summarization
David Wan, Mohit Bansal

TL;DR
This paper introduces CLIPBERTScore, a new multimodal factuality metric combining CLIPScore and BERTScore, which improves evaluation accuracy for vision-and-language summarization and supports downstream tasks like image selection and reinforcement learning.
Contribution
The paper proposes CLIPBERTScore, a novel multimodal factuality metric that outperforms existing metrics and demonstrates its effectiveness in evaluation and downstream applications.
Findings
In the zero-shot setting, CLIPBERTScore achieves higher correlation with human judgments than existing factuality metrics for document summarization.
It outperforms an existing multimodal summarization metric and is competitive with fine-tuned metrics.
The metric is robust across multiple evaluation benchmarks.
Abstract
Current metrics for evaluating factuality for abstractive document summarization have achieved high correlations with human judgment, but they do not account for the vision modality and thus are not adequate for vision-and-language summarization. We propose CLIPBERTScore, a simple weighted combination of CLIPScore and BERTScore to leverage the robustness and strong factuality detection performance between image-summary and document-summary, respectively. Next, due to the lack of meta-evaluation benchmarks to evaluate the quality of multimodal factuality metrics, we collect human judgments of factuality with respect to documents and images. We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization, outperforms an existing multimodal summarization metric, and performs competitively…
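The abstract describes CLIPBERTScore as a simple weighted combination of an image-summary score (CLIPScore) and a document-summary score (BERTScore). A minimal sketch of that idea follows. It assumes both scores are already computed and lie on comparable scales, and that the weights sum to one. The function name `clipbert_score` and the parameter `alpha` are hypothetical; the abstract does not specify the weighting scheme or its value.

```python
def clipbert_score(clip_s: float, bert_s: float, alpha: float) -> float:
    """Combine an image-summary score and a document-summary score.

    clip_s: precomputed CLIPScore between the image(s) and the summary.
    bert_s: precomputed BERTScore between the document and the summary.
    alpha:  hypothetical weight on the image-side score, in [0, 1];
            the remaining weight (1 - alpha) goes to the document side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Convex combination: an assumption about what "weighted" means here.
    return alpha * clip_s + (1.0 - alpha) * bert_s


if __name__ == "__main__":
    # Illustrative inputs only; these are not results from the paper.
    print(clipbert_score(clip_s=0.8, bert_s=0.6, alpha=0.5))
```

With `alpha = 0` the sketch reduces to the document-side score alone, and with `alpha = 1` to the image-side score alone, so the weight controls how much the vision modality contributes to the factuality judgment.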
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
