Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Yue Zhang, Jingxuan Zuo, Ke Su, Liqiang Jing

TL;DR
This paper introduces two explainable, fine-grained evaluation frameworks for assessing the factuality of multimodal summarization, applicable with or without reference summaries, improving the reliability of model outputs.
Contribution
The paper presents novel reference-based and reference-free frameworks for factuality evaluation in multimodal summarization, enhancing applicability and interpretability.
Findings
The frameworks show high correlation with existing metrics.
The reference-free framework broadens evaluation scenarios.
Experimental results confirm the effectiveness of the proposed methods.
Abstract
Multimodal summarization aims to generate a concise summary based on the input text and image. However, existing methods may produce non-factual output. To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks (FALLACIOUS) for different application scenarios: a reference-based factuality evaluation framework and a reference-free factuality evaluation framework. Notably, the reference-free framework does not need ground truth and therefore applies to a wider range of scenarios. To evaluate the effectiveness of the proposed frameworks, we compute the correlation between our frameworks and other metrics. The experimental results show the effectiveness of our proposed method. We will release our code and dataset via GitHub.
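The abstract says the frameworks are validated by computing their correlation with other metrics, but it does not name the correlation coefficient or the metrics. As a minimal sketch of what such a check could look like, the Python below computes Pearson and Spearman correlation between two lists of per-summary scores. The function names and the example scores are hypothetical and are not taken from the paper or its code release.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Made-up scores for five summaries, for illustration only.
    framework_scores = [0.91, 0.40, 0.75, 0.62, 0.10]
    other_metric_scores = [0.88, 0.35, 0.70, 0.71, 0.20]
    print("Pearson: ", pearson(framework_scores, other_metric_scores))
    print("Spearman:", spearman(framework_scores, other_metric_scores))
```

A rank-based coefficient such as Spearman is a common choice when two metrics use different scales, since it compares only the ordering of the summaries rather than the raw score values.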
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
