EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations

Hyunjong Kim; Sangyeop Kim; Jongheon Jeong; Yeongjae Cho; Sungzoon Cho

arXiv:2506.24016 · cs.CL · July 1, 2025

Open Access

TL;DR

EXPERT is a reference-free evaluation metric for image captioning that gives structured explanations based on fluency, relevance, and descriptiveness. It achieves state-of-the-art results on benchmark datasets and produces higher-quality explanations than existing metrics.

Contribution

The paper introduces EXPERT, a reference-free explainable evaluation metric that produces structured explanations, together with a two-stage evaluation template used to supervise a vision-language model for both scoring and explanation generation.

Findings

01 Achieves state-of-the-art results on benchmark datasets.

02 Provides significantly higher-quality explanations than existing explainable metrics.

03 Explanation quality is validated through comprehensive human evaluation.

Abstract

Recent advances in large language models and vision-language models have led to growing interest in explainable evaluation metrics for image captioning. However, these metrics generate explanations without standardized criteria, and the overall quality of the generated explanations remains unverified. In this paper, we propose EXPERT, a reference-free evaluation metric that provides structured explanations based on three fundamental criteria: fluency, relevance, and descriptiveness. By constructing large-scale datasets of high-quality structured explanations, we develop a two-stage evaluation template to effectively supervise a vision-language model for both scoring and explanation generation. EXPERT achieves state-of-the-art results on benchmark datasets while providing significantly higher-quality explanations than existing metrics, as validated through comprehensive human evaluation.…
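
As an illustration only, the structure described in the abstract could be represented roughly as follows. This is a minimal sketch, not the authors' template: the class `StructuredExplanation`, the functions `scoring_prompt` and `explanation_prompt`, the prompt wording, and the assumption that the first stage produces a score and the second stage produces the explanation are all hypothetical. The abstract itself only names the three criteria and states that the template has two stages.

```python
from dataclasses import dataclass

# The three criteria named in the abstract.
CRITERIA = ("fluency", "relevance", "descriptiveness")


@dataclass
class StructuredExplanation:
    """Hypothetical container: one free-text judgment per criterion."""
    fluency: str
    relevance: str
    descriptiveness: str

    def render(self) -> str:
        # One "Criterion: text" line per criterion, in a fixed order.
        return "\n".join(
            f"{name.capitalize()}: {getattr(self, name)}" for name in CRITERIA
        )


def scoring_prompt(caption: str) -> str:
    """Assumed stage 1: ask the model for a score only (wording is invented)."""
    return (
        "Rate how well the caption describes the image.\n"
        f"Caption: {caption}\n"
        "Score:"
    )


def explanation_prompt(caption: str, score: float) -> str:
    """Assumed stage 2: ask for an explanation structured by the criteria."""
    criteria = ", ".join(CRITERIA)
    return (
        f"Caption: {caption}\n"
        f"Score: {score}\n"
        f"Explain the score in terms of {criteria}."
    )


if __name__ == "__main__":
    example = StructuredExplanation(
        fluency="The caption is grammatical.",
        relevance="The caption mentions objects present in the image.",
        descriptiveness="The caption omits the background.",
    )
    print(scoring_prompt("A dog runs on a beach."))
    print(explanation_prompt("A dog runs on a beach.", 0.7))
    print(example.render())
```

Fixing the order and names of the criteria in one place (`CRITERIA`) is what makes the explanation "structured" in this sketch: every explanation has the same three fields, so outputs can be compared across captions.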

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis