TL;DR
This paper introduces a neural network-based learned metric for caption evaluation. It targets the main weakness of existing metrics, which correlate acceptably with human judgments at the system level but poorly at the caption level.
Contribution
The work presents a learned evaluation metric for captions that correlates better with human assessments than existing metrics. It also analyzes how different linguistic features and training examples affect learned metrics, and how robust learned and handcrafted metrics are to sentence perturbations.
Findings
Outperforms existing metrics in caption-level correlation
Shows strong system-level correlation with human judgments
Demonstrates robustness to sentence perturbations
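A robustness check of this kind can be sketched as follows: perturb a candidate caption and record how much a metric's score changes. This is a minimal illustration, not the paper's setup. The unigram-F1 metric stands in for a handcrafted metric, and the helper names (`unigram_f1`, `reverse_words`, `drop_word`, `replace_word`), the example sentences, and the three perturbations are all assumptions made for this example.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Stand-in handcrafted metric: F1 over overlapping unigrams."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def reverse_words(sentence: str) -> str:
    """Hypothetical perturbation: destroy word order, keep the vocabulary."""
    return " ".join(reversed(sentence.split()))


def drop_word(sentence: str, index: int) -> str:
    """Hypothetical perturbation: delete the word at `index`."""
    words = sentence.split()
    return " ".join(words[:index] + words[index + 1:])


def replace_word(sentence: str, index: int, new_word: str) -> str:
    """Hypothetical perturbation: swap the word at `index` for `new_word`."""
    words = sentence.split()
    words[index] = new_word
    return " ".join(words)


reference = "a dog runs across the grass"
candidate = "a dog runs on the grass"

perturbed = {
    "reverse_words": reverse_words(candidate),
    "drop_word": drop_word(candidate, 1),
    "replace_word": replace_word(candidate, 1, "cat"),
}

baseline = unigram_f1(candidate, reference)
# Score drop per perturbation; a drop of 0 means the metric did not notice it.
score_drop = {
    name: baseline - unigram_f1(text, reference)
    for name, text in perturbed.items()
}

for name, drop in score_drop.items():
    print(f"{name}: score drop = {drop:.4f}")
```

In this toy case the bag-of-words metric penalizes the deleted and replaced word but assigns the reversed caption the same score as the original, which is the kind of blind spot a perturbation analysis is meant to expose.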
Abstract
Automatic evaluation metrics are of fundamental importance to the development and fine-grained analysis of captioning systems. While current evaluation metrics tend to achieve an acceptable correlation with human judgements at the system level, they fail to do so at the caption level. In this work, we propose a neural network-based learned metric to improve caption-level evaluation. To gain deeper insight into the parameters that affect a learned metric's performance, this paper investigates the relationship between different linguistic features and the caption-level correlation of learned metrics. We also compare metrics trained with different training examples to measure the variations in their evaluation. Moreover, we perform a robustness analysis, which highlights the sensitivity of learned and handcrafted metrics to various sentence perturbations. Our empirical…
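The difference between caption-level and system-level correlation can be made concrete with a small sketch. The scores below are invented toy data, and the choice of Kendall's tau for the caption level and Pearson's r for the system level is an assumption made for illustration; the summary above does not say which coefficients the paper uses.

```python
from itertools import combinations
from statistics import mean


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 is a tie and counts toward neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Toy data (invented): system name -> list of (metric score, human score),
# one pair per caption.
scores = {
    "sys_a": [(0.2, 3), (0.9, 2), (0.5, 4)],
    "sys_b": [(0.6, 4), (0.4, 5), (0.8, 3)],
    "sys_c": [(0.1, 1), (0.3, 2), (0.2, 1)],
}

# Caption level: correlate metric and human scores over every single caption.
all_pairs = [pair for pairs in scores.values() for pair in pairs]
caption_tau = kendall_tau([m for m, _ in all_pairs], [h for _, h in all_pairs])

# System level: average per system first, then correlate the averages.
system_metric = [mean(m for m, _ in pairs) for pairs in scores.values()]
system_human = [mean(h for _, h in pairs) for pairs in scores.values()]
system_r = pearson_r(system_metric, system_human)

print(f"caption-level Kendall tau: {caption_tau:.3f}")
print(f"system-level Pearson r:   {system_r:.3f}")
```

With this toy data the metric ranks the three systems in the same order as the human scores, yet often misorders individual captions, so the system-level coefficient comes out far higher than the caption-level one. Averaging over captions hides per-caption errors, which is why a metric can look adequate at the system level and still be unreliable for judging single captions.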
