Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti, Albert Gatt, Adrian Muscat

TL;DR
This paper introduces pre-generation (pre-gen) metrics, which predict caption quality from the probabilities a neural model assigns to the reference captions. No captions need to be generated, and the resulting scores correlate strongly with standard evaluation metrics.
Contribution
The paper presents an approach for evaluating image captioning models without generating captions: output quality is estimated from the probabilities the model assigns to the reference captions, which simplifies and speeds up evaluation.
Findings
Pre-gen metrics are strongly correlated with standard metrics.
The approach reduces computational costs of evaluation.
It enables quick assessment of caption quality without generation.
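The correlation claim above can be checked with a few lines of code. The sketch below computes a Pearson correlation between a pre-gen score and a standard metric across several models. The choice of Pearson, the function name `pearson_r`, and the toy numbers are assumptions for illustration; this summary does not state which correlation coefficient the authors use, and the values are not results from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four captioning models (made-up numbers).
# pregen_scores: a probability-based score computed on reference captions.
# standard_scores: a generation-based metric computed on decoded captions.
pregen_scores = [1.0, 2.0, 3.0, 4.0]
standard_scores = [2.0, 4.0, 6.0, 8.0]

r = pearson_r(pregen_scores, standard_scores)
```

A value of `r` close to 1 or -1 would indicate that the pre-gen score ranks models much as the generation-based metric does. The sign depends on whether lower or higher pre-gen scores are better.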
Abstract
Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the reference captions. Such pre-gen metrics are strongly correlated to standard evaluation metrics.
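As a rough illustration of the idea in the abstract, the sketch below scores a model using only the probabilities it assigns to the tokens of reference captions, so no caption is decoded. Perplexity is used here as one plausible probability-based score; the paper's exact metric definitions are not reproduced in this summary. The function names `caption_log_prob` and `pregen_perplexity`, the input format, and the toy probabilities are all assumptions.

```python
import math
from typing import Sequence


def caption_log_prob(token_probs: Sequence[float]) -> float:
    """Sum of the log-probabilities the model assigned to each reference token."""
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("token probabilities must be in (0, 1]")
    return sum(math.log(p) for p in token_probs)


def pregen_perplexity(references: Sequence[Sequence[float]]) -> float:
    """Corpus-level perplexity of the reference captions under the model.

    Each inner sequence holds the model's probability for every token of one
    reference caption (hypothetical input format). Lower is better.
    """
    total_log_prob = sum(caption_log_prob(probs) for probs in references)
    total_tokens = sum(len(probs) for probs in references)
    if total_tokens == 0:
        raise ValueError("no tokens to score")
    return math.exp(-total_log_prob / total_tokens)


# Toy per-token probabilities for two reference captions (made-up values).
confident_model = [[0.5, 0.5], [0.5, 0.5, 0.5]]
uncertain_model = [[0.1, 0.1], [0.1, 0.1, 0.1]]

confident_score = pregen_perplexity(confident_model)
uncertain_score = pregen_perplexity(uncertain_model)
```

In this toy setup the model that assigns higher probability to the reference tokens gets the lower perplexity. In practice the per-token probabilities would come from a forward pass of the captioning model over each image and its reference captions, with the reference tokens fed as input.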
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
