CIDEr-R: Robust Consensus-based Image Description Evaluation
Gabriel Oliveira dos Santos, Esther Luna Colombini, Sandra Avila

TL;DR
This paper introduces CIDEr-R, an improved image caption evaluation metric that is more robust and accurate than CIDEr-D, especially on datasets with high sentence length variance and fewer reference sentences.
Contribution
CIDEr-R enhances CIDEr-D by addressing its limitations, making it more flexible and closer to human judgment for evaluating image descriptions.
Findings
CIDEr-R outperforms CIDEr-D in accuracy and robustness.
CIDEr-R is effective in datasets with high sentence length variance.
Optimizing CIDEr-R with Self-Critical Sequence Training improves caption quality.
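Self-Critical Sequence Training (SCST) is a REINFORCE variant that uses the score of the model's own greedy-decoded caption as the reward baseline, so the evaluation metric (CIDEr-D or CIDEr-R) becomes the training reward. The following is a minimal sketch of the per-sample SCST loss. It assumes the token log-probabilities and both metric scores are already computed. `scst_loss` is a hypothetical helper, not code from the paper. In a real training loop the log-probabilities would be autograd tensors rather than plain floats.

```python
from typing import Sequence


def scst_loss(sample_log_probs: Sequence[float],
              sample_reward: float,
              greedy_reward: float) -> float:
    """Self-critical policy-gradient loss for one sampled caption (hypothetical helper).

    sample_log_probs: log p(w_t | w_<t, image) for each token of the sampled caption.
    sample_reward:    metric score (e.g. CIDEr-D or CIDEr-R) of the sampled caption.
    greedy_reward:    metric score of the greedy-decoded caption, used as the baseline.
    """
    # Advantage: how much better the sample scored than the model's greedy output.
    advantage = sample_reward - greedy_reward
    # Minimizing -(r(sample) - r(greedy)) * log p(sample) raises the probability of
    # samples that beat the greedy baseline and lowers it for samples that do worse.
    return -advantage * sum(sample_log_probs)


# Usage: a sample scoring 1.2 against a greedy baseline of 0.9.
loss = scst_loss([-0.5, -1.0, -0.2], sample_reward=1.2, greedy_reward=0.9)
```

Whichever metric is plugged in as the reward is what the captioner learns to maximize, which is why the choice between CIDEr-D and CIDEr-R can change the style and length of the generated captions.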
Abstract
This paper shows that CIDEr-D, a traditional evaluation metric for image description, does not work properly on datasets whose sentences contain significantly more words than those in the MS COCO Captions dataset. We also show that CIDEr-D's performance is hampered by the lack of multiple reference sentences and by high variance in sentence length. To bypass this problem, we introduce CIDEr-R, which improves CIDEr-D by making it more flexible in dealing with datasets with high sentence length variance. We demonstrate that CIDEr-R is more accurate and closer to human judgment than CIDEr-D, and that it is more robust to the number of available references. Our results reveal that using Self-Critical Sequence Training to optimize CIDEr-R generates descriptive captions. In contrast, when CIDEr-D is optimized, the generated captions' length tends to be similar to the reference…
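The abstract does not spell out either formula. As background for its claims, the following is a minimal Python sketch of CIDEr-D as defined in the original CIDEr paper and the widely used coco-caption (pycocoevalcap) toolkit. It builds TF-IDF-weighted n-gram vectors for n = 1 to 4, computes a clipped cosine similarity against each reference, and applies a Gaussian length penalty with a fixed sigma of 6. The result is averaged over n-gram orders and references and scaled by 10. This is not CIDEr-R, whose modifications are not described in this summary. The function names are hypothetical, and the sentences are assumed to be tokenized already.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def ngram_counts(tokens: List[str], max_n: int = 4) -> Counter:
    """Count all n-grams of order 1..max_n in a tokenized sentence."""
    return Counter(tuple(tokens[i:i + n]) for n in range(1, max_n + 1)
                   for i in range(len(tokens) - n + 1))


def document_frequency(all_refs: List[List[List[str]]], max_n: int = 4) -> Counter:
    """Number of images whose reference set contains each n-gram."""
    df: Counter = Counter()
    for refs in all_refs:
        seen = set()
        for ref in refs:
            seen.update(ngram_counts(ref, max_n))
        df.update(seen)  # each n-gram counted at most once per image
    return df


def tfidf(tokens: List[str], df: Counter, log_num_images: float,
          max_n: int = 4) -> Tuple[List[Dict[NGram, float]], List[float]]:
    """Per-order TF-IDF vectors and their L2 norms."""
    vecs: List[Dict[NGram, float]] = [dict() for _ in range(max_n)]
    norms = [0.0] * max_n
    for gram, tf in ngram_counts(tokens, max_n).items():
        n = len(gram) - 1
        weight = tf * (log_num_images - math.log(max(1.0, df[gram])))
        vecs[n][gram] = weight
        norms[n] += weight * weight
    return vecs, [math.sqrt(x) for x in norms]


def cider_d(candidate: List[str], refs: List[List[str]], df: Counter,
            num_images: int, max_n: int = 4, sigma: float = 6.0) -> float:
    """CIDEr-D of one tokenized candidate against its tokenized references."""
    log_n = math.log(float(num_images))
    c_vecs, c_norms = tfidf(candidate, df, log_n, max_n)
    total = 0.0
    for ref in refs:
        r_vecs, r_norms = tfidf(ref, df, log_n, max_n)
        # Gaussian length penalty with a fixed sigma, independent of sentence length.
        penalty = math.exp(-((len(candidate) - len(ref)) ** 2) / (2 * sigma ** 2))
        for n in range(max_n):
            # Clipping: a candidate weight never counts for more than the reference weight.
            dot = sum(min(w, r_vecs[n].get(g, 0.0)) * r_vecs[n].get(g, 0.0)
                      for g, w in c_vecs[n].items())
            if c_norms[n] > 0 and r_norms[n] > 0:
                dot /= c_norms[n] * r_norms[n]
            total += penalty * dot / max_n  # uniform weights 1/N over n-gram orders
    return 10.0 * total / len(refs)
```

Because sigma is fixed, a candidate that differs from a reference by 5 words keeps about 0.71 of its similarity, and one that differs by 10 words keeps about 0.25, whether the reference has 10 words or 40. This is one property that plausibly matters on datasets with long, variable-length sentences, which the abstract identifies as problematic for CIDEr-D.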
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
