Redemption Score: A Multi-Modal Evaluation Framework for Image Captioning via Distributional, Perceptual, and Linguistic Signal Triangulation
Ashim Dahal, Ankit Ghimire, Saydul Akbar Murad, Nick Rahimi

TL;DR
Redemption Score (RS) is a comprehensive evaluation framework for image captioning that combines distributional, perceptual, and linguistic signals to better align with human judgment.
Contribution
This paper introduces Redemption Score, a hybrid evaluation method that triangulates multiple signals for more holistic image caption assessment without task-specific training.
Findings
RS achieves a Kendall-τ of 58.42 on Flickr8k, outperforming most prior metrics
RS correlates better with human judgments across multiple datasets
The framework offers a robust evaluation of visual accuracy and text quality
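To make the headline number concrete, here is a minimal sketch of how a Kendall rank correlation between metric scores and human ratings can be computed. It implements the simple tau-a variant, which ignores tie corrections; the summary does not say which variant the paper reports, so that choice is an assumption, as are the function name `kendall_tau_a` and the toy inputs. A value such as 58.42 is presumably the coefficient multiplied by 100, since the coefficient itself lies in [-1, 1].

```python
from itertools import combinations


def kendall_tau_a(metric_scores, human_scores):
    """Kendall tau-a between two equal-length score lists (hypothetical helper).

    Counts concordant and discordant pairs and divides their difference by
    the total number of pairs. Tied pairs count as neither, so this is not
    the tie-corrected tau-b or tau-c.
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two items to rank")

    concordant = 0
    discordant = 0
    for (m1, h1), (m2, h2) in combinations(zip(metric_scores, human_scores), 2):
        direction = (m1 - m2) * (h1 - h2)
        if direction > 0:
            concordant += 1  # both rankings order this pair the same way
        elif direction < 0:
            discordant += 1  # the rankings disagree on this pair

    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy example with made-up scores: one swapped pair out of six.
tau = kendall_tau_a([0.1, 0.4, 0.3, 0.9], [1, 2, 3, 4])
print(f"tau-a = {tau:.4f}, scaled = {100 * tau:.2f}")
```

A metric that ranks captions exactly as humans do gives 1.0, and one that reverses the human ranking gives -1.0.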
Abstract
Evaluating image captions requires cohesive assessment of both visual semantics and language pragmatics, which is often not entirely captured by most metrics. We introduce Redemption Score (RS), a novel hybrid framework that ranks image captions by triangulating three complementary signals: (1) Mutual Information Divergence (MID) for global image-text distributional alignment, (2) DINO-based perceptual similarity of cycle-generated images for visual grounding, and (3) LLM text embeddings for contextual text similarity against human references. A calibrated fusion of these signals allows RS to offer a more holistic assessment. On the Flickr8k benchmark, RS achieves a Kendall-τ of 58.42, outperforming most prior methods and demonstrating superior correlation with human judgments without requiring task-specific training. Our framework provides a more robust and nuanced evaluation by…
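The abstract does not give the fusion formula, so the following is only an illustrative sketch of what combining three signals into one score could look like. It assumes each signal has already been rescaled to [0, 1] and uses a plain weighted average; the function name `fuse_signals`, the argument names, and the default equal weights are all hypothetical and should not be read as the authors' calibrated fusion.

```python
def fuse_signals(mid, dino, text, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three signals (illustrative only, not the paper's formula).

    mid, dino, text: per-caption signal values, assumed pre-scaled to [0, 1].
    weights: non-negative relative weights; normalized to sum to 1 here.
    """
    signals = (mid, dino, text)
    if any(not 0.0 <= s <= 1.0 for s in signals):
        raise ValueError("each signal must already be scaled to [0, 1]")
    if len(weights) != 3 or any(w < 0 for w in weights):
        raise ValueError("weights must be three non-negative numbers")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")

    # Normalizing keeps the fused score in [0, 1] whatever the weight scale.
    return sum(w * s for w, s in zip(weights, signals)) / total


# Made-up signal values; the weights stand in for a calibration step.
score = fuse_signals(mid=0.8, dino=0.4, text=0.2, weights=(0.5, 0.25, 0.25))
print(f"fused score = {score:.2f}")
```

In a setup like this, calibration would amount to choosing the weights (or a richer combination rule) against a held-out set of human judgments, for example by maximizing a rank correlation.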
