Redemption Score: A Multi-Modal Evaluation Framework for Image Captioning via Distributional, Perceptual, and Linguistic Signal Triangulation

Ashim Dahal; Ankit Ghimire; Saydul Akbar Murad; Nick Rahimi

arXiv:2505.16180 · cs.CV · September 25, 2025

TL;DR

Redemption Score (RS) is a comprehensive evaluation framework for image captioning that combines distributional, perceptual, and linguistic signals to better align with human judgment.

Contribution

This paper introduces Redemption Score, a hybrid evaluation method that triangulates multiple signals for more holistic image caption assessment without task-specific training.

Findings

01

RS outperforms most prior metrics on Flickr8k with a Kendall-τ of 58.42

02

RS correlates better with human judgments across multiple datasets

03

The framework offers a robust evaluation of visual accuracy and text quality
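Kendall-τ, the statistic quoted in the findings, measures how often a metric and human raters order pairs of captions the same way. Because τ lies in [-1, 1], a value such as 58.42 is presumably reported on a ×100 scale. The sketch below computes the simplest variant (τ-a, which ignores tie corrections) on made-up toy scores; the function name `kendall_tau_a` and the sample data are illustrative assumptions, and the τ variant the paper actually uses is not stated in this summary.

```python
from itertools import combinations


def kendall_tau_a(metric_scores, human_scores):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Illustrative helper only; tied pairs count as neither concordant
    nor discordant, and no tie correction is applied.
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two items to compare")

    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether both rankings agree on this pair.
        agreement = (metric_scores[i] - metric_scores[j]) * (
            human_scores[i] - human_scores[j]
        )
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Toy data (hypothetical): four captions scored by a metric and by humans.
toy_metric = [0.2, 0.5, 0.4, 0.9]
toy_human = [1, 2, 3, 4]
tau = kendall_tau_a(toy_metric, toy_human)
print(f"tau-a = {tau:.4f}, on a x100 scale = {100 * tau:.2f}")
```

In this toy case five of the six caption pairs are ordered consistently and one is not, so τ-a is (5 - 1) / 6.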

Abstract

Evaluating image captions requires cohesive assessment of both visual semantics and language pragmatics, which is often not entirely captured by most metrics. We introduce Redemption Score (RS), a novel hybrid framework that ranks image captions by triangulating three complementary signals: (1) Mutual Information Divergence (MID) for global image-text distributional alignment, (2) DINO-based perceptual similarity of cycle-generated images for visual grounding, and (3) LLM Text Embeddings for contextual text similarity against human references. A calibrated fusion of these signals allows RS to offer a more holistic assessment. On the Flickr8k benchmark, RS achieves a Kendall-τ of 58.42, outperforming most prior methods and demonstrating superior correlation with human judgments without requiring task-specific training. Our framework provides a more robust and nuanced evaluation by…
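The abstract describes a calibrated fusion of three per-caption signals but, as excerpted here, does not give the fusion rule. As a rough illustration of what combining such signals can look like, the sketch below min-max normalizes each signal across a batch of captions and takes a weighted sum. This is a hypothetical stand-in, not the authors' formula: the function names `min_max` and `fuse_signals`, the equal default weights, and the toy inputs are all assumptions made for the example.

```python
def min_max(values):
    """Rescale a list of scores to [0, 1]; constant lists map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_signals(mid, dino, text, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three normalized signals, one fused score per caption.

    Hypothetical fusion for illustration only. `mid`, `dino`, and `text`
    stand in for the distributional, perceptual, and linguistic signals
    named in the abstract; the weights here are arbitrary, not calibrated.
    """
    if not (len(mid) == len(dino) == len(text)):
        raise ValueError("all signals must score the same captions")
    if len(weights) != 3:
        raise ValueError("expected exactly three weights")

    normalized = [min_max(signal) for signal in (mid, dino, text)]
    return [
        sum(w * signal[i] for w, signal in zip(weights, normalized))
        for i in range(len(mid))
    ]


# Toy inputs (made up): three captions, three signals on different scales.
fused = fuse_signals(
    mid=[1.0, 2.0, 3.0],
    dino=[0.1, 0.2, 0.3],
    text=[10.0, 20.0, 30.0],
)
print(fused)
```

Normalizing first keeps a signal with a large numeric range from dominating the sum; in practice the weights would need to be calibrated against human judgments rather than fixed by hand.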
