Evaluating Remote Sensing Image Captions Beyond Metric Biases

Ziyun Chen; Fan Liu; Liang Yao; Chuanyi Zhang; Yuye Ma; Wei Zhou

arXiv:2604.22855·cs.CV·April 28, 2026

TL;DR

This paper introduces ReconScore, a reference-free evaluation metric for remote sensing image captioning (RSIC), reports that powerful models without fine-tuning outperform fine-tuned ones in zero-shot settings, and proposes a training-free captioning method called RemoteDescriber.

Contribution

The paper presents ReconScore, a reference-free evaluation metric that reduces the bias introduced by human annotation styles, and introduces RemoteDescriber, a training-free captioning approach that uses ReconScore for self-correction.

Findings

01

Models without fine-tuning outperform fine-tuned models in zero-shot remote sensing image captioning (RSIC) tasks.

02

ReconScore effectively evaluates caption quality without reference texts.

03

RemoteDescriber achieves state-of-the-art results on multiple datasets.

Abstract

The core objective of image captioning is to achieve lossless semantic compression from visual signals into textual modalities. However, the reliance on manually curated reference texts for evaluation essentially forces models to mimic specific human annotation styles, thereby masking the true descriptive capabilities of advanced foundation models. This systemic misalignment prompts a critical question: Is task-specific fine-tuning truly necessary for Remote Sensing Image Captioning, or is the perceived performance gap merely an artifact of flawed evaluation criteria? To investigate this discrepancy, we propose ReconScore, a novel reference-free evaluation metric. Rather than computing textual similarities, we assess caption quality by its capability to reconstruct the original visual elements solely from the generated text, effectively neutralizing human annotation biases. Applying…
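The idea of scoring a caption by what can be recovered from it, rather than by its overlap with a reference text, can be illustrated with a toy sketch. This is not the paper's ReconScore: the excerpt above does not specify how reconstruction is performed, so everything here is a hypothetical stand-in. The function names (`elements_from_caption`, `toy_recon_score`), the fixed element vocabulary, the keyword-matching "reconstruction", and the F1-style scoring are all assumptions made for illustration.

```python
import re


def elements_from_caption(caption, vocabulary):
    """Toy 'reconstruction' (assumption): recover the vocabulary
    elements that the caption mentions as whole lowercase words."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    return {element for element in vocabulary if element in tokens}


def toy_recon_score(caption, image_elements, vocabulary):
    """Hypothetical reference-free score: F1 overlap between the elements
    recovered from the caption and the elements present in the image.
    No reference caption is consulted at any point."""
    recovered = elements_from_caption(caption, vocabulary)
    if not recovered and not image_elements:
        return 1.0  # nothing to describe and nothing described
    true_positives = len(recovered & image_elements)
    precision = true_positives / len(recovered) if recovered else 0.0
    recall = true_positives / len(image_elements) if image_elements else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; these elements are made up for the example.
vocabulary = {"airport", "runway", "airplane", "river", "bridge"}
image_elements = {"airport", "runway", "airplane"}

full = toy_recon_score("An airport with one runway and an airplane",
                       image_elements, vocabulary)
partial = toy_recon_score("An airport near a river",
                          image_elements, vocabulary)
wrong = toy_recon_score("A river", image_elements, vocabulary)
```

In this sketch a caption is rewarded for letting the scene's elements be recovered and penalized for mentioning elements that are absent, independent of any particular annotation style. A real system would presumably replace the keyword matcher with a learned reconstruction model, but that detail is not given in this excerpt.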

Code & Models

Repositories

hhu-czy/RemoteDescriber (GitHub)
