When More Words Say Less: Decoupling Length and Specificity in Image Description Evaluation

Rhea Kapur; Robert Hawkins; Elisa Kreiss

arXiv:2601.04609·cs.CL·April 21, 2026

TL;DR

This paper argues that description length and specificity must be disentangled in image description evaluation, and that evaluation should prioritize specificity over verbosity.

Contribution

The paper introduces a dataset that controls for description length while varying information content, and uses it to show that specificity is not determined by length alone and should be evaluated directly.

Findings

01

People prefer more specific descriptions regardless of length.

02

Length alone does not account for differences in specificity; how the length budget is allocated also matters.

03

Evaluation methods should prioritize specificity over verbosity.

Abstract

Vision-language models (VLMs) are increasingly used to make visual content accessible via text-based descriptions. In current systems, however, the specificity of a description is often conflated with its length. We argue that these two concepts must be disentangled: descriptions can be concise yet dense with information, or lengthy yet vacuous. We define specificity relative to a contrast set, where a description is more specific to the extent that it picks out the target image better than other possible images. We construct a dataset that controls for length while varying information content, and validate that people reliably prefer more specific descriptions regardless of length. We find that controlling for length alone cannot account for differences in specificity: how the length budget is allocated makes a difference. These results support evaluation approaches that directly prioritize…
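
The contrast-set definition in the abstract can be made concrete with a small sketch. This is one possible reading, not necessarily the authors' operationalization: it assumes some scorer has already assigned a description-to-image match score to the target and to each image in the contrast set, and it treats specificity as the softmax probability of the target. The function name `specificity`, the `temperature` parameter, and all score values are hypothetical.

```python
import math
from typing import Sequence


def specificity(
    target_score: float,
    distractor_scores: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Softmax probability that a description picks out the target image.

    Assumption: scores come from some external description-image matcher;
    higher means a better match. The contrast set is the distractor images.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scores = [target_score, *distractor_scores]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    return weights[0] / sum(weights)


# Hypothetical scores: a vague description matches every image equally well,
# so it cannot single out the target among four candidates.
vague = specificity(1.0, [1.0, 1.0, 1.0])

# Hypothetical scores: a specific description matches the target much better
# than the three distractors.
specific = specificity(2.0, [0.0, 0.0, 0.0])

print(f"vague: {vague:.3f}, specific: {specific:.3f}")
```

Under this reading, word count never enters the computation: a long description that fits every image in the contrast set equally well scores at chance, while a short one that fits only the target scores high.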
