Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics
Subhadeep Roy, Gagan Bhatia, Steffen Eger

TL;DR
This paper uncovers a bias in multimodal evaluation metrics in which they favor prototypical images over semantically correct but non-prototypical ones, and it proposes a new metric, ProtoScore, to improve robustness.
Contribution
The authors introduce ProtoBias, a benchmark for evaluating prototypicality bias, and propose ProtoScore, a more robust metric that reduces failure rates and improves semantic evaluation.
Findings
Widely used metrics often rank prototypical adversarial images above semantically correct but non-prototypical ones.
Human judgments favor semantic correctness over prototypicality.
ProtoScore significantly outperforms existing metrics in robustness.
Abstract
Automatic metrics are now central to evaluating text-to-image models, often substituting for human judgment in benchmarking and large-scale filtering. However, it remains unclear whether these metrics truly prioritize semantic correctness or instead favor visually and socially prototypical images learned from biased data distributions. We identify and study prototypicality bias as a systematic failure mode in multimodal evaluation. We introduce a controlled contrastive benchmark, ProtoBias (Prototypical Bias), spanning Animals, Objects, and Demography images, in which semantically correct but non-prototypical images are paired with subtly incorrect yet prototypical adversarial counterparts. This setup enables a directional evaluation of whether metrics follow textual semantics or default to prototypes. Our results show that widely used metrics, including CLIPScore, PickScore, and VQA-based…
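The directional, pairwise evaluation described in the abstract can be illustrated with a simple failure rate: for each prompt, a metric "fails" when it does not score the correct, non-prototypical image above its prototypical adversarial counterpart. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `ContrastivePair`, `failure_rate`, and `toy_scores` are hypothetical, ties are assumed to count as failures, and the scores are made-up toy numbers rather than results from the paper. Any metric (for example, a CLIPScore-style scorer) is assumed to be wrapped as a `(prompt, image) -> float` function where higher means better alignment.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumed metric interface: (prompt, image_id) -> score, higher = better text-image alignment.
Metric = Callable[[str, str], float]


@dataclass
class ContrastivePair:
    """Hypothetical record for one benchmark item (not the ProtoBias file format)."""
    prompt: str
    correct_image: str      # semantically correct but non-prototypical image (ID or path)
    adversarial_image: str  # subtly incorrect but prototypical image (ID or path)


def failure_rate(pairs: Sequence[ContrastivePair], metric: Metric) -> float:
    """Fraction of pairs where the metric does NOT score the correct image strictly higher.

    Assumption: a tie counts as a failure, since the metric did not prefer the
    semantically correct image.
    """
    if not pairs:
        raise ValueError("failure_rate needs at least one pair")
    failures = sum(
        1
        for p in pairs
        if metric(p.prompt, p.correct_image) <= metric(p.prompt, p.adversarial_image)
    )
    return failures / len(pairs)


# Toy usage with invented scores (illustrative only).
pairs = [
    ContrastivePair("a white crow", "white_crow.png", "black_crow.png"),
    ContrastivePair("a red banana", "red_banana.png", "yellow_banana.png"),
]
toy_scores = {
    ("a white crow", "white_crow.png"): 0.31,
    ("a white crow", "black_crow.png"): 0.34,     # prototypical image wins: failure
    ("a red banana", "red_banana.png"): 0.36,
    ("a red banana", "yellow_banana.png"): 0.29,  # correct image wins: success
}


def toy_metric(prompt: str, image: str) -> float:
    return toy_scores[(prompt, image)]


print(f"failure rate: {failure_rate(pairs, toy_metric):.2f}")  # prints "failure rate: 0.50"
```

A lower failure rate means the metric follows the text more often than it defaults to the prototype. Counting ties as failures is a deliberately strict choice; a variant could count ties as half a failure instead.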
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education
