Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models

Li-Zhong Szu-Tu; Ting-Lin Wu; Chia-Jui Chang; He Syu; Yu-Lun Liu

arXiv:2512.21337·cs.CV·December 25, 2025

TL;DR

This paper exposes a popularity bias in vision-language models by introducing a large multi-modal dataset and benchmark. Models score markedly higher on famous buildings than on ordinary ones, which indicates reliance on memorization rather than generalizable understanding.

Contribution

The paper introduces the YearGuessr dataset and a new benchmark for evaluating popularity bias in vision-language models, along with popularity-aware metrics and a new model, YearCLIP.

Findings

01

VLMs achieve up to 34% higher accuracy on famous buildings than on ordinary ones.

02

VLMs struggle on less popular, unrecognized buildings.

03

Benchmark reveals reliance on memorization over understanding.

Abstract

We expose a significant popularity bias in state-of-the-art vision-language models (VLMs), which achieve up to 34% higher accuracy on famous buildings compared to ordinary ones, indicating a reliance on memorization over generalizable understanding. To systematically investigate this, we introduce the largest open benchmark for this task: the YearGuessr dataset, a collection of 55,546 building images with multi-modal attributes from 157 countries, annotated with continuous ordinal labels of their construction year (1001-2024), GPS data, and page-view counts as a proxy for popularity. Using this dataset, we frame the construction year prediction task as ordinal regression and introduce popularity-aware interval accuracy metrics to quantify this bias. Our resulting benchmark of 30+ models, including our YearCLIP model, confirms that VLMs excel on popular, memorized items but struggle…
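The abstract does not spell out how the popularity-aware interval accuracy metrics are computed, so the following is only a minimal sketch of one plausible reading: interval accuracy is taken as the fraction of predictions within a fixed number of years of the true construction year, and the bias is taken as the accuracy difference between high and low page-view groups. The function names, the 5-year tolerance, the median split, and the toy data are all assumptions made for illustration, not the paper's definitions.

```python
from statistics import median


def interval_accuracy(y_true, y_pred, tolerance):
    """Fraction of predictions within +/- `tolerance` years of the true year."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


def popularity_gap(y_true, y_pred, page_views, tolerance=5, threshold=None):
    """Compare interval accuracy on popular vs. ordinary buildings.

    Assumption: a building counts as "popular" when its page-view count is
    strictly above `threshold` (defaults to the median page-view count).
    """
    if threshold is None:
        threshold = median(page_views)
    popular = [(t, p) for t, p, v in zip(y_true, y_pred, page_views) if v > threshold]
    ordinary = [(t, p) for t, p, v in zip(y_true, y_pred, page_views) if v <= threshold]
    if not popular or not ordinary:
        raise ValueError("both popularity groups must be non-empty")
    acc_popular = interval_accuracy(*zip(*popular), tolerance)
    acc_ordinary = interval_accuracy(*zip(*ordinary), tolerance)
    return {
        "popular": acc_popular,
        "ordinary": acc_ordinary,
        "gap": acc_popular - acc_ordinary,
    }


# Toy data (invented): two heavily viewed buildings, two rarely viewed ones.
years_true = [1889, 1931, 1975, 2004]
years_pred = [1889, 1930, 1950, 1990]
views = [90000, 50000, 120, 80]
result = popularity_gap(years_true, years_pred, views, tolerance=5)
```

A positive `gap` means the model is more accurate on the high page-view group. A fixed-threshold split is used here only because it is simple; binning by page-view quantiles would be an equally reasonable choice.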

Code & Models

Models

🤗
Morris0401/YearCLIP
model

Datasets

Morris0401/Year-Guessr-Dataset
dataset · 2.2k downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Sentiment Analysis and Opinion Mining