CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation
Arnav Yayavaram, Siddharth Yayavaram, Simran Khanuja, Michael Saxon, Graham Neubig

TL;DR
CAIRe is an evaluation metric that scores how culturally relevant an image is to each label in a user-defined set of culture labels. It is meant to make cultural bias in text-to-image models measurable, and its scores align with human judgment.
Contribution
We introduce CAIRe, a metric for evaluating the cultural relevance of images. It grounds entities and concepts in the image to a knowledge base and uses factual information to judge each culture label independently, which supports bias measurement in text-to-image models.
Findings
CAIRe surpasses all baselines by 22% F1 points on a manually curated dataset of culturally salient but rare items.
CAIRe achieves Pearson correlations of 0.56 and 0.66 with human ratings.
Demonstrates strong alignment with human cultural relevance judgments.
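For readers unfamiliar with the statistic behind the second finding, the following is a minimal sketch of how a Pearson correlation between a metric's scores and human ratings can be computed. The function name `pearson` and the two score lists are illustrative assumptions, not data or code from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not from the paper):
# one metric score and one human rating per image.
metric_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
human_ratings = [2, 1, 4, 3, 5]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.2f}")
```

A value near 1 means the metric ranks and spaces images much as the human raters do; a value near 0 means little linear relationship.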
Abstract
As text-to-image models become increasingly prevalent, ensuring their equitable performance across diverse cultural contexts is critical. Efforts to mitigate cross-cultural biases have been hampered by trade-offs, including a loss in performance, factual inaccuracies, or offensive outputs. Despite widespread recognition of these challenges, an inability to reliably measure these biases has stalled progress. To address this gap, we introduce CAIRe, an evaluation metric that assesses the degree of cultural relevance of an image, given a user-defined set of labels. Our framework grounds entities and concepts in the image to a knowledge base and uses factual information to give independent graded judgments for each culture label. On a manually curated dataset of culturally salient but rare items built using language models, CAIRe surpasses all baselines by 22% F1 points. Additionally, we…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Misinformation and Its Impacts · Multimodal Machine Learning Applications
