Evaluating Perspectival Biases in Cross-Modal Retrieval

Teerapol Saengsukhiran; Peerawat Chomphooyod; Narabodee Rodjananant; Chompakorn Chaksangchaichot; Patawee Prakrankamanant; Witthawin Sripheanpol; Pak Lovichit; Sarana Nutanong; Ekapol Chuangsuwanich

arXiv:2510.26861 · cs.IR · January 27, 2026

Open Access

TL;DR

This paper investigates how cultural and linguistic biases influence multimodal retrieval systems. It finds that these biases are systematic and argues that fairer retrieval outcomes require strategies that decouple language from culture.

Contribution

Introduces the 3XCM benchmark to isolate cultural and linguistic biases and analyzes their impact on cross-modal retrieval performance.

Findings

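01 In image-to-text retrieval, models tend to favor entries from prevalent languages over semantically faithful ones.

02 In text-to-image retrieval, a "tugging effect" appears in the joint embedding space between semantic alignment and language-conditioned cultural association.

03 Biases are more pronounced when semantic representations are insufficiently resolved, particularly in low-resource languages.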

Abstract

Multimodal retrieval systems are expected to operate in a semantic space, agnostic to the language or cultural origin of the query. In practice, however, retrieval outcomes systematically reflect perspectival biases: deviations shaped by linguistic prevalence and cultural associations. We introduce the Cross-Cultural, Cross-Modal, Cross-lingual Multimodal (3XCM) benchmark to isolate these effects. Results from our studies indicate that, for image-to-text retrieval, models tend to favor entries from prevalent languages over those that are semantically faithful. For text-to-image retrieval, we observe a consistent "tugging effect" in the joint embedding space between semantic alignment and language-conditioned cultural association. When semantic representations are insufficiently resolved, particularly in low-resource languages, similarity is increasingly governed by culturally familiar…
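To make the image-to-text finding concrete, the following is a minimal sketch of one way a language preference could be measured: retrieve the nearest caption for each image by cosine similarity and count which language the top hit comes from. This is not the paper's metric or the 3XCM protocol. The function name `top1_language_share`, the two-language setup, and the toy embeddings are all assumptions made for illustration, and the numbers are synthetic rather than results from any model.

```python
import numpy as np


def top1_language_share(image_emb, text_emb, text_langs):
    """Share of top-1 retrieved captions per language (hypothetical helper).

    image_emb:  (n_images, d) array of image query embeddings
    text_emb:   (n_texts, d) array of candidate caption embeddings
    text_langs: length-n_texts sequence of language codes, one per caption
    """
    # L2-normalize so that the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    sims = img @ txt.T            # (n_images, n_texts) similarity matrix
    top1 = sims.argmax(axis=1)    # index of the best caption for each image

    # Count how often each language supplies the top-ranked caption.
    langs = np.asarray(text_langs)[top1]
    values, counts = np.unique(langs, return_counts=True)
    return {str(v): float(c) / len(top1) for v, c in zip(values, counts)}


# Synthetic toy data (assumption): two images, each with an English and a
# Thai caption placed by hand in a 3-dimensional embedding space.
image_emb = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
text_emb = np.array([[1.0, 0.0, 0.0],    # "en" caption for image 0
                     [0.8, 0.6, 0.0],    # "th" caption for image 0
                     [0.6, 0.8, 0.0],    # "en" caption for image 1
                     [0.0, 1.0, 0.0]])   # "th" caption for image 1
text_langs = ["en", "th", "en", "th"]

shares = top1_language_share(image_emb, text_emb, text_langs)
print(shares)
```

If every image has one semantically equivalent caption per language, a language-agnostic retriever would be expected to spread top-1 hits roughly evenly across languages, so a strongly skewed share would be one possible signal of the prevalence bias the abstract describes. A top-1 count alone cannot separate language preference from differences in caption quality, which is why a controlled benchmark is needed.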

Taxonomy

Topics: Information Retrieval and Search Behavior · Multimodal Machine Learning Applications · Categorization, perception, and language