Semantic Label Drift in Cross-Cultural Translation
Mohsinul Kabir, Tasnim Ahmed, Md Mezbaur Rahman, Polydoros Giannouris, Sophia Ananiadou

TL;DR
This paper investigates how cultural divergence between source and target languages causes semantic label drift in machine translation. It finds that MT systems, including modern LLMs, often fail to preserve labels, particularly in culturally sensitive domains.
Contribution
The paper identifies cultural divergence as a driver of label drift in MT, shows that the cultural knowledge encoded in LLMs can amplify that drift, and emphasizes the importance of cultural alignment for label-preserving translation.
Findings
LLMs induce label drift in culturally sensitive domains
Cultural knowledge in LLMs can amplify label drift
Cultural similarity affects label preservation
Abstract
Machine Translation (MT) is widely employed to address resource scarcity in low-resource languages by generating synthetic data from high-resource counterparts. While sentiment preservation in translation has long been studied, a critical but underexplored factor is the role of cultural alignment between source and target languages. In this paper, we hypothesize that semantic labels are drifted or altered during MT due to cultural divergence. Through a series of experiments across culturally sensitive and neutral domains, we establish three key findings: (1) MT systems, including modern Large Language Models (LLMs), induce label drift during translation, particularly in culturally sensitive domains; (2) unlike earlier statistical MT tools, LLMs encode cultural knowledge, and leveraging this knowledge can amplify label drift; and (3) cultural similarity or dissimilarity between source…
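To make the notion of label drift concrete, the following minimal sketch computes a drift rate as the fraction of examples whose label after translation differs from the source label. This is an illustrative assumption, not the paper's metric: the abstract does not say how drift is measured, and the function names `label_drift_rate` and `drift_transitions`, as well as the sentiment labels, are hypothetical.

```python
from collections import Counter


def label_drift_rate(source_labels, translated_labels):
    """Fraction of examples whose label changed after translation.

    Assumed definition for illustration; not taken from the paper.
    """
    if len(source_labels) != len(translated_labels):
        raise ValueError("label lists must have the same length")
    if not source_labels:
        return 0.0
    changed = sum(s != t for s, t in zip(source_labels, translated_labels))
    return changed / len(source_labels)


def drift_transitions(source_labels, translated_labels):
    """Count each (source label, translated label) pair that differs."""
    return Counter(
        (s, t) for s, t in zip(source_labels, translated_labels) if s != t
    )


# Hypothetical labels: annotations on the source text vs. on its translation.
source = ["negative", "neutral", "positive", "negative"]
translated = ["neutral", "neutral", "positive", "positive"]

rate = label_drift_rate(source, translated)
transitions = drift_transitions(source, translated)
print(rate)
print(transitions)
```

Computing this rate separately for culturally sensitive and neutral domains, or per language pair, would allow the kind of comparison the abstract describes.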
