Moral Semantics Survive Machine Translation: Cross-Lingual Evidence from Moral Foundations Corpora
Maciej Skorski

TL;DR
This study shows that LLM-based machine translation from English to Polish preserves moral language well enough to support cross-lingual moral values research, despite cultural and linguistic differences.
Contribution
The paper demonstrates that LLM-based translation preserves subtle moral cues across languages, which facilitates moral semantics research in under-resourced languages such as Polish.
Findings
A LaBSE cross-lingual embedding similarity of 0.86 indicates strong preservation of moral cues.
Classifier AUC gaps of only 0.01-0.02 indicate near parity across languages.
Translation quality improves with fine-tuning of language models.
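To make the first two findings concrete, here is a minimal sketch of how a mean paired embedding similarity and a cross-lingual AUC gap could be computed. It is an illustration under stated assumptions, not the paper's code: the function names (`mean_pair_similarity`, `roc_auc`, `auc_gap`) are hypothetical, the embeddings are assumed to be precomputed vectors (for example from a sentence encoder such as LaBSE), and the similarity is assumed to be cosine similarity averaged over source/translation pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pair_similarity(src_vecs, tgt_vecs):
    """Average cosine similarity over aligned (source, translation) pairs.

    Assumption: src_vecs[i] and tgt_vecs[i] embed the same post in the
    source language and in translation, respectively.
    """
    sims = [cosine(u, v) for u, v in zip(src_vecs, tgt_vecs)]
    return sum(sims) / len(sims)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive example is scored
    above a random negative one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_gap(labels, scores_src, scores_tgt):
    """Difference in AUC between classifier scores on the source-language
    texts and on their translations, using the same gold labels."""
    return roc_auc(labels, scores_src) - roc_auc(labels, scores_tgt)
```

The pairwise AUC computation is quadratic in the number of examples, which is fine for a small illustration; on a corpus of tens of thousands of posts a rank-based implementation would be preferable.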
Abstract
Moral language is subtle and culturally variable, making it difficult to translate faithfully across languages. Idiomatic expressions, slang, and cultural references introduce hard-to-avoid translation artifacts. Yet automated moral values classification depends on language-specific annotated corpora that exist almost exclusively in English. We investigate whether LLM-based translation can bridge this gap, taking Polish as a test case. Using 50k morally-annotated social media posts from a diverse range of topics, we apply a principled four-method validation pipeline: LaBSE cross-lingual embedding similarity, Centered Kernel Alignment (CKA), LLM-as-judge evaluation, and deep learning classifier parity tests. We show that despite shortcomings in handling slang, vulgarity, and culturally-loaded expressions, direct translation preserves subtle moral cues well enough to be harvested by…
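For readers unfamiliar with Centered Kernel Alignment, the following is a minimal sketch of the linear variant, which compares two sets of representations of the same examples. Whether the paper uses linear CKA or a kernelized variant is not stated in the text above, so the choice of the linear kernel is an assumption, and the function names (`gram`, `center_gram`, `linear_cka`) are hypothetical.

```python
import math


def gram(X):
    """Linear Gram matrix K[i][j] = <x_i, x_j> for a list of row vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def center_gram(K):
    """Double-center a Gram matrix: subtract row and column means and add
    back the grand mean (equivalent to H K H with H = I - 11^T / n)."""
    n = len(K)
    row_mean = [sum(K[i]) / n for i in range(n)]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def frobenius_inner(A, B):
    """Sum of elementwise products of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def linear_cka(X, Y):
    """Linear CKA between two representations of the same n examples.

    X and Y are lists of n row vectors; their dimensions may differ.
    The result is 1 when the centered Gram matrices are proportional.
    """
    Kc = center_gram(gram(X))
    Lc = center_gram(gram(Y))
    denom = math.sqrt(frobenius_inner(Kc, Kc) * frobenius_inner(Lc, Lc))
    return frobenius_inner(Kc, Lc) / denom
```

Because the score depends only on centered Gram matrices, it is unchanged by rotating or uniformly rescaling either representation, which is what makes it suitable for comparing embedding spaces whose coordinates are not aligned.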
