Benchmarking Machine Translation with Cultural Awareness

Binwei Yao; Ming Jiang; Tara Bobinac; Diyi Yang; Junjie Hu

arXiv:2305.14328 · cs.CL · October 22, 2024 · 20 cites

TL;DR

This paper introduces a new culturally annotated parallel corpus and evaluation metrics for assessing the cultural awareness of machine translation systems, and reports that LLMs outperform neural MT systems at translating culture-specific items (CSIs).

Contribution

It presents a novel CSI-enriched corpus and evaluation metrics for Culturally-Aware Machine Translation, enabling better analysis of how different MT systems handle cultural content.

Findings

1. LLMs outperform neural MT in translating CSIs.
2. LLMs better leverage external cultural knowledge.
3. The corpus facilitates future research on cultural aspects in MT.

Abstract

Translating culture-related content is vital for effective cross-cultural communication. However, many culture-specific items (CSIs) lack viable translations across languages, making it challenging to collect high-quality, diverse parallel corpora with CSI annotations. This difficulty hinders the analysis of the cultural awareness of machine translation (MT) systems, including traditional neural MT and the emerging MT paradigm using large language models (LLMs). To address this gap, we introduce a novel parallel corpus, enriched with CSI annotations in 6 language pairs, for investigating Culturally-Aware Machine Translation (CAMT). Furthermore, we design two evaluation metrics to assess CSI translations, focusing on their pragmatic translation quality. Our findings show the superior ability of LLMs over neural MT systems in leveraging external cultural knowledge for translating CSIs, especially…

Code & Models

Repositories

BigBinnie/Benchmarking-LLM-based-Machine-Translation-on-Cultural-Awareness

Videos

Benchmarking Machine Translation with Cultural Awareness · underline

Taxonomy

Topics

Natural Language Processing Techniques · Topic Modeling

Methods

Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout