Audience-specific Explanations for Machine Translation

Renhan Lou; Jan Niehues

arXiv:2309.12998 · cs.CL · September 25, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a semi-automatic method for extracting explanations of culture-specific words from parallel corpora, making it easier to build explanation datasets for machine translation across multiple language pairs.

Contribution

The paper proposes a semi-automatic technique for extracting explanations from large parallel corpora, addressing the sparsity of sentences with explanations that makes explanation datasets for machine translation hard to build.

Findings

01

More than 10% of the sentences extracted by the method contain explanations

02

Only 1.9% of sentences in the original corpus contain explanations

03

The method is robust across the English-German, English-French, and English-Chinese language pairs

Abstract

In machine translation, a common problem is that certain words, even when translated, can be incomprehensible to the target-language audience because of differing cultural backgrounds. One solution is to add explanations for these words. As a first step, we therefore need to identify these words or phrases. In this work, we explore techniques to extract example explanations from a parallel corpus. However, the sparsity of sentences containing words that need to be explained makes building the training dataset extremely difficult. We therefore propose a semi-automatic technique to extract these explanations from a large parallel corpus. Experiments on the English->German language pair show that our method is able to extract sentences such that more than 10% of them contain an explanation, whereas only 1.9% of the original sentences contain explanations.…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling