Audience-specific Explanations for Machine Translation
Renhan Lou, Jan Niehues

TL;DR
This paper introduces a semi-automatic method to extract explanations for culturally sensitive words in machine translation, improving the creation of explanation datasets across multiple language pairs.
Contribution
It proposes a novel semi-automatic technique to extract explanations from large parallel corpora, addressing data sparsity issues in creating explanation datasets for machine translation.
Findings
After extraction, more than 10% of the selected sentences contain explanations
In the original corpus, only 1.9% of sentences contain explanations
Method is robust across English-German, English-French, and English-Chinese pairs
Abstract
In machine translation, a common problem is that certain words, even when translated, may not be understood by the target-language audience because of differing cultural backgrounds. One solution is to add explanations for these words. As a first step, we therefore need to identify these words or phrases. In this work, we explore techniques to extract example explanations from a parallel corpus. However, the sparsity of sentences containing words that need to be explained makes building the training dataset extremely difficult. We therefore propose a semi-automatic technique to extract these explanations from a large parallel corpus. Experiments on the English->German language pair show that our method is able to extract sentences such that more than 10% of them contain explanations, while only 1.9% of the original sentences contain explanations.…
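The abstract does not spell out the extraction procedure, so the following is only an illustrative sketch of how candidate sentence pairs might be pre-filtered before manual review, and how the share of explanation-bearing sentences can be measured before and after filtering. The heuristic (a parenthetical in the target with no parenthetical in the source) and all names (`has_candidate_explanation`, `explanation_rate`, `filter_candidates`, `min_words`) are assumptions made for illustration, not the authors' method.

```python
import re

# Assumption: an explanation shows up as a parenthetical that exists only
# on the target side. This is a hypothetical heuristic, not the paper's rule.
PAREN = re.compile(r"\(([^()]+)\)")


def has_candidate_explanation(src: str, tgt: str, min_words: int = 2) -> bool:
    """Flag a pair whose target has a parenthetical of at least `min_words`
    words while the source contains no parenthetical at all."""
    if PAREN.search(src):
        return False  # the parenthetical is likely just translated, not added
    return any(len(m.split()) >= min_words for m in PAREN.findall(tgt))


def explanation_rate(pairs) -> float:
    """Fraction of sentence pairs flagged as containing a candidate explanation."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(has_candidate_explanation(s, t) for s, t in pairs) / len(pairs)


def filter_candidates(pairs):
    """Keep only the flagged pairs, e.g. to hand them to a human annotator."""
    return [(s, t) for s, t in pairs if has_candidate_explanation(s, t)]


# Toy English->German data, invented for this example.
toy_pairs = [
    ("He joined the NHS in 2010.",
     "Er trat 2010 dem NHS (dem staatlichen Gesundheitsdienst Großbritanniens) bei."),
    ("The weather is nice.", "Das Wetter ist schön."),
    ("See the figure (left).", "Siehe Abbildung (links)."),
    ("We met on Monday.", "Wir trafen uns am Montag."),
]

before = explanation_rate(toy_pairs)
after = explanation_rate(filter_candidates(toy_pairs))
```

On real data a surface heuristic like this would produce false positives, which is presumably why a manual step is needed in a semi-automatic pipeline; comparing `before` and `after` mirrors the kind of comparison the abstract reports (1.9% in the original corpus versus more than 10% after extraction).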
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
