Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation

Nathaniel Berger; Stefan Riezler; Miriam Exel; Matthias Huck

arXiv:2406.02267·cs.CL·June 5, 2024

TL;DR

This paper explores a two-step method where human error markings guide large language models to improve machine translation accuracy in technical domains, showing consistent benefits over automatic post-editing.

Contribution

It introduces a novel approach combining human error markings with retrieval-augmented prompting to enhance translation quality in specialized fields.

Findings

01. Human error markings improve LLM correction focus.

02. Guided prompting yields better translation consistency.

03. Method outperforms automatic post-editing and from-scratch MT.

Abstract

While large language models (LLMs) pre-trained on massive amounts of unpaired language data have reached the state-of-the-art in machine translation (MT) of general domain texts, post-editing (PE) is still required to correct errors and to enhance term translation quality in specialized domains. In this paper we present a pilot study of enhancing translation memories (TM) produced by PE (source segments, machine translations, and reference translations, henceforth called PE-TM) for the needs of correct and consistent term translation in technical domains. We investigate a light-weight two-step scenario where, at inference time, a human translator marks errors in the first translation step, and in a second step a few similar examples are extracted from the PE-TM to prompt an LLM. Our experiment shows that the additional effort of augmenting translations with human error markings guides…
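The two-step scenario described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `<bad>...</bad>` marking format, the `TMEntry` structure, the character-level similarity used for retrieval, the prompt wording, and the English-German example segments are all hypothetical choices made here for clarity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TMEntry:
    """One PE-TM record: source, error-marked MT, and post-edited reference."""
    source: str
    marked_mt: str   # MT output with hypothetical <bad>...</bad> error markings
    reference: str


def retrieve_similar(source: str, tm: list[TMEntry], k: int = 2) -> list[TMEntry]:
    """Return the k TM entries whose source segment is most similar to `source`.

    Assumption: plain character-level similarity stands in for whatever
    retrieval method is actually used.
    """
    def score(entry: TMEntry) -> float:
        return SequenceMatcher(None, source.lower(), entry.source.lower()).ratio()

    return sorted(tm, key=score, reverse=True)[:k]


def build_prompt(source: str, marked_mt: str, examples: list[TMEntry]) -> str:
    """Assemble a few-shot correction prompt from retrieved examples."""
    parts = ["Correct the errors marked with <bad> tags in each translation."]
    for ex in examples:
        parts.append(
            f"Source: {ex.source}\n"
            f"Marked translation: {ex.marked_mt}\n"
            f"Corrected translation: {ex.reference}"
        )
    # The final block is left open for the LLM to complete.
    parts.append(
        f"Source: {source}\n"
        f"Marked translation: {marked_mt}\n"
        f"Corrected translation:"
    )
    return "\n\n".join(parts)


# Hypothetical PE-TM with invented segments.
tm = [
    TMEntry("Click the Save button to store the file.",
            "Klicken Sie auf die Schaltfläche <bad>Sparen</bad>, um die Datei zu speichern.",
            "Klicken Sie auf die Schaltfläche Speichern, um die Datei zu speichern."),
    TMEntry("The weather is nice today.",
            "Das Wetter ist heute <bad>nett</bad>.",
            "Das Wetter ist heute schön."),
    TMEntry("Click the Cancel button to close the dialog.",
            "Klicken Sie auf die Schaltfläche <bad>Stornieren</bad>, um den Dialog zu schließen.",
            "Klicken Sie auf die Schaltfläche Abbrechen, um den Dialog zu schließen."),
]

new_source = "Click the Save button to store the document."
new_marked_mt = "Klicken Sie auf die Schaltfläche <bad>Sparen</bad>, um das Dokument zu speichern."

examples = retrieve_similar(new_source, tm, k=2)
prompt = build_prompt(new_source, new_marked_mt, examples)
```

The resulting `prompt` string would then be sent to an LLM of choice; the model call is omitted here because the text above does not specify which model or API is used.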

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Focus