Automatic Correction of Human Translations

Jessy Lin; Geza Kovacs; Aditya Shastry; Joern Wuebker; John DeNero

arXiv:2206.08593·cs.CL·June 20, 2022

TL;DR

This paper introduces translation error correction (TEC), the task of automatically correcting human-generated translations, and supports it with a new corpus, models pre-trained on synthetic errors, and a human-in-the-loop evaluation.

Contribution

The paper presents the Aced corpus of three TEC datasets, shows that human errors differ enough from MT errors to call for dedicated TEC models, and reports that synthetic-error pre-training and TEC assistance both improve translation quality.

Findings

01 Human errors are more diverse than machine errors and include far fewer fluency errors.

02 Pre-training on synthetic human errors improves TEC performance by up to 5.1 F-score points.

03 TEC assistance significantly improves translation quality in human evaluations.

Abstract

We introduce translation error correction (TEC), the task of automatically correcting human-generated translations. Imperfections in machine translations (MT) have long motivated systems for improving translations post-hoc with automatic post-editing. In contrast, little attention has been devoted to the problem of automatically correcting human translations, despite the intuition that humans make distinct errors that machines would be well-suited to assist with, from typos to inconsistencies in translation conventions. To investigate this, we build and release the Aced corpus with three TEC datasets. We show that human errors in TEC exhibit a more diverse range of errors and far fewer translation fluency errors than the MT errors in automatic post-editing datasets, suggesting the need for dedicated TEC models that are specialized to correct human errors. We show that pre-training…

Code & Models

Repositories

lilt/tec (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification