Automatic Correction of Human Translations
Jessy Lin, Geza Kovacs, Aditya Shastry, Joern Wuebker, John DeNero

TL;DR
This paper introduces translation error correction (TEC), the task of automatically correcting errors in human translations, and supports it with a new corpus, pre-training on synthetic errors, and a human-in-the-loop evaluation.
Contribution
It releases the Aced corpus with three TEC datasets, shows that human translation errors differ from MT errors enough to call for specialized models, and shows that pre-training on synthetic errors improves TEC models and that TEC assistance improves translation quality.
Findings
Human errors are more diverse than machine translation errors and include far fewer fluency errors.
Pre-training on synthetic errors modeled on human errors improves TEC F-score by up to 5.1 points.
TEC assistance significantly improves translation quality in human evaluations.
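The idea of pre-training on synthetic errors can be illustrated with a toy sketch. The summary does not describe how the paper generates its synthetic errors, so everything below is an assumption made for illustration: the adjacent-character swap rule, the function names inject_typo and make_synthetic_pairs, and the example sentence are all hypothetical. The sketch only shows the general shape of the data, namely (corrupted translation, clean translation) pairs that a correction model could be pre-trained on.

```python
import random


def inject_typo(sentence: str, rng: random.Random) -> str:
    """Swap two adjacent characters in one randomly chosen word.

    Hypothetical noise rule for illustration; not the paper's method.
    """
    words = sentence.split()
    # Only words with at least two characters can have an adjacent swap.
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    if not candidates:
        return sentence
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def make_synthetic_pairs(clean_translations, seed=0):
    """Build (corrupted, clean) pairs from error-free translations."""
    rng = random.Random(seed)
    return [(inject_typo(t, rng), t) for t in clean_translations]


if __name__ == "__main__":
    for noisy, clean in make_synthetic_pairs(["Das ist ein Test"]):
        print(noisy, "->", clean)
```

A realistic generator would need to cover more than typos, since the abstract lists error types ranging from typos to inconsistencies in translation conventions.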
Abstract
We introduce translation error correction (TEC), the task of automatically correcting human-generated translations. Imperfections in machine translations (MT) have long motivated systems for improving translations post-hoc with automatic post-editing. In contrast, little attention has been devoted to the problem of automatically correcting human translations, despite the intuition that humans make distinct errors that machines would be well-suited to assist with, from typos to inconsistencies in translation conventions. To investigate this, we build and release the Aced corpus with three TEC datasets. We show that human errors in TEC exhibit a more diverse range of errors and far fewer translation fluency errors than the MT errors in automatic post-editing datasets, suggesting the need for dedicated TEC models that are specialized to correct human errors. We show that pre-training…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
