Automated Postediting of Documents
Kevin Knight (USC/Information Sciences Institute), Ishwar Chander (USC/Information Sciences Institute)

TL;DR
This paper presents an automated postediting system for improving low- to medium-quality English text. It focuses on a portable module that selects articles (a, an, the) in machine translation output, and reports results that compare favorably with human performance.
Contribution
It introduces a portable, rule-based postediting module for correcting English articles in MT output, with rules derived automatically from large online text resources.
Findings
Over 200,000 rules automatically derived for article correction
System achieves high accuracy in article selection tasks
Performance compares favorably with human posteditors
Abstract
Large amounts of low- to medium-quality English texts are now being produced by machine translation (MT) systems, optical character readers (OCR), and non-native speakers of English. Most of this text must be postedited by hand before it sees the light of day. Improving text quality is tedious work, but its automation has not received much research attention. Anyone who has postedited a technical report or thesis written by a non-native speaker of English knows the potential of an automated postediting system. For the case of MT-generated text, we argue for the construction of postediting modules that are portable across MT systems, as an alternative to hardcoding improvements inside any one system. As an example, we have built a complete self-contained postediting module for the task of article selection (a, an, the) for English noun phrases. This is a notoriously difficult problem for…
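To make the article selection task concrete, here is a minimal toy sketch in Python. It is not the authors' method: the summary above only says that rules were derived automatically from large online text. The sketch assumes a much simpler scheme in which the only context feature is the word that follows the article, the default for unseen words is "the", and a/an is chosen by the first letter rather than by pronunciation. All function names (`learn_rules`, `choose_article`, `restore_articles`) and the `_` slot marker are hypothetical.

```python
from collections import Counter, defaultdict

ARTICLES = {"a", "an", "the"}


def learn_rules(sentences):
    """Map each word seen after an article to its most frequent article kind.

    Assumption: the next word is the only context feature. A real system
    would use far richer context.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for article, next_word in zip(tokens, tokens[1:]):
            if article in ARTICLES:
                kind = "the" if article == "the" else "a/an"
                counts[next_word][kind] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def indefinite_form(next_word):
    # Simplification: decide by spelling, not by sound ("an hour" would fail).
    return "an" if next_word[0] in "aeiou" else "a"


def choose_article(next_word, rules, default="the"):
    """Pick an article for the slot before next_word; unseen words get the default."""
    word = next_word.lower()
    kind = rules.get(word, default)
    return "the" if kind == "the" else indefinite_form(word)


def restore_articles(text, rules, slot="_"):
    """Fill every slot marker that is followed by a word with a chosen article."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok == slot and i + 1 < len(tokens):
            out.append(choose_article(tokens[i + 1], rules))
        else:
            out.append(tok)
    return " ".join(out)


corpus = [
    "the sun rose over a hill",
    "she ate an apple under the sun",
    "he bought an apple",
]
rules = learn_rules(corpus)
print(restore_articles("_ sun warmed _ apple on _ hill", rules))
```

With this three-sentence corpus, "sun" is only ever seen after "the" and "apple" and "hill" only after an indefinite article, so the learned table has exactly one rule per word. Scaling the same idea to richer context features and a large corpus is what would produce a rule set on the order reported in the findings.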
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
