Automated Postediting of Documents

Kevin Knight (USC/Information Sciences Institute); Ishwar Chander (USC/Information Sciences Institute)

arXiv:cmp-lg/9407028 · cmp-lg · February 3, 2008 · 168 cites

Open Access

TL;DR

This paper presents an automated postediting system for improving low- to medium-quality English texts, focusing on a portable module for article selection in machine translation, with promising results compared to human performance.

Contribution

It introduces a portable, rule-based postediting module that corrects English articles in MT output, with rules derived automatically from large online text resources.
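To make the idea of a portable postediting module concrete, here is a minimal sketch, not the authors' implementation. It assumes the MT output arrives as plain text with a hypothetical `<ART>` placeholder before each noun phrase, and it looks up the article by the word that follows the slot. The `postedit` function, the `<ART>` marker, the `"a/an"` rule value, and the three toy rules are all illustrative assumptions. The a/an choice uses a crude spelling heuristic.

```python
SLOT = "<ART>"  # hypothetical placeholder marking where an article may be needed


def postedit(text, rules, default="the"):
    """Fill each SLOT using a lookup keyed on the following word.

    `rules` maps a lowercase word to "the", "a/an", or "" (no article).
    Words with no rule fall back to `default`. This is a toy stand-in for
    a learned rule set, not the method described in the paper.
    """
    words = text.split()
    out = []
    for i, word in enumerate(words):
        if word != SLOT:
            out.append(word)
            continue
        nxt = words[i + 1] if i + 1 < len(words) else ""
        key = nxt.lower().strip(".,;:!?")
        article = rules.get(key, default)
        if article == "a/an":
            # Crude spelling-based choice; real a/an selection depends on sound.
            article = "an" if key[:1] in "aeiou" else "a"
        if article:
            out.append(article)
    return " ".join(out)


# Toy rules invented for this example.
toy_rules = {"sun": "the", "island": "a/an", "water": ""}
print(postedit("<ART> sun rose over <ART> island and <ART> water was calm.", toy_rules))
```

Because the function only reads and writes plain text, it could in principle sit behind any MT system that marks noun-phrase boundaries, which is the sense of "portable" here.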

Findings

01. Over 200,000 rules automatically derived for article correction

02. System achieves high accuracy in article selection tasks

03. Performance compares favorably with human posteditors

Abstract

Large amounts of low- to medium-quality English texts are now being produced by machine translation (MT) systems, optical character readers (OCR), and non-native speakers of English. Most of this text must be postedited by hand before it sees the light of day. Improving text quality is tedious work, but its automation has not received much research attention. Anyone who has postedited a technical report or thesis written by a non-native speaker of English knows the potential of an automated postediting system. For the case of MT-generated text, we argue for the construction of postediting modules that are portable across MT systems, as an alternative to hardcoding improvements inside any one system. As an example, we have built a complete self-contained postediting module for the task of article selection (a, an, the) for English noun phrases. This is a notoriously difficult problem for…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies