Automatic Post-Editing for Vietnamese

Thanh Vu; Dai Quoc Nguyen

arXiv:2104.12128 · cs.CL · November 16, 2021

TL;DR

This paper introduces a large-scale Vietnamese APE dataset and shows, through both automatic and human evaluations, that neural MT models handle the Vietnamese APE task effectively.

Contribution

The paper constructs the first large-scale Vietnamese APE dataset and applies strong neural MT models to the APE task using that dataset.

Findings

01. Neural MT models are effective at post-editing raw Vietnamese translations.

02. The dataset contains 5M Vietnamese translated and corrected sentence pairs.

03. Both automatic and human evaluations confirm this effectiveness.

Abstract

Automatic post-editing (APE) is an important remedy for reducing errors of raw translated texts that are produced by machine translation (MT) systems or software-aided translation. In this paper, we present a systematic approach to tackle the APE task for Vietnamese. Specifically, we construct the first large-scale dataset of 5M Vietnamese translated and corrected sentence pairs. We then apply strong neural MT models to handle the APE task, using our constructed dataset. Experimental results from both automatic and human evaluations show the effectiveness of the neural MT models in handling the Vietnamese APE task.
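
As a rough illustration of the framing in the abstract, APE can be treated as a monolingual translation problem: the raw translated sentence is the source and its corrected version is the target. The sketch below cleans such pairs and writes them to two line-aligned files, a layout many neural MT toolkits accept. The pair format, the function names `clean_pairs` and `write_parallel`, the file names, and the toy sentences are all assumptions made for illustration; they are not the authors' pipeline or the layout of the VnAPE data.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical in-memory format: (raw_translated, corrected) sentence pairs.
Pair = Tuple[str, str]


def clean_pairs(pairs: Iterable[Pair]) -> List[Pair]:
    """Collapse runs of whitespace and drop pairs with an empty side."""
    cleaned: List[Pair] = []
    for raw, corrected in pairs:
        raw = " ".join(raw.split())
        corrected = " ".join(corrected.split())
        if raw and corrected:
            cleaned.append((raw, corrected))
    return cleaned


def write_parallel(pairs: Iterable[Pair], src_path: str, tgt_path: str) -> int:
    """Write line-aligned source (raw) and target (corrected) files.

    Returns the number of pairs written.
    """
    count = 0
    with Path(src_path).open("w", encoding="utf-8") as src, \
         Path(tgt_path).open("w", encoding="utf-8") as tgt:
        for raw, corrected in pairs:
            src.write(raw + "\n")
            tgt.write(corrected + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Made-up toy pairs, not taken from the dataset.
    toy = [
        ("Tôi là một  học sinh đại học .", "Tôi là sinh viên ."),
        ("", "Câu này bị loại vì thiếu bản dịch thô ."),
    ]
    out_dir = Path(tempfile.mkdtemp())
    n = write_parallel(
        clean_pairs(toy),
        str(out_dir / "train.raw"),
        str(out_dir / "train.corrected"),
    )
    print(f"wrote {n} pair(s) to {out_dir}")
```

Keeping the two files line-aligned means line i of the source file always corresponds to line i of the target file, which is why pairs with an empty side are dropped before writing rather than after.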

Code & Models

Repositories

tienthanhdhcn/VnAPE (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications