Levenshtein Training for Word-level Quality Estimation

Shuoyang Ding; Marcin Junczys-Dowmunt; Matt Post; Philipp Koehn

arXiv:2109.05611 · cs.CL · September 17, 2021

TL;DR

This paper introduces a Levenshtein Transformer-based approach to word-level quality estimation (QE) of machine translation. It uses the model's iterative, edit-based decoding and a two-stage transfer learning scheme to improve data efficiency and performance.

Contribution

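It presents a novel Levenshtein Transformer method for word-level quality estimation, trained with a two-stage transfer learning scheme on augmented data and human post-editing data. The method targets data efficiency and makes the reference labels compatible with subword-level finetuning and inference.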
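In word-level QE as framed by the WMT shared tasks, every MT word is tagged OK or BAD. Every gap between words, including the gaps before the first word and after the last, is also tagged OK or BAD. Reference tags come from aligning the MT output with its human post-edit. The sketch below shows that labeling idea using a plain unit-cost Levenshtein alignment. It is an illustrative simplification, not the paper's or the shared task's tooling: the official WMT tags are produced with TERCOM, which also allows block shifts. The function names `levenshtein_ops` and `word_level_tags` are hypothetical.

```python
from typing import List, Tuple

Op = Tuple[str, int, int]  # (operation, mt index or gap index, pe index)


def levenshtein_ops(mt: List[str], pe: List[str]) -> List[Op]:
    """Align MT tokens to post-edited tokens with unit-cost Levenshtein distance.

    Returns an edit script of ("match" | "sub", i, j), ("del", i, j) where
    mt[i] is removed, or ("ins", g, j) where pe[j] is inserted at gap g
    (gap g sits just before mt[g]; gap len(mt) follows the last token).
    Simplification: no block shifts, unlike TERCOM.
    """
    n, m = len(mt), len(pe)
    # dp[i][j] = edit distance between mt[:i] and pe[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if mt[i - 1] == pe[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete mt[i-1]
                           dp[i][j - 1] + 1,         # insert pe[j-1]
                           dp[i - 1][j - 1] + cost)  # match or substitute

    # Walk back from (n, m), preferring diagonal moves on ties.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = mt[i - 1] == pe[j - 1]
            if dp[i][j] == dp[i - 1][j - 1] + (0 if same else 1):
                ops.append(("match" if same else "sub", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", i - 1, j))
            i -= 1
        else:
            ops.append(("ins", i, j - 1))  # insertion lands at gap i
            j -= 1
    ops.reverse()
    return ops


def word_level_tags(mt: List[str], pe: List[str]) -> Tuple[List[str], List[str]]:
    """Return (word_tags, gap_tags): len(mt) word tags and len(mt) + 1 gap tags."""
    word_tags = ["OK"] * len(mt)
    gap_tags = ["OK"] * (len(mt) + 1)
    for op, idx, _ in levenshtein_ops(mt, pe):
        if op in ("sub", "del"):
            word_tags[idx] = "BAD"  # the post-editor changed or removed this word
        elif op == "ins":
            gap_tags[idx] = "BAD"   # the post-editor added word(s) at this gap
    return word_tags, gap_tags


if __name__ == "__main__":
    mt = "the cat sat on mat".split()
    pe = "the cat sat on the mat".split()
    print(word_level_tags(mt, pe))
    # → (['OK', 'OK', 'OK', 'OK', 'OK'], ['OK', 'OK', 'OK', 'OK', 'BAD', 'OK'])
```

Under this simplification, a substituted or deleted MT word becomes BAD. A gap becomes BAD when the post-editor inserted at least one word there.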

Findings

01. Superior data efficiency in constrained data scenarios
02. Competitive performance in unconstrained settings
03. Effective use of transfer learning and heuristics for label construction

Abstract

We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation. A Levenshtein Transformer is a natural fit for this task: trained to perform decoding in an iterative manner, a Levenshtein Transformer can learn to post-edit without explicit supervision. To further minimize the mismatch between the translation task and the word-level QE task, we propose a two-stage transfer learning procedure on both augmented data and human post-editing data. We also propose heuristics to construct reference labels that are compatible with subword-level finetuning and inference. Results on the WMT 2020 QE shared task dataset show that our proposed method has superior data efficiency under the data-constrained setting and competitive performance under the unconstrained setting.
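The abstract does not spell out the label heuristics. The following is therefore a hypothetical sketch of one way to make word-level tags usable by a subword-level model:

- Each subword piece inherits its word's tag.
- Gaps inside a word are tagged OK.
- At inference, a word is BAD if any of its pieces is predicted BAD.

The names `project_to_subwords` and `collapse_to_words` are illustrative assumptions, as is the `pieces_per_word` segmentation input. Neither is taken from the paper or its repository.

```python
from typing import List, Tuple


def project_to_subwords(word_tags: List[str], gap_tags: List[str],
                        pieces_per_word: List[int]) -> Tuple[List[str], List[str]]:
    """Expand word/gap tags to subword/gap tags (hypothetical heuristic).

    Every piece inherits its word's tag; gaps inside a word are OK, and the
    gap after a word's last piece takes that word's boundary gap tag.
    Assumes every word has at least one piece.
    """
    if len(gap_tags) != len(word_tags) + 1 or len(pieces_per_word) != len(word_tags):
        raise ValueError("expected n word tags, n + 1 gap tags and n piece counts")
    sub_tags: List[str] = []
    sub_gaps: List[str] = [gap_tags[0]]  # gap before the first piece
    for k, (tag, n_pieces) in enumerate(zip(word_tags, pieces_per_word)):
        for p in range(n_pieces):
            sub_tags.append(tag)
            is_last = p == n_pieces - 1
            # Word-internal gaps are OK; the last piece carries boundary gap k + 1.
            sub_gaps.append(gap_tags[k + 1] if is_last else "OK")
    return sub_tags, sub_gaps


def collapse_to_words(sub_tags: List[str], sub_gaps: List[str],
                      pieces_per_word: List[int]) -> Tuple[List[str], List[str]]:
    """Map subword predictions back to words: a word is BAD if any piece is BAD.

    Word-internal gap predictions are dropped; only word-boundary gaps are kept.
    """
    word_tags: List[str] = []
    gap_tags: List[str] = [sub_gaps[0]]
    pos = 0
    for n_pieces in pieces_per_word:
        word_tags.append("BAD" if "BAD" in sub_tags[pos:pos + n_pieces] else "OK")
        pos += n_pieces
        gap_tags.append(sub_gaps[pos])  # gap right after this word's last piece
    return word_tags, gap_tags


if __name__ == "__main__":
    # Three words; the middle one is split into two pieces (hypothetical segmentation).
    pieces = [1, 2, 1]
    sub = project_to_subwords(["OK", "BAD", "OK"], ["OK", "OK", "BAD", "OK"], pieces)
    print(sub)
    # → (['OK', 'BAD', 'BAD', 'OK'], ['OK', 'OK', 'OK', 'BAD', 'OK'])
    print(collapse_to_words(*sub, pieces))
    # → (['OK', 'BAD', 'OK'], ['OK', 'OK', 'BAD', 'OK'])
```

The "any piece is BAD" rule is only one possible aggregation. Labeling by the first piece or by majority vote are equally plausible alternatives.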

Code & Models

Repositories

shuoyangd/stenella (official)

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Label Smoothing · Residual Connection · Adam