TL;DR
This paper introduces a Levenshtein Transformer-based approach for word-level quality estimation in translation, leveraging iterative decoding and transfer learning to improve data efficiency and performance.
Contribution
It presents a novel Levenshtein Transformer method for word-level quality estimation, combining a two-stage transfer learning scheme (augmented data, then human post-editing data) with heuristics that build reference labels compatible with subword-level finetuning and inference.
Findings
Superior data efficiency in constrained data scenarios
Competitive performance in unconstrained settings
Effective use of transfer learning and heuristics for label construction
Abstract
We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation. A Levenshtein Transformer is a natural fit for this task: trained to perform decoding in an iterative manner, a Levenshtein Transformer can learn to post-edit without explicit supervision. To further minimize the mismatch between the translation task and the word-level QE task, we propose a two-stage transfer learning procedure on both augmented data and human post-editing data. We also propose heuristics to construct reference labels that are compatible with subword-level finetuning and inference. Results on the WMT 2020 QE shared task dataset show that our proposed method has superior data efficiency under the data-constrained setting and competitive performance under the unconstrained setting.
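To make the idea of reference labels concrete, the sketch below derives word-level OK/BAD tags by aligning a machine translation to its human post-edit with a plain Levenshtein alignment, then projects subword-level tags back to words. This is an illustration only, not the paper's heuristics: the function names `word_tags` and `subword_to_word_tags`, the `pieces_per_word` input, and the rule "a word is BAD if any of its subwords is BAD" are all assumptions made for this example.

```python
from typing import List


def word_tags(mt: List[str], pe: List[str]) -> List[str]:
    """Tag each MT token OK if a minimum-edit alignment keeps it, else BAD.

    Hypothetical helper: a plain Levenshtein alignment with unit costs.
    """
    n, m = len(mt), len(pe)
    # dist[i][j] = edit distance between mt[:i] and pe[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if mt[i - 1] == pe[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete an MT token
                dist[i][j - 1] + 1,        # insert a post-edit token
                dist[i - 1][j - 1] + sub,  # keep or substitute
            )

    # Backtrace: only tokens matched on the optimal path become OK.
    tags = ["BAD"] * n
    i, j = n, m
    while i > 0 and j > 0:
        if mt[i - 1] == pe[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            tags[i - 1] = "OK"
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1  # substitution: MT token stays BAD
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1               # deletion: MT token stays BAD
        else:
            j -= 1               # insertion: no MT token consumed
    return tags


def subword_to_word_tags(sub_tags: List[str], pieces_per_word: List[int]) -> List[str]:
    """Project subword tags to words (assumed rule: any BAD piece makes the word BAD)."""
    out, pos = [], 0
    for count in pieces_per_word:
        piece_tags = sub_tags[pos:pos + count]
        out.append("BAD" if "BAD" in piece_tags else "OK")
        pos += count
    return out


mt = "the cat sat on mat".split()
pe = "the cat sits on the mat".split()
print(word_tags(mt, pe))  # ['OK', 'OK', 'BAD', 'OK', 'OK']
print(subword_to_word_tags(["OK", "OK", "BAD", "OK"], [1, 2, 1]))  # ['OK', 'BAD', 'OK']
```

In this toy example the inserted word "the" in the post-edit does not mark any MT token as BAD; word-level QE schemes commonly track such missing words with separate gap tags, which this sketch omits.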
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Label Smoothing · Residual Connection · Adam
