How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?

Weijia Xu; Shuming Ma; Dongdong Zhang; Marine Carpuat

arXiv:2105.12900·cs.CL·May 28, 2021

TL;DR

This paper investigates how the complexity of distilled training data affects the translation quality and confidence calibration of non-autoregressive machine translation models. It finds that reducing lexical diversity and reducing reordering complexity both improve quality, while reduced lexical diversity is the main reason distillation raises model confidence.

Contribution

It provides a detailed analysis of how different complexity aspects of distilled data influence NAR translation quality and confidence calibration, highlighting lexical diversity as a crucial factor.

Findings

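1. Reducing lexical diversity helps NAR models learn better source-target alignment, which improves translation quality.

2. Decreasing reordering complexity also improves alignment learning and translation quality.

3. Reduced lexical diversity is the main reason distillation increases model confidence, and this affects the calibration of different NAR models differently.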

Abstract

While non-autoregressive (NAR) models are showing great promise for machine translation, their use is limited by their dependence on knowledge distillation from autoregressive models. To address this issue, we seek to understand why distillation is so effective. Prior work suggests that distilled training data is less complex than manual translations. Based on experiments with the Levenshtein Transformer and the Mask-Predict NAR models on the WMT14 German-English task, this paper shows that different types of complexity have different impacts: while reducing lexical diversity and decreasing reordering complexity both help NAR learn better alignment between source and target, and thus improve translation quality, lexical diversity is the main reason why distillation increases model confidence, which affects the calibration of different NAR models differently.
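
The abstract refers to model confidence and calibration without defining them. As a rough illustration only, one widely used summary of calibration is expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below is not taken from the paper and may differ from the measure the authors use; the function name, the equal-width binning, and the token-level inputs are all assumptions made for illustration.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Hypothetical helper: ECE with equal-width confidence bins.

    confidences: model probability assigned to each predicted token, in [0, 1].
    correct: whether each predicted token matched the reference.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per-bin running sums of confidence and of correct predictions.
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins

    for p, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        conf_sum[idx] += p
        acc_sum[idx] += 1.0 if is_correct else 0.0

    # Weighted average of |accuracy - confidence| over bins; dividing the
    # difference of sums by n equals weighting each bin gap by bin_size / n.
    return sum(abs(a - c) for a, c in zip(acc_sum, conf_sum)) / n


if __name__ == "__main__":
    # Toy data (made up): four tokens predicted at 0.9 confidence, three correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

Under this definition, a model whose confidence rises without a matching rise in accuracy becomes over-confident and its ECE grows, which is one way the link between confidence and calibration mentioned in the abstract could be quantified.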

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Knowledge Distillation · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Adam · Label Smoothing