Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation

Eleftheria Briakou; Marine Carpuat

arXiv:2105.15087·cs.CL·June 1, 2021

TL;DR

This paper investigates how fine-grained semantic divergences in training data affect neural machine translation, revealing their negative impact and proposing a divergence-aware framework to improve translation quality and confidence.

Contribution

It introduces a novel divergence-aware NMT framework that mitigates the effects of semantic divergences, enhancing translation accuracy and model calibration.

Findings

01. Models trained on synthetic divergences produce more degenerated text.

02. Divergence-aware NMT improves translation quality on EN-FR tasks.

03. The framework enhances model confidence and calibration.

Abstract

While it has been shown that Neural Machine Translation (NMT) is highly sensitive to noisy parallel training samples, prior work treats all types of mismatches between source and target as noise. As a result, it remains unclear how samples that are mostly equivalent but contain a small number of semantically divergent tokens impact NMT training. To close this gap, we analyze the impact of different types of fine-grained semantic divergences on Transformer models. We show that models trained on synthetic divergences output degenerated text more frequently and are less confident in their predictions. Based on these findings, we introduce a divergent-aware NMT framework that uses factors to help NMT recover from the degradation caused by naturally occurring divergences, improving both translation quality and model calibration on EN-FR tasks.
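The abstract reports improvements in model calibration without saying how calibration is measured. One common way to quantify it is expected calibration error (ECE), which bins predictions by confidence and compares each bin's average confidence with its accuracy. The sketch below is a minimal illustration under that assumption; it is not the authors' evaluation code, and the function name, the binning scheme, and the choice of 10 bins are all hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted mean of |accuracy - confidence| over bins.

    confidences: predicted probabilities in [0, 1], one per prediction
    correct:     booleans, True where the prediction was right
    n_bins:      number of equal-width confidence bins (assumed value)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    # Group predictions into equal-width bins by confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes its gap, weighted by its share of predictions.
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Toy token-level predictions: 90% confident, right 3 times out of 4.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False]))
```

A lower value means the model's confidence tracks its accuracy more closely; a model that is 90% confident but right only 75% of the time, as in the toy input, is overconfident by about 0.15.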

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Byte Pair Encoding · Residual Connection · Dropout