Sampling and Filtering of Neural Machine Translation Distillation Data

Vilém Zouhar

arXiv:2104.00664·cs.CL·April 2, 2021

TL;DR

This paper investigates various sampling and filtering techniques for neural machine translation distillation data, demonstrating that strategic upsampling and data combination improve translation quality.

Contribution

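The paper systematically compares pruning, hypothesis upsampling and undersampling, deduplication, and their combinations, and measures their effect on student quality for English-to-Czech and English-to-German MT models using standard MT evaluation metrics.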

Findings

01

Upsampling the better teacher hypotheses and combining them with the original data improves translation quality.

02

Careful upsampling combined with the original data outperforms directly mixing the original and synthesized data.

03

The resulting student models also outperform those trained on only the original data or only the synthesized data.

Abstract

In most neural machine translation distillation or stealing scenarios, the goal is to preserve the performance of the target model (teacher). The highest-scoring hypothesis of the teacher model is commonly used to train a new model (student). If reference translations are also available, then better hypotheses (with respect to the references) can be upsampled and poor hypotheses either removed or undersampled. This paper explores the importance sampling method landscape (pruning, hypothesis upsampling and undersampling, deduplication, and their combination) with English-to-Czech and English-to-German MT models using standard MT evaluation metrics. We show that careful upsampling and combination with the original data lead to better performance than training only on the original data, only on the synthesized data, or on their direct combination.
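The reference-based selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `overlap_f1` is a hypothetical token-overlap score standing in for a standard MT evaluation metric, and the function name `build_distillation_data`, the `copies` schedule, and the `min_score` threshold are all assumptions made for this example.

```python
from collections import Counter


def overlap_f1(hypothesis: str, reference: str) -> float:
    """Token-level F1 between hypothesis and reference.

    Assumption: a toy stand-in for a real MT metric, used only for ranking.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    common = sum((Counter(hyp) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(hyp)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def build_distillation_data(source, reference, hypotheses,
                            copies=(3, 1), min_score=0.0, deduplicate=True):
    """Turn one source sentence and its teacher hypotheses into training pairs.

    Hypothetical scheme: the k-th best hypothesis (by score against the
    reference) is repeated copies[k] times. Hypotheses ranked beyond
    len(copies) are pruned, as are those scoring below min_score.
    """
    if deduplicate:
        # dict.fromkeys keeps the first occurrence and preserves order.
        hypotheses = list(dict.fromkeys(hypotheses))
    scored = [(overlap_f1(h, reference), h) for h in hypotheses]
    # Stable sort: ties keep the teacher's original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    pairs = []
    for (score, hyp), count in zip(scored, copies):
        if score < min_score:
            continue
        pairs.extend([(source, hyp)] * count)
    return pairs
```

Calling `build_distillation_data` once per source sentence and concatenating the results with the original parallel data would give one of the combined training sets the abstract compares against. The `copies` tuple controls how aggressively better hypotheses are upsampled, and `(1,)` reduces to keeping only the best-scoring hypothesis with respect to the reference.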

Code & Models

Repositories

zouharvi/reference-mt-distill
Official
