Knowledge Distillation for Quality Estimation

Amit Gajbhiye; Marina Fomicheva; Fernando Alva-Manchego; Frédéric Blain; Abiola Obamuyide; Nikolaos Aletras; Lucia Specia

arXiv:2107.00411 · cs.CL · July 2, 2021

TL;DR

This paper applies knowledge distillation to Quality Estimation (QE) for machine translation, producing small, efficient models that perform competitively with models built on distilled pre-trained representations.

Contribution

The paper proposes transferring knowledge directly from a strong QE teacher model to a much smaller student model with a different, shallower architecture, reducing model size while keeping performance competitive.

Findings

01. Smaller models achieve competitive QE performance.

02. Data augmentation enhances the effectiveness of knowledge transfer.

03. The approach reduces model size by 8x compared to large pre-trained models.

Abstract

Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained…
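The teacher-to-student transfer described in the abstract can be illustrated with a minimal sketch. It assumes one common form of distillation for regression, in which the student is trained to reproduce the teacher's predicted quality scores on both the original inputs and additional augmented inputs. Everything here is hypothetical and chosen for illustration only: `teacher_score` stands in for a strong QE model, the two-dimensional feature vectors stand in for source/translation pairs, and the linear student stands in for the shallower architecture. It is not the paper's implementation.

```python
import random


def teacher_score(features):
    """Hypothetical stand-in for a strong QE teacher: features -> quality score."""
    return 0.6 * features[0] - 0.3 * features[1] + 0.1


def train_student(inputs, targets, epochs=200, lr=0.1):
    """Fit a tiny linear student by SGD on squared error against the targets."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            pred = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = pred - y  # gradient of 0.5 * (pred - y)^2 w.r.t. pred
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def distill(original_inputs, augmented_inputs):
    """Label original and augmented inputs with the teacher, then train the student."""
    all_inputs = original_inputs + augmented_inputs
    soft_targets = [teacher_score(x) for x in all_inputs]  # teacher predictions
    return train_student(all_inputs, soft_targets)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = [[rng.random(), rng.random()] for _ in range(20)]
augmented = [[rng.random(), rng.random()] for _ in range(80)]  # extra unlabelled inputs
weights, bias = distill(original, augmented)
print(weights, bias)
```

Because the augmented inputs need only teacher predictions and no human quality labels, the student can see far more training signal than the labelled data alone provides. In this toy setting the student recovers the teacher's coefficients almost exactly, since both are linear.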

Code & Models

Repositories

sheffieldnlp/deepQuest-py (Official)
