On Knowledge Distillation for Direct Speech Translation

Marco Gaido; Mattia A. Di Gangi; Matteo Negri; Marco Turchi

arXiv:2012.04964 · cs.CL · December 10, 2020

TL;DR

This paper investigates various knowledge distillation methods for direct speech translation, analyzing their effectiveness, potential drawbacks, and ways to improve translation quality in sequence-to-sequence models.

Contribution

It provides a comparative analysis of knowledge distillation techniques for speech translation and explores solutions to mitigate associated drawbacks.

Findings

1. Distillation improves translation quality in speech translation tasks.

2. Certain distillation methods have notable drawbacks that can be alleviated.

3. Maintaining benefits while reducing drawbacks enhances model performance.

Abstract

Direct speech translation (ST) has been shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques to transfer knowledge is knowledge distillation. In this paper, we compare the different solutions to distill knowledge in a sequence-to-sequence task like ST. Moreover, we analyze possible drawbacks of this approach and how to alleviate them while maintaining the benefits in terms of translation quality.
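
To make the idea of distilling knowledge in a sequence-to-sequence model concrete, here is a minimal sketch of one common variant, word-level distillation, in which the student is trained to match the teacher's output distribution at every target position. It is not taken from the paper or its repository: the function names, the toy numbers, and the choice of a plain cross-entropy objective are all illustrative assumptions, and in practice the teacher would be an MT model and the student an ST model.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def word_level_kd_loss(student_logits, teacher_probs):
    """Mean cross-entropy between teacher and student distributions.

    student_logits: one list of raw vocabulary scores per target position
                    (hypothetical student outputs).
    teacher_probs:  one probability distribution per target position
                    (hypothetical teacher outputs).
    """
    total = 0.0
    for s_logits, t_probs in zip(student_logits, teacher_probs):
        s_logp = log_softmax(s_logits)
        # Cross-entropy H(teacher, student) at this position.
        total += -sum(p * lp for p, lp in zip(t_probs, s_logp))
    return total / len(student_logits)


# Toy example (made-up numbers): 2 target positions, vocabulary of 3 tokens.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
student = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
loss = word_level_kd_loss(student, teacher)
print(f"word-level KD loss: {loss:.4f}")
```

Minimizing this cross-entropy with respect to the student is equivalent to minimizing the KL divergence from the teacher, since the teacher's entropy does not depend on the student. Sequence-level variants instead train the student on translations generated by the teacher rather than on per-token distributions.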

Code & Models

Repositories

mgaido91/FBK-fairseq-ST (PyTorch, official)
