TL;DR
This paper investigates various knowledge distillation methods for direct speech translation, analyzing their effectiveness, potential drawbacks, and ways to improve translation quality in sequence-to-sequence models.
Contribution
It provides a comparative analysis of knowledge distillation techniques for speech translation and explores solutions to mitigate associated drawbacks.
Findings
Distillation improves translation quality in speech translation tasks.
Distillation also introduces drawbacks, which the paper analyzes.
These drawbacks can be alleviated while keeping the gains in translation quality.
Abstract
Direct speech translation (ST) has been shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques to transfer knowledge is knowledge distillation. In this paper, we compare the different solutions to distill knowledge in a sequence-to-sequence task like ST. Moreover, we analyze possible drawbacks of this approach and how to alleviate them while maintaining the benefits in terms of translation quality.
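As a rough illustration of what distilling knowledge means in a sequence-to-sequence setting, the sketch below computes a word-level distillation loss: at each target position, the student is trained to match the teacher's output distribution over the vocabulary. This is one common variant, shown under assumptions; the abstract does not say which variants the paper compares, and the function name `word_level_kd_loss`, the `temperature` parameter, and the toy logits are all hypothetical. In the ST setting one would typically imagine an MT model as the teacher and the ST model as the student, which is also an assumption here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softmax(logits):
    """Softmax computed from the stable log-softmax."""
    return [math.exp(lp) for lp in log_softmax(logits)]


def word_level_kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean cross-entropy H(teacher, student) over target positions.

    Both arguments are lists of per-position logit vectors
    (shape: target_length x vocabulary_size). Hypothetical helper,
    not taken from the paper.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must cover the same positions")
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        # Soften both distributions with the same temperature.
        t_probs = softmax([x / temperature for x in t_row])
        s_logp = log_softmax([x / temperature for x in s_row])
        # Cross-entropy between teacher probabilities and student log-probs.
        total -= sum(p * lp for p, lp in zip(t_probs, s_logp))
    return total / len(student_logits)


# Toy example: two target positions, vocabulary of three tokens.
teacher = [[2.0, 0.5, -1.0], [0.0, 1.0, 3.0]]
agreeing_student = [[2.0, 0.5, -1.0], [0.0, 1.0, 3.0]]
disagreeing_student = [[-1.0, 0.5, 2.0], [3.0, 1.0, 0.0]]

loss_agree = word_level_kd_loss(agreeing_student, teacher)
loss_disagree = word_level_kd_loss(disagreeing_student, teacher)
```

The loss is smallest when the student reproduces the teacher's distribution, so `loss_agree` is lower than `loss_disagree`. In practice this term is often combined with the usual cross-entropy on the reference translation; how to weight the two is a design choice this sketch leaves open.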
