Dealing with training and test segmentation mismatch: FBK@IWSLT2021
Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi

TL;DR
This paper presents a speech translation system that uses a hybrid segmentation approach and fine-tuning strategies to mitigate performance drops caused by segmentation mismatches between training and testing data.
Contribution
It introduces a novel hybrid segmentation method and a two-step fine-tuning process to improve speech translation accuracy under realistic, automatically segmented test conditions.
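The listing does not explain how the hybrid segmentation works. One common way to combine audio cues with length constraints is to cut at a detected pause when one falls inside a target length window, and to force a cut at a maximum length otherwise. The Python sketch below illustrates that idea under stated assumptions. It expects pause timestamps (in seconds) to be available already, for example from a voice activity detector. The function name `hybrid_segment` and the `min_len`/`max_len` defaults are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def hybrid_segment(
    pauses: List[float],
    duration: float,
    min_len: float = 10.0,  # hypothetical lower bound for a pause-based cut (s)
    max_len: float = 20.0,  # hypothetical hard upper bound on segment length (s)
) -> List[Tuple[float, float]]:
    """Split [0, duration] into (start, end) segments, preferring pause cuts.

    Illustrative sketch, not the authors' algorithm: cut at the last pause
    in [start + min_len, start + max_len]; if no pause falls in that window,
    force a cut at start + max_len so no segment exceeds max_len.
    """
    if not 0 < min_len <= max_len:
        raise ValueError("require 0 < min_len <= max_len")
    pauses = sorted(pauses)
    segments: List[Tuple[float, float]] = []
    start = 0.0
    # Keep cutting while the remaining audio is longer than max_len.
    while duration - start > max_len:
        window = [p for p in pauses if start + min_len <= p <= start + max_len]
        cut = window[-1] if window else start + max_len
        segments.append((start, cut))
        start = cut
    # The remainder (at most max_len seconds) becomes the last segment.
    segments.append((start, duration))
    return segments
```

For example, `hybrid_segment([4.0, 12.5, 18.0, 31.0, 47.0], 50.0)` returns `[(0.0, 18.0), (18.0, 31.0), (31.0, 50.0)]`: the first two cuts land on pauses, and the last 19 seconds fit within `max_len` and are kept as one segment.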
Findings
Hybrid segmentation reduces the BLEU gap with respect to manual segmentation from 8.3 to 1.4 points.
Two-step fine-tuning improves model robustness to segmentation mismatch.
Knowledge distillation enhances translation performance.
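The listing mentions knowledge distillation without giving details. Below is a minimal sketch of one widely used word-level formulation. It assumes that a teacher model provides a probability distribution over the vocabulary at each target position, and that the student is trained to minimize the cross-entropy against those soft targets. The excerpt does not state which distillation variant or teacher the system uses. The helper names `log_softmax` and `word_level_kd_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def word_level_kd_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_probs: Sequence[Sequence[float]],
) -> float:
    """Mean over target positions of -sum_v q_teacher(v) * log p_student(v).

    This soft-target cross-entropy differs from KL(q || p) only by the
    teacher's entropy, which does not depend on the student.
    """
    if not student_logits or len(student_logits) != len(teacher_probs):
        raise ValueError("need the same, non-zero number of target positions")
    total = 0.0
    for s_logits, t_probs in zip(student_logits, teacher_probs):
        if len(s_logits) != len(t_probs):
            raise ValueError("vocabulary sizes differ")
        log_p = log_softmax(s_logits)
        total -= sum(q * lp for q, lp in zip(t_probs, log_p))
    return total / len(student_logits)
```

The loss is lowest when the student's distribution matches the teacher's, in which case it equals the teacher's entropy. A uniform student over four tokens scores `log 4` regardless of the teacher's distribution.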
Abstract
This paper describes FBK's submission to the IWSLT 2021 Offline Speech Translation task. We participated with a direct model: a Transformer-based architecture trained to translate English speech into German text. The training pipeline is characterized by knowledge distillation and a two-step fine-tuning procedure. Both knowledge distillation and the first fine-tuning step are carried out on manually segmented real and synthetic data, the latter being generated with an MT system trained on the available corpora. In contrast, the second fine-tuning step is carried out on a random segmentation of the MuST-C v2 En-De dataset. Its main goal is to reduce the performance drops occurring when a speech translation model trained on manually segmented data (i.e. an ideal, sentence-like segmentation) is evaluated on automatically segmented audio (i.e. actual, more…
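The abstract does not describe how the random segmentation of MuST-C v2 is produced. The sketch below covers only the audio side, under stated assumptions. A talk of known duration is cut at random points, so that every segment except possibly the last has a length drawn uniformly from [`min_len`, `max_len`] seconds. The function `random_segment`, its defaults, and the uniform distribution are all assumptions. The excerpt does not cover how the German references are matched to the new segments, so that step is left out here.

```python
import random
from typing import List, Tuple


def random_segment(
    duration: float,
    min_len: float = 2.0,   # hypothetical minimum segment length (s)
    max_len: float = 20.0,  # hypothetical maximum segment length (s)
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Cut [0, duration] at random points, ignoring sentence boundaries.

    Illustrative sketch: each cut is placed min_len..max_len seconds after
    the previous one; the final remainder (at most max_len) may be shorter
    than min_len.
    """
    if not 0 < min_len <= max_len:
        raise ValueError("require 0 < min_len <= max_len")
    rng = random.Random(seed)  # seeded so the segmentation is reproducible
    segments: List[Tuple[float, float]] = []
    start = 0.0
    while duration - start > max_len:
        cut = start + rng.uniform(min_len, max_len)
        segments.append((start, cut))
        start = cut
    segments.append((start, duration))
    return segments
```

Because the cut points ignore sentence boundaries, segments produced this way resemble the imperfect output of an automatic segmenter. This matches the abstract's stated goal for the second fine-tuning step.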
Taxonomy
Methods: Knowledge Distillation
