Efficient yet Competitive Speech Translation: FBK@IWSLT2022
Marco Gaido, Sara Papi, Dennis Fucci, Giuseppe Fiameni, Matteo Negri, Marco Turchi

TL;DR
This paper presents a cost-effective speech translation system that eliminates the need for ASR pre-training, employs simple data filtering, and addresses audio segmentation issues, achieving competitive results in offline and simultaneous tasks.
Contribution
It demonstrates that ASR pre-training is unnecessary for competitive speech translation, introduces a simple data filtering method, and compares strategies to handle segmentation mismatch, reducing training costs.
Findings
Achieved 26.7 BLEU on MuST-C en-de corpus.
Improved IWSLT2020 test BLEU by 1.6 over previous best.
Validated lightweight training strategy effectiveness.
Abstract
The primary goal of FBK's systems submission to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality. As such, we first question the need for ASR pre-training, showing that it is not essential to achieve competitive results. Second, we focus on data filtering, showing that a simple method that looks at the ratio between source and target characters yields a quality improvement of 1 BLEU. Third, we compare different methods to reduce the detrimental effect of the audio segmentation mismatch between training data, which is manually segmented at sentence level, and inference data, which is automatically segmented. Towards the same goal of training cost reduction, we participate in the simultaneous task with the same model trained for offline ST. The effectiveness of our lightweight training strategy is shown by…
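The character-ratio filtering mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `min_ratio` and `max_ratio` thresholds, and the choice of the transcript as the "source" text are all assumptions made for illustration, since the abstract does not give those details.

```python
from typing import Iterable, List, Tuple


def char_ratio(source: str, target: str) -> float:
    """Return len(source) / len(target) in characters, ignoring outer whitespace."""
    src_len = len(source.strip())
    tgt_len = len(target.strip())
    if tgt_len == 0:
        # An empty target can never be a valid translation; force rejection.
        return float("inf")
    return src_len / tgt_len


def filter_by_char_ratio(
    pairs: Iterable[Tuple[str, str]],
    min_ratio: float = 0.5,  # hypothetical lower bound, not taken from the paper
    max_ratio: float = 2.0,  # hypothetical upper bound, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep only (source, target) pairs whose character ratio lies within bounds."""
    kept = []
    for source, target in pairs:
        if not source.strip():
            # Skip pairs with an empty source side.
            continue
        if min_ratio <= char_ratio(source, target) <= max_ratio:
            kept.append((source, target))
    return kept


if __name__ == "__main__":
    examples = [
        ("Thank you very much.", "Vielen Dank."),
        ("Hello.", "Hallo, das ist ein sehr langer Satz, der nicht passt."),
        ("Good morning.", ""),
    ]
    for src, tgt in filter_by_char_ratio(examples):
        print(f"{src} -> {tgt}")
```

In this sketch, a pair is discarded when one side is much longer than the other, which is a common symptom of misaligned or noisy training data. In practice the bounds would be tuned on the corpus at hand rather than fixed in advance.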
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
