The IWSLT 2021 BUT Speech Translation Systems

Hari Krishna Vydana; Martin Karafiát; Lukáš Burget; Jan "Honza" Černocký

arXiv:2107.06155·cs.CL·July 14, 2021

TL;DR

This paper presents BUT's joint speech recognition and translation systems for English-German translation, emphasizing the benefits of large-scale pre-training and integrated models for improved translation quality.

Contribution

It introduces a joint ASR-MT training approach that passes the ASR decoder's internal continuous representations to the MT module and exploits large amounts of text-only data, improving speech translation performance.

Findings

01 Joint training improves translation accuracy.

02 Using punctuated ASR outputs enhances translation quality.

03 Pre-training on large datasets benefits end-to-end speech translation.

Abstract

The paper describes BUT's English-to-German offline speech translation (ST) systems developed for IWSLT 2021. They are based on jointly trained Automatic Speech Recognition-Machine Translation (ASR-MT) models. Their performance is evaluated on the MustC-Common test set. In this work, we study their efficiency from the perspective of having a large amount of separate ASR training data and MT training data, and a smaller amount of speech-translation training data. Large amounts of ASR and MT training data are utilized for pre-training the ASR and MT models. Speech-translation data is used to jointly optimize the ASR-MT models by defining an end-to-end differentiable path from speech to translations. For this purpose, we use the internal continuous representations from the ASR-decoder as the input to the MT module. We show that speech translation can be further improved by training the ASR-decoder jointly with…
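
The idea of an end-to-end differentiable path can be illustrated with a toy sketch. It is not the authors' implementation: all names, sizes, and weights below are hypothetical, the "models" are single hand-written matrices, and a finite-difference nudge stands in for the gradient that backpropagation would compute. It only contrasts a cascade, where a discrete token choice sits between ASR and MT, with a joint path, where the MT module receives the continuous ASR-decoder state.

```python
def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical toy parameters (stand-ins for trained weights).
asr_output_layer = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hidden state -> 3 token logits
mt_embedding = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]       # token id -> MT input vector
bridge = [[0.2, 0.4], [-0.3, 0.1]]                         # hidden state -> MT input vector

def cascade_mt_input(asr_hidden):
    """Cascade: commit to a discrete token, then look up its MT embedding."""
    token = argmax(matvec(asr_output_layer, asr_hidden))
    return mt_embedding[token]

def joint_mt_input(asr_hidden):
    """Joint: pass the continuous ASR-decoder state on to the MT side."""
    return matvec(bridge, asr_hidden)

def sensitivity(fn, hidden, eps=1e-6):
    """Finite-difference estimate of how far fn's output moves
    when the first hidden unit is nudged by eps."""
    nudged = [hidden[0] + eps] + hidden[1:]
    return sum(abs(a - b) for a, b in zip(fn(nudged), fn(hidden))) / eps

asr_hidden = [0.9, 0.1]  # hypothetical ASR-decoder state for one output step
cascade_sens = sensitivity(cascade_mt_input, asr_hidden)
joint_sens = sensitivity(joint_mt_input, asr_hidden)
print("cascade sensitivity:", cascade_sens)
print("joint sensitivity:", joint_sens)
```

With these toy values the cascade sensitivity is exactly zero, because a small change in the ASR state does not change the selected token, so no training signal from the translation loss could reach the ASR side. The joint sensitivity is about 0.5, which is the kind of non-zero signal that joint optimization on speech-translation data relies on.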
