Unveiling the Role of Pretraining in Direct Speech Translation

Belen Alastruey; Gerard I. Gállego; Marta R. Costa-jussà

arXiv:2409.18044·cs.CL·September 27, 2024

TL;DR

This paper investigates the impact of pretraining on direct speech translation, revealing training challenges and proposing a decoder modification to enable training from scratch with comparable performance and reduced training time.

Contribution

The study identifies training difficulties in direct speech translation and introduces a decoder change that allows training from scratch effectively, reducing reliance on pretraining.

Findings

01. Pretrained encoders facilitate learning in speech translation.

02. A decoder modification enables training from scratch effectively.

03. Training from scratch can match pretrained performance with less time.

Abstract

Direct speech-to-text translation systems face an important drawback: data scarcity. A common solution consists of pretraining the encoder on automatic speech recognition, which makes the training process less efficient. In this study, we compare the training dynamics of a system using a pretrained encoder, the conventional approach, and one trained from scratch. We observe that, throughout training, the randomly initialized model struggles to incorporate information from the speech inputs for its predictions. Hence, we hypothesize that this issue stems from the difficulty of effectively training an encoder for direct speech translation. While a model trained from scratch needs to learn acoustic and semantic modeling simultaneously, a pretrained one can focus solely on the latter. Based on these findings, we propose a subtle change in the decoder cross-attention to integrate…

Taxonomy

Topics: Subtitles and Audiovisual Media

Methods: Focus