Efficient Speech Translation with Pre-trained Models

Zhaolin Li; Jan Niehues

arXiv:2211.04939·cs.CL·November 10, 2022

TL;DR

This paper explores efficient ways to build speech translation systems from pre-trained models and proposes a similarity loss that improves translation quality and data efficiency when end-to-end training data is limited.

Contribution

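The paper introduces strategies for building cascaded and end-to-end speech translation systems from pre-trained models that can be trained and applied on a single GPU. It also proposes a similarity loss that encourages similar hidden representations for speech and transcript, improving data efficiency and translation quality.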

Findings

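01. End-to-end models outperform cascaded models in translation quality, but they require additional end-to-end training data.

02. The similarity loss improves translation quality by 6 BLEU points when end-to-end training data is limited.

03. Training and applying strong speech translation models on a single GPU is feasible when starting from pre-trained models.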

Abstract

When building state-of-the-art speech translation models, the need for large computational resources is a significant obstacle due to the large training data size and complex models. The availability of pre-trained models is a promising opportunity to build strong speech translation systems efficiently. In a first step, we investigate efficient strategies to build cascaded and end-to-end speech translation systems based on pre-trained models. Using these strategies, we can train and apply the models on a single GPU. While the end-to-end models show superior translation performance to cascaded ones, their application is limited by the need for additional end-to-end training data. In a second step, we propose an additional similarity loss to encourage the model to generate similar hidden representations for speech and transcript. Using this technique, we can increase…
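The abstract does not state the exact form of the similarity loss, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that the speech and transcript hidden states are each mean-pooled over time and compared with a mean squared error, and that the result is added to the translation loss with a scalar weight. The names `mean_pool`, `similarity_loss`, `total_loss`, and `weight` are hypothetical and are not taken from the paper.

```python
def mean_pool(states):
    """Average a sequence of hidden vectors (a list of equal-length lists) over time."""
    if not states:
        raise ValueError("cannot pool an empty sequence")
    dim = len(states[0])
    return [sum(vec[i] for vec in states) / len(states) for i in range(dim)]


def similarity_loss(speech_states, text_states):
    """Assumed form: MSE between mean-pooled speech and transcript representations."""
    s = mean_pool(speech_states)
    t = mean_pool(text_states)
    if len(s) != len(t):
        raise ValueError("hidden sizes of the two encoders must match")
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def total_loss(translation_loss, speech_states, text_states, weight=1.0):
    """Assumed combination: translation loss plus a weighted similarity term."""
    return translation_loss + weight * similarity_loss(speech_states, text_states)


# Toy example: 3 speech frames and 2 transcript tokens, hidden size 2.
speech = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
text = [[3.0, 4.0], [3.0, 4.0]]
print(similarity_loss(speech, text))        # both sequences pool to [3.0, 4.0]
print(total_loss(2.0, speech, text, 0.5))   # translation loss is unchanged here
```

Pooling before comparison sidesteps the length mismatch between speech frames and transcript tokens. Other choices, such as a cosine distance or a token-level alignment, would fit the same description in the abstract.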

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis