CTC-based Compression for Direct Speech Translation

Marco Gaido; Mauro Cettolo; Matteo Negri; Marco Turchi

arXiv:2102.01578·cs.CL·October 15, 2021

TL;DR

This paper introduces a CTC-based method for dynamically compressing the input of direct speech translation models. It improves translation quality by 1.3-1.5 BLEU and reduces the memory footprint by more than 10%, without requiring a dedicated phone recognition model.

Contribution

It presents the first method that performs dynamic, CTC-based compression of the input inside direct (end-to-end) speech translation models, in which a single model translates the input audio into the target language without intermediate representations.

Findings

01 Achieved 1.3-1.5 BLEU improvements over baseline

02 Reduced memory footprint by over 10%

03 Validated on English-Italian and English-German pairs

Abstract

Previous studies demonstrated that a dynamic phone-informed compression of the input audio is beneficial for speech translation (ST). However, they required a dedicated model for phone recognition and did not test this solution for direct ST, in which a single model translates the input audio into the target language without intermediate representations. In this work, we propose the first method able to perform a dynamic compression of the input in direct ST models. In particular, we exploit the Connectionist Temporal Classification (CTC) to compress the input sequence according to its phonetic characteristics. Our experiments demonstrate that our solution brings a 1.3-1.5 BLEU improvement over a strong baseline on two language pairs (English-Italian and English-German), while also reducing the memory footprint by more than 10%.
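
The compression idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each frame is labelled with the greedy (argmax) CTC prediction and that runs of consecutive frames sharing a label are merged by averaging their vectors; both choices, and the function name `ctc_compress`, are assumptions made for this example only.

```python
import numpy as np


def ctc_compress(frames, ctc_logits):
    """Merge runs of consecutive frames that share the same CTC argmax label.

    frames:     (T, D) array of encoder states (hypothetical input).
    ctc_logits: (T, V) array of per-frame CTC scores over V labels.
    Returns the compressed (T', D) array and the (T',) label of each run.
    Averaging is an assumed merge policy, chosen for illustration.
    """
    frames = np.asarray(frames, dtype=float)
    labels = np.asarray(ctc_logits).argmax(axis=-1)
    if labels.size == 0:
        return frames, labels
    # A run starts at frame 0 and wherever the label differs from the previous one.
    starts = np.flatnonzero(np.r_[True, labels[1:] != labels[:-1]])
    # Sum the frames of each run, then divide by the run length to get the mean.
    sums = np.add.reduceat(frames, starts, axis=0)
    counts = np.diff(np.r_[starts, labels.size])
    return sums / counts[:, None], labels[starts]


# Toy input: 5 frames with 2 features; the argmax labels are [0, 0, 2, 2, 2].
toy_frames = np.array([[1, 1], [3, 3], [2, 0], [4, 0], [6, 0]])
toy_logits = np.eye(3)[[0, 0, 2, 2, 2]]
compressed, run_labels = ctc_compress(toy_frames, toy_logits)
print(compressed.shape)  # 5 frames are reduced to 2
```

Because the output length depends on how many label runs the CTC predictions contain, the compression factor varies per utterance, which is what makes the compression dynamic rather than a fixed subsampling rate.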

Code & Models

Repositories

mgaido91/FBK-fairseq-ST (PyTorch, official)