M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation

Jinming Zhao; Hao Yang; Ehsan Shareghi; Gholamreza Haffari

arXiv:2207.00952 · cs.CL · July 5, 2022

TL;DR

M-Adapter introduces a Transformer-based module that effectively bridges the modality gap between speech and text, enhancing end-to-end speech-to-text translation performance.

Contribution

The paper presents M-Adapter, a Transformer-based module that adapts speech representations to text. It shrinks the speech sequence while modelling its global and local dependencies, which improves translation quality.

Findings

01

Outperforms a strong baseline by up to 1 BLEU on the MuST-C En→De dataset.

02

Effectively models global and local dependencies in speech sequences.

03

Bridges the modality gap between speech and text to improve translation quality.

Abstract

End-to-end speech-to-text translation models are often initialized with a pre-trained speech encoder and a pre-trained text decoder. This leads to a significant training gap between pre-training and fine-tuning, largely due to the modality differences between speech outputs from the encoder and text inputs to the decoder. In this work, we aim to bridge the modality gap between speech and text to improve translation quality. We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text. While shrinking the speech sequence, M-Adapter produces features desired for speech-to-text translation by modelling global and local dependencies of a speech sequence. Our experimental results show that our model outperforms a strong baseline by up to 1 BLEU on the MuST-C En→De dataset. (Footnote: Our code is available at…)
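
The abstract names two things the adapter does: it shortens the speech sequence, and it models both local and global dependencies. The sketch below is a toy illustration of that idea only, not the authors' architecture. It assumes mean pooling over neighbouring frames as the local, length-reducing step and a single-head dot-product self-attention without learned weights as the global step. The names `mean_pool`, `self_attention`, and `toy_adapter` are hypothetical and do not come from the paper or its repository.

```python
import math


def mean_pool(frames, stride):
    """Local step (assumed): average each window of `stride` consecutive frames.

    This shortens the sequence by roughly a factor of `stride`.
    """
    pooled = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        dim = len(window[0])
        pooled.append(
            [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        )
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Global step (assumed): each output frame is a weighted mix of all frames.

    Uses the frames themselves as queries, keys, and values; a real module
    would apply learned projections first.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in frames:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in frames
        ]
        weights = softmax(scores)
        mixed.append(
            [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
        )
    return mixed


def toy_adapter(frames, stride=2):
    """Shrink the sequence locally, then mix information globally."""
    return self_attention(mean_pool(frames, stride))


# Eight dummy "speech frames" with two features each.
speech = [[float(t), 1.0] for t in range(8)]
adapted = toy_adapter(speech, stride=2)
print(len(speech), "->", len(adapted))  # 8 -> 4
```

With a stride of 2, the eight input frames become four output frames of the same feature size, which is the kind of length reduction that brings a long speech sequence closer to the length of a text sequence.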

Code & Models

Repositories

mingzi151/w2v2-st (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis