RedApt: An Adaptor for wav2vec 2 Encoding: Faster and Smaller Speech Translation without Quality Compromise

Jinming Zhao; Hao Yang; Gholamreza Haffari; Ehsan Shareghi

arXiv:2210.08475·cs.CL·October 18, 2022

TL;DR

RedApt is a novel adaptor for wav2vec 2 speech encoders that makes speech translation models faster and smaller without sacrificing translation quality.

Contribution

Introduces RedApt, an adaptor block that can be integrated into any Transformer-based speech encoder, reducing inference computation and memory while improving translation quality.

Findings

01 41% speedup at inference

02 33% memory reduction

03 Outperforms SotA by an average of 0.68 BLEU on 8 MuST-C language pairs

Abstract

Pre-trained speech Transformers have facilitated state-of-the-art (SotA) results in speech translation (ST); yet, using such encoders is computationally expensive. To address this, we present a novel Reducer Adaptor block, RedApt, that can be seamlessly integrated within any Transformer-based speech encoding architecture. Integrating the pre-trained wav2vec 2 speech encoder with RedApt brings a 41% speedup, a 33% memory reduction, and 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU on 8 language pairs from MuST-C.

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling