RedApt: An Adaptor for wav2vec 2 Encoding
Faster and Smaller Speech Translation without Quality Compromise
Jinming Zhao, Hao Yang, Gholamreza Haffari, Ehsan Shareghi

TL;DR
RedApt is a novel adaptor for wav2vec 2 speech encoders that makes speech translation models faster and smaller while slightly improving translation quality.
Contribution
Introduces RedApt, a Reducer Adaptor block that can be integrated into any Transformer-based speech encoder, reducing inference computation and memory while improving translation quality.
Findings
41% speedup in inference
33% memory reduction
Outperforms the SotA architecture by an average of 0.68 BLEU on 8 language pairs from MuST-C
Abstract
Pre-trained speech Transformers in speech translation (ST) have facilitated state-of-the-art (SotA) results; yet, using such encoders is computationally expensive. To improve this, we present a novel Reducer Adaptor block, RedApt, that can be seamlessly integrated within any Transformer-based speech encoding architecture. Integrating the pretrained wav2vec 2 speech encoder with RedApt brings a 41% speedup and a 33% memory reduction, with 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU on 8 language pairs from MuST-C.
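The abstract does not say how the block lowers the cost of encoding. As a rough illustration only, the sketch below assumes a reducer that shortens the sequence of frame representations between encoder layers, and estimates how that would change the multiply-accumulate count of a Transformer encoder. The function names, the halving factor, the placement after layer 6, and the default sizes (12 layers, model width 768, feed-forward width 3072) are all hypothetical choices for this example, not details taken from the paper, and the cost of the reducer itself is ignored.

```python
def layer_flops(seq_len: int, d_model: int = 768, d_ff: int = 3072) -> int:
    """Rough multiply-accumulate count for one Transformer encoder layer."""
    projections = 4 * seq_len * d_model * d_model   # Q, K, V and output projections
    attention = 2 * seq_len * seq_len * d_model     # attention scores + weighted sum of values
    feed_forward = 2 * seq_len * d_model * d_ff     # two linear layers of the feed-forward block
    return projections + attention + feed_forward


def encoder_flops(seq_len: int, num_layers: int = 12,
                  reduce_after: tuple = (), factor: int = 2) -> int:
    """Sum per-layer costs, shrinking the sequence after the listed layers.

    `reduce_after` and `factor` are assumptions for this sketch: after each
    listed layer (1-indexed) the sequence length is divided by `factor`.
    """
    total = 0
    for layer in range(1, num_layers + 1):
        total += layer_flops(seq_len)
        if layer in reduce_after:
            seq_len = max(1, seq_len // factor)
    return total


baseline = encoder_flops(500)                    # no reduction
reduced = encoder_flops(500, reduce_after=(6,))  # halve the sequence after layer 6
saving = 1 - reduced / baseline
print(f"estimated FLOP saving: {saving:.1%}")
```

Because the attention term grows quadratically with sequence length while the projection and feed-forward terms grow linearly, shortening the sequence partway through the encoder saves more than a proportional share of the later layers' cost. The figure this sketch prints is not comparable to the 24% reported in the abstract, which was measured on the authors' actual model.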
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
