TL;DR
This paper investigates adapter modules as a parameter-efficient method for multilingual speech translation, demonstrating their effectiveness in specializing and transferring models with minimal additional parameters.
Contribution
The paper provides a comprehensive analysis of adapter tuning for multilingual speech translation, including transfer from an ASR task and from an mBART model pre-trained on non-parallel multilingual data.
Findings
Adapters achieve performance competitive with full fine-tuning.
Adapters enable efficient specialization for specific language pairs.
Transfer from ASR and mBART models improves multilingual speech translation.
Abstract
Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists of freezing the pre-trained parameters of a model and injecting lightweight modules between layers, resulting in the addition of only a small number of task-specific trainable parameters. While adapter tuning was investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). Starting from different pre-trained models (a multilingual ST model trained on parallel data or a multilingual BART (mBART) trained on non-parallel multilingual data), we show that adapters can be used to: (a) efficiently specialize ST to specific language pairs with a low extra cost in terms of parameters, and (b) transfer from an automatic speech recognition (ASR) task and an mBART pre-trained model to a multilingual…
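As an illustration of the adapter idea described in the abstract, here is a minimal sketch in plain Python. It assumes the common bottleneck design (down-projection, ReLU, up-projection, residual connection), which the abstract itself does not spell out. The class name `Adapter`, the helper `matvec`, the dimensions, and the zero initialization of the up-projection are all hypothetical choices made for this example, not the paper's implementation.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class Adapter:
    """Hypothetical bottleneck adapter: h + W_up * relu(W_down * h + b_down) + b_up."""

    def __init__(self, d_model, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: d_model -> bottleneck, small random init (assumption).
        self.W_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection: bottleneck -> d_model, zero init (assumption) so the
        # adapter starts as an identity map and leaves the frozen model unchanged.
        self.W_up = [[0.0] * bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def num_params(self):
        """Count the trainable parameters added by this adapter."""
        d_model, bottleneck = len(self.W_up), len(self.W_down)
        return 2 * d_model * bottleneck + bottleneck + d_model

    def __call__(self, h):
        # Project down and apply ReLU.
        z = [max(0.0, v + b) for v, b in zip(matvec(self.W_down, h), self.b_down)]
        # Project back up to the model dimension.
        u = [v + b for v, b in zip(matvec(self.W_up, z), self.b_up)]
        # Residual connection around the bottleneck.
        return [hi + ui for hi, ui in zip(h, u)]


adapter = Adapter(d_model=8, bottleneck=2)
hidden = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]  # made-up layer output
output = adapter(hidden)
```

In this sketch only the adapter's weights would be updated during training, while the output `hidden` of the pre-trained layer comes from parameters that stay frozen. The added parameter count grows with the product of the model dimension and the bottleneck size, which is why a narrow bottleneck keeps the extra cost low.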
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · mBART · Dense Connections · Softmax · Dropout · Byte Pair Encoding · Adam · Adapter
