When does Parameter-Efficient Transfer Learning Work for Machine Translation?
Ahmet Üstün, Asa Cooper Stickland

TL;DR
This paper provides a comprehensive empirical analysis of parameter-efficient fine-tuning methods for machine translation, showing their effectiveness depends on parameter budget, language pair, and pre-trained model size.
Contribution
It systematically evaluates PEFTs across various settings for MT, highlighting when they match or outperform full fine-tuning, especially with larger models and limited data.
Findings
Adapters perform on par with full fine-tuning when the parameter budget is 10% of total model parameters.
PEFT performance declines with fewer tuned parameters, especially for distant language pairs.
Larger pre-trained models with PEFT outperform smaller models with full fine-tuning.
Abstract
Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained models while only tuning a small number of parameters. They have been shown to be competitive with full model fine-tuning for many downstream tasks. However, prior work indicates that PEFTs may not work as well for machine translation (MT), and there is no comprehensive study showing when PEFTs work for MT. We conduct a comprehensive empirical study of PEFTs for MT, considering (1) various parameter budgets, (2) a diverse set of language-pairs, and (3) different pre-trained models. We find that 'adapters', in which small feed-forward networks are added after every layer, are indeed on par with full model fine-tuning when the parameter budget corresponds to 10% of total model parameters. Nevertheless, as the number of tuned parameters decreases, the performance of PEFTs decreases. The…
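To make the notion of a "parameter budget" concrete, here is a minimal sketch of a bottleneck adapter and of how its size relates to the budget. It is not the authors' implementation: the names `AdapterConfig`, `adapter_param_count`, `budget_fraction`, and `adapter_forward` are hypothetical, and the ReLU nonlinearity, the residual connection, and the choice to measure the budget against the frozen model's parameter count are assumptions based on the common adapter design, since the abstract only says that small feed-forward networks are added after every layer.

```python
from dataclasses import dataclass


@dataclass
class AdapterConfig:
    hidden_size: int      # model dimension d (illustrative value, not from the paper)
    bottleneck_size: int  # adapter inner dimension b
    num_adapters: int     # assumed: one adapter per layer


def adapter_param_count(cfg: AdapterConfig) -> int:
    """Parameters across all adapters: down-projection, up-projection, biases."""
    down = cfg.hidden_size * cfg.bottleneck_size + cfg.bottleneck_size
    up = cfg.bottleneck_size * cfg.hidden_size + cfg.hidden_size
    return cfg.num_adapters * (down + up)


def budget_fraction(cfg: AdapterConfig, total_model_params: int) -> float:
    """Tuned adapter parameters as a fraction of the frozen model's parameters."""
    return adapter_param_count(cfg) / total_model_params


def adapter_forward(h, w_down, b_down, w_up, b_up):
    """Apply an assumed bottleneck adapter to one hidden vector.

    Computes h + W_up @ relu(W_down @ h + b_down) + b_up with plain lists.
    """
    z = [
        max(0.0, sum(w * x for w, x in zip(row, h)) + b)
        for row, b in zip(w_down, b_down)
    ]
    out = [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(w_up, b_up)]
    return [x + o for x, o in zip(h, out)]
```

Under these assumptions, shrinking `bottleneck_size` is the knob that lowers the parameter budget, which is the regime where the abstract reports PEFT performance dropping.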
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
