Efficiently Reusing Old Models Across Languages via Transfer Learning
Tom Kocmi, Ondřej Bojar

TL;DR
This paper introduces a transfer learning method that reuses existing neural translation models for new language pairs without architectural changes, improving translation quality and reducing training time.
Contribution
It presents a simple, architecture-agnostic transfer learning approach that reuses a single pre-trained model for multiple language pairs, avoiding the need for a separate parent model for each pair.
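Because the architecture stays unchanged, reuse amounts to initializing the child model with the parent's weights and continuing training on the new language pair. The sketch below illustrates only that warm-start step. It assumes a hypothetical representation in which a model is a dict of parameter names to nested lists of floats (a stand-in for real framework tensors); `warm_start_child` is an illustrative name, not code from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical model representation: parameter name -> 2-D matrix as nested lists.
Params = Dict[str, List[List[float]]]


def shape(matrix: List[List[float]]) -> Tuple[int, int]:
    """Return (rows, cols) of a rectangular nested-list matrix."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def warm_start_child(parent: Params, child_init: Params) -> Params:
    """Initialize a child model with the parent's trained weights.

    With an unchanged architecture, every child parameter must exist in the
    parent under the same name and with the same shape; otherwise we refuse.
    """
    if parent.keys() != child_init.keys():
        raise ValueError("parameter names differ: the architecture was modified")
    child: Params = {}
    for name, init_value in child_init.items():
        if shape(parent[name]) != shape(init_value):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Copy each row so that training the child never mutates the parent.
        child[name] = [row[:] for row in parent[name]]
    return child
```

The strict name and shape checks encode the "no architectural changes" assumption: if the child needed a different vocabulary size or layer count, the transfer would fail loudly instead of silently copying mismatched weights.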
Findings
Better translation quality than training from scratch
Faster convergence during training
Effective reuse of existing models across languages
Abstract
Recent progress in neural machine translation (NMT) is directed towards larger neural networks trained on ever-increasing hardware resources. As a result, NMT models are costly to train, both financially, due to electricity and hardware costs, and environmentally, due to their carbon footprint. This is especially true in transfer learning, which adds the cost of training a "parent" model before its knowledge is transferred and the desired "child" model is trained. In this paper, we propose a simple method of reusing an already trained model for different language pairs that requires no modifications to the model architecture. Unlike typical NMT transfer learning, our approach does not need a separate parent model for each investigated language pair. To show the applicability of our method, we recycle a Transformer model trained by different researchers and use it to seed…
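A new language pair usually brings subwords the parent never saw, yet changing the vocabulary size would change the embedding matrix shape. One plausible way to keep the architecture fixed, which is our assumption here and may differ from the paper's exact procedure, is to keep shared subwords at their parent index and reassign parent slots the child never uses to child-only subwords. The function name `remap_vocabulary` is hypothetical.

```python
from typing import List


def remap_vocabulary(parent_vocab: List[str], child_vocab: List[str]) -> List[str]:
    """Fit the child's subwords into the parent's fixed-size vocabulary.

    Subwords shared by both vocabularies keep their parent index, and thus
    their trained embedding row. Parent subwords absent from the child
    vocabulary are overwritten, in order, by child-only subwords. Any free
    slots left over keep their parent subword. The vocabulary size, and
    therefore the embedding matrix shape, never changes.
    """
    child_set = set(child_vocab)
    parent_set = set(parent_vocab)
    new_tokens = [tok for tok in child_vocab if tok not in parent_set]
    free_slots = [i for i, tok in enumerate(parent_vocab) if tok not in child_set]
    if len(new_tokens) > len(free_slots):
        raise ValueError("child vocabulary does not fit into the parent's vocabulary size")
    remapped = list(parent_vocab)
    for slot, tok in zip(free_slots, new_tokens):
        remapped[slot] = tok
    return remapped


parent = ["<s>", "</s>", "the", "house", "ing"]
child = ["<s>", "</s>", "ing", "dům", "ý"]
print(remap_vocabulary(parent, child))  # ['<s>', '</s>', 'dům', 'ý', 'ing']
```

Reassigned slots start from whatever embedding the parent had learned for the replaced subword; under this assumption, continued training on the child data is what gives them a useful meaning for the new language.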
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
