Transfer training from smaller language model
Han Zhang

TL;DR
This paper proposes a method to efficiently scale up smaller pre-trained language models to larger ones by weight copying and padding, reducing training time and resource requirements while maintaining comparable performance.
Contribution
It introduces a novel transfer training technique that initializes larger models from smaller ones, leveraging existing weights to save computational resources.
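The copy-and-pad initialization can be sketched for a single dense layer with block matrices. This is a minimal NumPy sketch under stated assumptions, not the paper's code. `expand_linear` and `mlp` are hypothetical names, the 4→8→4 source and 6→16→6 target sizes are arbitrary, and random weights stand in for trained ones. Source weights are copied into the top-left block. The block that would let padded inputs leak into the copied outputs is kept at zero. The rows for new output units receive small random values, in the spirit of "zeros or small initialization values".

```python
import numpy as np


def expand_linear(W, b, new_out, new_in, rng, scale=1e-3):
    """Hypothetical helper: grow a dense layer y = W @ x + b from shape
    (out, in) to (new_out, new_in) by block-wise copying and padding."""
    out_dim, in_dim = W.shape
    W_new = np.zeros((new_out, new_in))
    # Top-left block: copy the source weights unchanged.
    W_new[:out_dim, :in_dim] = W
    # Rows for the new output units: small random values (they only feed new units).
    W_new[out_dim:, :] = scale * rng.standard_normal((new_out - out_dim, new_in))
    # The top-right block (old outputs x new inputs) stays zero, so whatever the
    # padded input features hold, the copied outputs are unaffected.
    b_new = np.zeros(new_out)
    b_new[:out_dim] = b
    return W_new, b_new


def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU network used as a stand-in for one transformer sublayer."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2


rng = np.random.default_rng(42)

# Source model: 4 -> 8 -> 4 (random weights stand in for trained ones).
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((4, 8)), rng.standard_normal(4)

# Target model: 6 -> 16 -> 6, initialized from the source.
W1t, b1t = expand_linear(W1, b1, new_out=16, new_in=6, rng=rng)
W2t, b2t = expand_linear(W2, b2, new_out=6, new_in=16, rng=rng)

x = rng.standard_normal(4)
x_padded = np.concatenate([x, np.zeros(2)])  # zero-pad the input features

y_src = mlp(x, W1, b1, W2, b2)
y_tgt = mlp(x_padded, W1t, b1t, W2t, b2t)

# The first 4 outputs of the target reproduce the source; the extra 2 are new.
print(np.allclose(y_tgt[:4], y_src))  # True
```

The same rule could be applied to every projection matrix in a transformer. It keeps the original coordinates exact only through the linear and element-wise parts, because layer normalization would also compute its mean and variance over the padded dimensions.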
Findings
Large models initialized from smaller ones perform comparably to directly trained larger models.
Starting training from a transferred model reduces initial training loss.
The method is validated across multiple datasets with positive results.
Abstract
Large language models have achieved state-of-the-art accuracy across a range of tasks. However, training a large language model requires massive computing resources. As more and more open-source pre-trained models become available, it is worth studying how to take full advantage of them. We present a method that saves training time and resource cost by turning a small, well-trained model into a larger one. We initialize a larger target model from a smaller source model by copying the source model's weight values and padding them with zeros or small initialization values, so that the source and target models produce approximately the same outputs; this is valid because of block matrix multiplication and the residual connections in the transformer architecture. We test the target model on several datasets and find that it remains comparable with the source model. When we continue training the target model, the training loss can…
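The abstract credits residual connections for keeping the outputs close but does not spell out how. One common reading is sketched below under an explicit assumption: the target may also gain extra layers, although the abstract does not say it does. In a pre-LN residual block, a zero-initialized output projection turns the block into an exact identity map. Such a block can therefore be appended to the copied source blocks without changing the model's output. `layer_norm`, `pre_ln_block`, and `run` are hypothetical helpers. Attention sublayers and learned LayerNorm parameters are omitted, and random weights stand in for trained ones.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Parameter-free layer normalization over the feature dimension."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def pre_ln_block(x, W_in, W_out):
    """Pre-LN residual feed-forward block: x + W_out @ relu(W_in @ LN(x))."""
    return x + W_out @ np.maximum(W_in @ layer_norm(x), 0.0)


def run(blocks, x):
    """Apply a stack of residual blocks in order."""
    for W_in, W_out in blocks:
        x = pre_ln_block(x, W_in, W_out)
    return x


rng = np.random.default_rng(7)
d, d_ff = 8, 32

# Source stack: two blocks (random weights stand in for trained ones).
source = [
    (rng.standard_normal((d_ff, d)), rng.standard_normal((d, d_ff)))
    for _ in range(2)
]

# Target stack: the copied source blocks plus one new block whose input
# projection is small and random and whose output projection is zero.
new_block = (1e-2 * rng.standard_normal((d_ff, d)), np.zeros((d, d_ff)))
target = source + [new_block]

x = rng.standard_normal(d)
# The zero output projection makes the new block an exact identity map.
print(np.array_equal(run(target, x), run(source, x)))  # True
```

Because the skip path carries the input through untouched, the new block's input projection is free to start from small random values. Training can then move its output projection away from zero.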
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Residual Connection
