Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models
Shi-Yu Xia, Wenxuan Zhu, Xu Yang, Xin Geng

TL;DR
This paper introduces SWS, a stage-wise weight sharing method for initializing variable-sized models efficiently, leveraging learngene layers learned from large models to reduce training costs and improve performance.
Contribution
The paper proposes a novel SWS approach that enhances learngene-based initialization by incorporating stage guidance, significantly reducing training costs and storage for variable-sized models.
Findings
SWS outperforms models trained from scratch on ImageNet-1K.
SWS reduces training costs by approximately 6.6x.
SWS achieves better results with minimal fine-tuning, after only 1 epoch.
Abstract
In practice, we usually need to build variable-sized models adapted to diverse resource constraints in different application scenarios, where weight initialization is an important step prior to training. The recently introduced Learngene framework first learns one compact part, termed learngene, from a large well-trained model, after which the learngene is expanded to initialize variable-sized models. In this paper, we start by analysing the importance of guidance for the expansion of well-trained learngene layers, which inspires the design of a simple but highly effective Learngene approach termed SWS (Stage-wise Weight Sharing), where both the learngene layers and their learning process critically contribute to providing knowledge and guidance for initializing models at varying scales. Specifically, to learn learngene layers, we build an auxiliary model comprising multiple stages where…
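The expansion step mentioned in the abstract can be pictured with a small sketch. It is not the authors' implementation: it assumes, based only on the name "Stage-wise Weight Sharing", that each stage contributes one learngene layer and that a target model is initialized by repeating that layer a chosen number of times within its stage. The function `expand_learngene`, the toy weight vectors, and the per-stage repeat counts are all hypothetical.

```python
from copy import deepcopy


def expand_learngene(learngene_layers, layers_per_stage):
    """Build an initial layer stack by repeating each stage's learngene layer.

    learngene_layers: one weight container per stage (hypothetical format).
    layers_per_stage: how many layers the target model has in each stage.
    """
    if len(learngene_layers) != len(layers_per_stage):
        raise ValueError("need exactly one repeat count per stage")

    model = []
    for stage_idx, (layer, repeats) in enumerate(
        zip(learngene_layers, layers_per_stage)
    ):
        for _ in range(repeats):
            # Copy the weights so each layer can diverge during fine-tuning.
            model.append({"stage": stage_idx, "weights": deepcopy(layer)})
    return model


# Three hypothetical stages, each represented by a toy weight vector.
learngene = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# The same learngene initializes models of different depths.
small = expand_learngene(learngene, [1, 2, 1])
large = expand_learngene(learngene, [2, 4, 2])
print(len(small), len(large))
```

In this sketch only the three learngene layers need to be stored, while the depth of each initialized model is decided at expansion time by the repeat counts.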
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification
