WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models
Fu Feng, Yucheng Xie, Jing Wang, Xin Geng

TL;DR
WAVE initializes variable-sized models from shared weight templates and size-specific scalers, enabling efficient, adaptable, and transferable initialization across model sizes and tasks.
Contribution
WAVE reformulates variable-sized model initialization as a multi-task problem using shared templates and scalers, incorporating knowledge distillation for consistent, adaptable, and transferable initialization.
Findings
WAVE achieves state-of-the-art initialization performance across model sizes.
The templates encapsulate task-agnostic knowledge that transfers across datasets.
Initialization is efficient and requires minimal training data.
Abstract
The growing complexity of model parameters underscores the significance of pre-trained models. However, deployment constraints often necessitate models of varying sizes, exposing limitations in the conventional pre-training and fine-tuning paradigm, particularly when target model sizes are incompatible with pre-trained ones. To address this challenge, we propose WAVE, a novel approach that reformulates variable-sized model initialization from a multi-task perspective, where initializing each model size is treated as a distinct task. WAVE employs shared, size-agnostic weight templates alongside size-specific weight scalers to achieve consistent initialization across various model sizes. These weight templates, constructed within the Learngene framework, integrate knowledge from pre-trained models through a distillation process constrained by Kronecker-based rules. Target models are then…
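The abstract describes shared, size-agnostic weight templates combined with size-specific weight scalers under Kronecker-based rules, but the exact rule is not given in this excerpt. The following is a minimal sketch, assuming each weight matrix is formed as a sum of Kronecker products between a scaler and a template. The names `build_weight` and `make_scalers`, the template shape, and the number of templates are hypothetical choices for illustration, and the random values stand in for learned parameters.

```python
import numpy as np


def build_weight(templates, scalers):
    """Assumed combination rule: W = sum_k kron(S_k, T_k).

    templates: list of 2-D arrays shared across all model sizes.
    scalers:   list of 2-D arrays specific to one target size.
    """
    if len(templates) != len(scalers):
        raise ValueError("need exactly one scaler per template")
    return sum(np.kron(s, t) for s, t in zip(scalers, templates))


rng = np.random.default_rng(0)

# Hypothetical shared templates: 4 templates of shape (16, 16).
# In practice these would be learned once; here they are random placeholders.
templates = [rng.standard_normal((16, 16)) for _ in range(4)]


def make_scalers(rows, cols):
    """Placeholder size-specific scalers, one per template."""
    return [rng.standard_normal((rows, cols)) for _ in templates]


# The same templates yield weights of different sizes,
# depending only on the shape of the scalers.
small = build_weight(templates, make_scalers(2, 2))
large = build_weight(templates, make_scalers(4, 6))
print(small.shape, large.shape)
```

In this sketch the scaler shape alone determines the size of the resulting weight matrix, which is one way a single set of templates could serve several target model sizes. Only the scalers would need to be fitted per size.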
Taxonomy
Topics: Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis · Model Reduction and Neural Networks
Methods: Sparse Evolutionary Training
