Self-Supervised Weight Templates for Scalable Vision Model Initialization
Yucheng Xie, Fu Feng, Ruixiao Shi, Jing Wang, Yong Rui, Xin Geng

TL;DR
SWEET introduces a self-supervised, modular pre-training framework that learns a shared weight template and size-specific scalers, enabling scalable and flexible initialization of vision models across various architectures and tasks.
Contribution
The paper proposes a Tucker-based factorization that learns a shared weight template together with size-specific weight scalers, supporting flexible adaptation to different model sizes and width-invariant representations.
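As a rough illustration of what composing weights from a shared template under a Tucker-style factorization can look like, here is a minimal NumPy sketch. It is not the authors' implementation: the tensor layout (output width, input width, depth), the rank values, the random initialization, and the names `compose_weights` and `make_scalers` are all assumptions made for this example. The only idea taken from the summary is that one template is shared while the scalers are specific to each target size.

```python
import numpy as np


def compose_weights(template, u_out, u_in, u_layer):
    """Tucker-style reconstruction of a stack of layer weights.

    template: shared core tensor of shape (r_out, r_in, r_layer)
    u_out:    (d_out, r_out) scaler for the output width
    u_in:     (d_in, r_in) scaler for the input width
    u_layer:  (n_layers, r_layer) scaler for depth
    Returns an array of shape (n_layers, d_out, d_in).
    """
    return np.einsum(
        "abc,ia,jb,lc->lij", template, u_out, u_in, u_layer, optimize=True
    )


def make_scalers(rng, d_out, d_in, n_layers, ranks):
    """Hypothetical random scalers; in practice these would be learned."""
    r_out, r_in, r_layer = ranks
    return (
        rng.standard_normal((d_out, r_out)),
        rng.standard_normal((d_in, r_in)),
        rng.standard_normal((n_layers, r_layer)),
    )


rng = np.random.default_rng(0)
ranks = (8, 8, 4)                        # assumed template ranks
template = rng.standard_normal(ranks)    # stands in for the pre-trained template

# The same template initializes two models of different width and depth.
small = compose_weights(template, *make_scalers(rng, 64, 64, 6, ranks))
large = compose_weights(template, *make_scalers(rng, 128, 128, 12, ranks))
print(small.shape, large.shape)
```

In this sketch only the scalers change between the two target models, which is why they can stay small relative to the composed weights: each scaler grows linearly with the width or depth it covers, while the template size is fixed by the chosen ranks.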
Findings
Achieves state-of-the-art results on classification, detection, segmentation, and generation tasks.
Supports efficient initialization for variable-sized models with minimal data.
Enhances cross-width generalization through width-wise stochastic scaling.
Abstract
The increasing scale and complexity of modern model parameters underscore the importance of pre-trained models. However, deployment often demands architectures of varying sizes, exposing limitations of conventional pre-training and fine-tuning. To address this, we propose SWEET, a self-supervised framework that performs constraint-based pre-training to enable scalable initialization in vision tasks. Instead of pre-training a fixed-size model, we learn a shared weight template and size-specific weight scalers under Tucker-based factorization, which promotes modularity and supports flexible adaptation to architectures with varying depths and widths. Target models are subsequently initialized by composing and reweighting the template through lightweight weight scalers, whose parameters can be efficiently learned from minimal training data. To further enhance flexibility in width expansion,…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
