ScaleNet: Scaling up Pretrained Neural Networks with Incremental Parameters
Zhiwei Hao, Jianyuan Guo, Li Shen, Kai Han, Yehui Tang, Han Hu, Yunhe Wang

TL;DR
ScaleNet provides a cost-effective method for expanding pretrained vision transformer models by adding layers with shared weights and minimal additional parameters, leading to improved performance and efficiency.
Contribution
We introduce ScaleNet, a novel approach for efficiently scaling pretrained ViT models through layer insertion and weight sharing, reducing training costs while maintaining or improving accuracy.
Findings
Achieves a 7.42% accuracy improvement on ImageNet-1K while training for fewer epochs.
Enables efficient model expansion with negligible parameter increase.
Shows potential for downstream vision tasks like object detection.
Abstract
Recent advancements in vision transformers (ViTs) have demonstrated that larger models often achieve superior performance. However, training these models remains computationally intensive and costly. To address this challenge, we introduce ScaleNet, an efficient approach for scaling ViT models. Unlike conventional training from scratch, ScaleNet facilitates rapid model expansion with negligible increases in parameters, building on existing pretrained models. This offers a cost-effective solution for scaling up ViTs. Specifically, ScaleNet achieves model expansion by inserting additional layers into pretrained ViTs, utilizing layer-wise weight sharing to maintain parameter efficiency. Each added layer shares its parameter tensor with a corresponding layer from the pretrained model. To mitigate potential performance degradation due to shared weights, ScaleNet introduces a small set of…
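The layer insertion and weight sharing described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Layer` class, the helper names, and the choice to place each new layer directly after the pretrained layer it shares with are all assumptions made for illustration. The sketch covers only the sharing step and does not model the small set of additional parameters the abstract goes on to mention.

```python
import random


class Layer:
    """Toy stand-in for a transformer block: one square weight matrix (hypothetical)."""

    def __init__(self, weight):
        # Store a reference to the weight, never a copy, so layers can share it.
        self.weight = weight

    def forward(self, x):
        # Residual update x + W x, computed with plain lists.
        return [
            xi + sum(w * xj for w, xj in zip(row, x))
            for xi, row in zip(x, self.weight)
        ]


def make_pretrained(depth, dim, seed=0):
    """Build `depth` layers with independent random weights (pretend these are pretrained)."""
    rng = random.Random(seed)
    return [
        Layer([[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(dim)])
        for _ in range(depth)
    ]


def expand_with_sharing(layers):
    """Double the depth: after each layer, insert a new layer that reuses its weight object.

    Assumption: new layers are interleaved with the pretrained ones.
    """
    expanded = []
    for layer in layers:
        expanded.append(layer)
        expanded.append(Layer(layer.weight))  # shared reference, no new parameters
    return expanded


def count_unique_params(layers):
    """Count scalar parameters, counting each shared weight matrix only once."""
    sizes = {}
    for layer in layers:
        sizes[id(layer.weight)] = sum(len(row) for row in layer.weight)
    return sum(sizes.values())


def run(layers, x):
    """Apply the layers in order to an input vector."""
    for layer in layers:
        x = layer.forward(x)
    return x


pretrained = make_pretrained(depth=4, dim=8)
expanded = expand_with_sharing(pretrained)

print(len(pretrained), len(expanded))  # 4 8
print(count_unique_params(pretrained), count_unique_params(expanded))  # 256 256
```

In this toy setting the depth doubles while the number of unique parameters stays at 256, which is the sense in which weight sharing keeps the expansion parameter-efficient. Because each inserted layer holds a reference to the same weight object, any update to that weight would affect both layers at once.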
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors
