ScaleNet: Searching for the Model to Scale

Jiyang Xie; Xiu Su; Shan You; Zhanyu Ma; Fei Wang; Chen Qian

arXiv:2207.07267·cs.CV·July 18, 2022

TL;DR

ScaleNet introduces a joint search method for base models and scaling strategies, enabling the development of larger, high-performing models with reduced search costs by combining super-supernet training and evolutionary algorithms.

Contribution

The paper proposes a joint search framework covering both the base model and its scaling strategy, so that scaled-up models perform better than those produced by existing methods.

Findings

1. Achieves significant performance improvements across a range of FLOPs budgets.
2. Reduces search cost by at least 2.53 times.
3. Demonstrates effective scaling with a hierarchical sampling strategy.

Abstract

Recently, the community has paid increasing attention to model scaling and contributed to developing model families with a wide spectrum of scales. Current methods either simply resort to one-shot NAS to construct a non-structural and non-scalable model family, or rely on a manual yet fixed scaling strategy to scale a base model that is not necessarily the best. In this paper, we bridge these two components and propose ScaleNet to jointly search the base model and the scaling strategy so that the scaled large model can have more promising performance. Concretely, we design a super-supernet to embody models across a spectrum of sizes (e.g., FLOPs). Then, the scaling strategy can be learned interactively with the base model via a Markov chain-based evolution algorithm and generalized to develop even larger models. To obtain a decent super-supernet, we design a hierarchical sampling strategy to…
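
The idea of searching a scaling strategy under a FLOPs budget can be illustrated with a toy sketch. This is not the paper's algorithm: it uses a plain evolutionary loop rather than the Markov chain-based evolution the abstract names, and the names `relative_flops`, `proxy_score`, `mutate`, and `evolve` are hypothetical. The FLOPs model (linear in depth, quadratic in width and resolution) is the usual rule of thumb for convolutional networks, and `proxy_score` is a made-up stand-in for the accuracy that a trained supernet would report.

```python
import random


def relative_flops(depth, width, resolution):
    """FLOPs relative to the base model: ~linear in depth,
    ~quadratic in width and in input resolution (rule of thumb)."""
    return depth * width ** 2 * resolution ** 2


def proxy_score(strategy):
    """Hypothetical stand-in for validation accuracy; a real search
    would evaluate the scaled model with supernet weights instead."""
    depth, width, resolution = strategy
    return depth ** 0.3 * width ** 0.4 * resolution ** 0.5


def mutate(strategy, rng, step=0.05):
    """Perturb one multiplier by +/- step, never going below 1.0."""
    values = list(strategy)
    i = rng.randrange(3)
    values[i] = max(1.0, round(values[i] + rng.choice((-step, step)), 2))
    return tuple(values)


def evolve(budget, generations=30, population=20, top_k=5, seed=0):
    """Search (depth, width, resolution) multipliers whose relative
    FLOPs stay within `budget`, keeping the top_k candidates each round."""
    rng = random.Random(seed)
    pool = [(1.0, 1.0, 1.0)]  # the unscaled base model is always feasible
    for _ in range(generations):
        children = [mutate(rng.choice(pool), rng) for _ in range(population)]
        feasible = [c for c in children if relative_flops(*c) <= budget]
        pool = sorted(set(pool + feasible), key=proxy_score, reverse=True)[:top_k]
    return pool[0]


if __name__ == "__main__":
    best = evolve(budget=2.0)
    print("multipliers (depth, width, resolution):", best)
    print("relative FLOPs:", relative_flops(*best))
```

In this sketch the base model is fixed at multipliers of 1.0; the joint search described in the abstract would additionally vary the base architecture itself.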

Code & Models

Repositories

luminolx/scalenet (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis

Methods: Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Max Pooling · Average Pooling · Convolution · Balanced Selection · Scale Aggregation Block