Differentiable Model Scaling using Differentiable Topk
Kai Liu, Ruohui Wang, Jianfei Gao, Kai Chen

TL;DR
This paper introduces Differentiable Model Scaling (DMS), a fully differentiable approach to optimize network width and depth, leading to improved performance across vision and NLP tasks with higher efficiency than existing NAS methods.
Contribution
The paper presents DMS, a novel differentiable method for scalable neural architecture search that enhances search efficiency and model performance across multiple tasks and architectures.
Findings
DMS improves ImageNet top-1 accuracy for EfficientNet-B0 by 1.4%.
DMS needs only 0.4 GPU days of search and outperforms ZiCo, a zero-shot NAS method.
DMS enhances object detection and language modeling performance.
Abstract
Over the past few years, as large language models have ushered in an era of intelligence emergence, there has been an intensified focus on scaling networks. Currently, many network architectures are designed manually, often resulting in sub-optimal configurations. Although Neural Architecture Search (NAS) methods have been proposed to automate this process, they suffer from low search efficiency. This study introduces Differentiable Model Scaling (DMS), increasing the efficiency for searching optimal width and depth in networks. DMS can model both width and depth in a direct and fully differentiable way, making it easy to optimize. We have evaluated our DMS across diverse tasks, ranging from vision tasks to NLP tasks and various network architectures, including CNNs and Transformers. Results consistently indicate that our DMS can find improved structures and outperforms state-of-the-art…
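The abstract does not spell out the differentiable top-k operator. As a rough illustration only, one way such an operator could work is to rank-normalize per-element importance scores and pass them through a sigmoid centered on a learnable pruning ratio, so that the soft count of kept elements varies smoothly with that ratio. The sketch below (Python with NumPy) assumes that form. The function names, the `prune_ratio` and `sharpness` parameters, and the example scores are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def soft_topk_mask(importance, prune_ratio, sharpness=None):
    """Soft keep-mask over elements (e.g. channels); an illustrative assumption.

    importance  : 1-D array of per-element importance scores (hypothetical input)
    prune_ratio : scalar in [0, 1]; larger values push more mask entries toward 0
    sharpness   : sigmoid steepness; defaults to the number of elements (assumed)
    """
    importance = np.asarray(importance, dtype=float)
    n = importance.size
    # Rank-normalize: each score becomes rank / n, spread evenly over [0, 1).
    # Ties are broken arbitrarily by argsort.
    ranks = np.argsort(np.argsort(importance))
    normalized = ranks / n
    lam = float(n) if sharpness is None else float(sharpness)
    # Sigmoid relaxation of the hard test "normalized score > prune_ratio".
    return 1.0 / (1.0 + np.exp(-lam * (normalized - prune_ratio)))


def mask_grad_wrt_ratio(mask, sharpness):
    """Analytic derivative of each mask entry with respect to prune_ratio."""
    # d/da sigmoid(lam * (c - a)) = -lam * m * (1 - m)
    return -sharpness * mask * (1.0 - mask)


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.5, 0.3]  # made-up importance values
    mask = soft_topk_mask(scores, prune_ratio=0.5)
    print("soft mask:", mask)
    print("soft width:", mask.sum())
    print("d mask / d ratio:", mask_grad_wrt_ratio(mask, sharpness=len(scores)))
```

Under these assumptions, the sum of the mask acts as a soft element count, and its gradient with respect to the pruning ratio is always negative. This is the property that would let a width or depth setting be tuned by gradient descent instead of by discrete search.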
Taxonomy
Topics: Engineering, Applied Research
