Differentiable Model Scaling using Differentiable Topk

Kai Liu; Ruohui Wang; Jianfei Gao; Kai Chen

arXiv:2405.07194·cs.CV·May 14, 2024·1 cites

TL;DR

This paper introduces Differentiable Model Scaling (DMS), a fully differentiable approach to optimize network width and depth, leading to improved performance across vision and NLP tasks with higher efficiency than existing NAS methods.

Contribution

The paper presents DMS, a novel differentiable method for scalable neural architecture search that enhances search efficiency and model performance across multiple tasks and architectures.

Findings

1. DMS improves ImageNet top-1 accuracy for EfficientNet-B0 by 1.4%.
2. DMS outperforms ZiCo in search efficiency, requiring only 0.4 GPU days.
3. DMS enhances object detection and language modeling performance.

Abstract

Over the past few years, as large language models have ushered in an era of intelligence emergence, there has been an intensified focus on scaling networks. Currently, many network architectures are designed manually, often resulting in sub-optimal configurations. Although Neural Architecture Search (NAS) methods have been proposed to automate this process, they suffer from low search efficiency. This study introduces Differentiable Model Scaling (DMS), which increases the efficiency of searching for the optimal width and depth of networks. DMS can model both width and depth in a direct and fully differentiable way, making it easy to optimize. We have evaluated DMS across diverse tasks, ranging from vision to NLP, and across various network architectures, including CNNs and Transformers. Results consistently indicate that our DMS can find improved structures and outperforms state-of-the-art…
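
The abstract says width and depth are modeled "in a direct and fully differentiable way" but this page does not give the mechanism. As a rough illustration only, the sketch below shows one way a top-k selection can be relaxed into a soft mask whose size is controlled by a continuous pruning ratio. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the function names `soft_topk_mask` and `mask_grad_wrt_ratio`, the rank normalization, the sigmoid relaxation, and the default sharpness.

```python
import math


def soft_topk_mask(importance, prune_ratio, sharpness=None):
    """Return a soft keep-mask in (0, 1), one value per element.

    importance:  hypothetical per-channel (or per-layer) scores.
    prune_ratio: continuous value in [0, 1]; higher suppresses more elements.
    sharpness:   sigmoid steepness; defaults to len(importance) (an assumption).
    """
    n = len(importance)
    lam = float(n) if sharpness is None else float(sharpness)
    # Rank-normalize: fraction of elements that each score strictly exceeds.
    ranks = [sum(1 for other in importance if score > other) / n
             for score in importance]
    # Smooth threshold: elements ranked above prune_ratio get a mask near 1.
    return [1.0 / (1.0 + math.exp(-lam * (r - prune_ratio))) for r in ranks]


def mask_grad_wrt_ratio(importance, prune_ratio, sharpness=None):
    """Analytic derivative of sum(mask) with respect to prune_ratio."""
    n = len(importance)
    lam = float(n) if sharpness is None else float(sharpness)
    mask = soft_topk_mask(importance, prune_ratio, sharpness)
    # d/da sigmoid(lam * (r - a)) = -lam * m * (1 - m)
    return -lam * sum(m * (1.0 - m) for m in mask)


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.7]  # made-up importance values
    mask = soft_topk_mask(scores, prune_ratio=0.5)
    print("soft mask:", [round(m, 3) for m in mask])
    print("soft width:", round(sum(mask), 3))
    print("d(width)/d(ratio):", round(mask_grad_wrt_ratio(scores, 0.5), 3))
```

Because the soft width has a well-defined gradient with respect to `prune_ratio`, a ratio like this could in principle be trained with ordinary gradient descent alongside the network weights, which is the general idea behind making a structural choice differentiable.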

Code & Models

Repositories

LKJacky/Differentiable-Model-Scaling
PyTorch, official

Taxonomy

Topics: Engineering Applied Research