DASViT: Differentiable Architecture Search for Vision Transformer

Pengjin Wu; Ferrante Neri; Zhenhua Feng

arXiv:2507.13079 · cs.LG · July 18, 2025


TL;DR

DASViT introduces a differentiable neural architecture search method tailored for Vision Transformers. It discovers novel, efficient architectures that outperform traditional designs while using fewer parameters and FLOPs.

Contribution

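This paper presents DASViT, a differentiable NAS approach designed specifically for ViTs. It addresses limitations of previous ViT NAS methods, which mostly search macro-level spaces with discrete techniques such as evolutionary algorithms and are computationally expensive, and it enables the discovery of stronger architectures.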

Findings

1. DASViT architectures outperform ViT-B/16 on multiple datasets.

2. DASViT designs are more efficient, requiring fewer parameters and FLOPs.

3. DASViT uncovers novel Transformer architectures that go beyond traditional designs.

Abstract

Designing effective neural networks is a cornerstone of deep learning, and Neural Architecture Search (NAS) has emerged as a powerful tool for automating this process. Among the existing NAS approaches, Differentiable Architecture Search (DARTS) has gained prominence for its efficiency and ease of use, inspiring numerous advancements. Since the rise of Vision Transformers (ViT), researchers have applied NAS to explore ViT architectures, often focusing on macro-level search spaces and relying on discrete methods like evolutionary algorithms. While these methods ensure reliability, they face challenges in discovering innovative architectural designs, demand extensive computational resources, and are time-intensive. To address these limitations, we introduce Differentiable Architecture Search for Vision Transformer (DASViT), which bridges the gap in differentiable search for ViTs and…
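For readers unfamiliar with DARTS, the idea that makes the search "differentiable" is a continuous relaxation: each discrete choice among candidate operations is replaced by a softmax-weighted mixture, so the architecture weights (alpha) can be learned by gradient descent, and the strongest operation is kept at the end. The sketch below illustrates only that generic DARTS mechanism, not DASViT's actual search space. The operation names (`zero`, `identity`, `double`) and helper functions are hypothetical, and the gradient-based (bilevel) update of alpha is omitted.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical candidate operations on a 1-D feature vector. A ViT search
# space would use attention/MLP variants instead; these are placeholders.
CANDIDATE_OPS: Dict[str, Callable[[Vector], Vector]] = {
    "zero": lambda x: [0.0 for _ in x],
    "identity": lambda x: list(x),
    "double": lambda x: [2.0 * v for v in x],
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x: Vector, alphas: Dict[str, float]) -> Vector:
    """DARTS-style continuous relaxation: the softmax(alpha)-weighted sum
    of every candidate operation applied to the same input."""
    names = list(CANDIDATE_OPS)
    weights = softmax([alphas[n] for n in names])
    out = [0.0] * len(x)
    for w, name in zip(weights, names):
        y = CANDIDATE_OPS[name](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out


def discretize(alphas: Dict[str, float]) -> str:
    """After search, keep the single operation with the largest alpha."""
    return max(alphas, key=alphas.get)


if __name__ == "__main__":
    uniform = {"zero": 0.0, "identity": 0.0, "double": 0.0}
    print(mixed_op([3.0], uniform))  # equal weights: (0 + 3 + 6) / 3, about 3.0
    print(discretize({"zero": -1.0, "identity": 0.5, "double": 2.0}))
```

With equal alphas every operation contributes one third of its output. As training pushes one alpha above the others, the mixture approaches that single operation, which is why the final discretization step (taking the argmax) is expected to lose relatively little accuracy.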

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques

Methods: Dropout · Vision Transformer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer