L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers

Sofia Casarin; Sergio Escalera; Oswald Lanz

arXiv:2505.07300 · cs.CV · May 13, 2025

TL;DR

This paper introduces L-SWAG, a new zero-cost proxy for neural architecture search in Vision Transformers, enabling efficient, training-free model selection across multiple tasks and outperforming existing methods.

Contribution

The work extends zero-cost NAS to Vision Transformers, proposes L-SWAG as a novel generalizable proxy, and introduces LIBRA-NAS for effective proxy combination, achieving state-of-the-art results.

Findings

1. L-SWAG characterizes both convolutional and transformer architectures across 14 tasks.

2. LIBRA-NAS improves proxy combination for NAS.

3. Achieves 17.0% test error on ImageNet1k in 0.1 GPU days.

Abstract

Training-free Neural Architecture Search (NAS) efficiently identifies high-performing neural networks using zero-cost (ZC) proxies. Unlike multi-shot and one-shot NAS approaches, ZC-NAS is both (i) time-efficient, eliminating the need for model training, and (ii) interpretable, with proxy designs often theoretically grounded. Despite rapid developments in the field, current SOTA ZC proxies are typically constrained to well-established convolutional search spaces. With the rise of Large Language Models shaping the future of deep learning, this work extends ZC proxy applicability to Vision Transformers (ViTs). We present a new benchmark using the Autoformer search space evaluated on 6 distinct tasks and propose Layer-Sample Wise Activation with Gradients information (L-SWAG), a novel, generalizable metric that characterizes both convolutional and transformer architectures across 14 tasks.…

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications