When Training-Free NAS Meets Vision Transformer: A Neural Tangent Kernel Perspective

Qiqi Zhou; Yichen Zhu

arXiv:2405.04536·cs.CV·May 9, 2024

TL;DR

This paper examines the limitations of Neural Tangent Kernel (NTK)-based metrics for neural architecture search (NAS) over vision transformers (ViTs) and proposes ViNTK, a Fourier-feature-enhanced NTK, to improve search efficiency and performance.

Contribution

It introduces ViNTK, an NTK extension that captures high-frequency features, addressing a limitation of earlier NTK-based metrics for ViT neural architecture search.
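The abstract states only that ViNTK integrates Fourier features from the inputs; it does not say which mapping is used. As a minimal sketch, assuming the common random Fourier feature mapping γ(x) = [cos(2πBx), sin(2πBx)] with the rows of B drawn from a Gaussian of standard deviation σ (the `scale` parameter), the hypothetical helper `fourier_features` below lifts inputs into sinusoids whose frequencies grow with σ. It illustrates Fourier features in general, not the authors' construction.

```python
import numpy as np


def fourier_features(x, num_frequencies=16, scale=10.0, seed=0):
    """Hypothetical helper: map inputs of shape (n, d) to (n, 2 * num_frequencies).

    Assumed mapping (not taken from the paper):
        gamma(x) = [cos(2*pi*B x), sin(2*pi*B x)],  B[i, j] ~ N(0, scale^2)
    A fixed seed keeps B identical across calls, so the mapping is deterministic.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.float64)
    # Random projection matrix; larger `scale` means higher sampled frequencies.
    B = rng.normal(0.0, scale, size=(num_frequencies, x.shape[1]))
    proj = 2.0 * np.pi * x @ B.T  # shape (n, num_frequencies)
    # Concatenate cosine and sine components along the feature axis.
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)


if __name__ == "__main__":
    x = np.random.default_rng(0).uniform(size=(4, 3))  # 4 toy inputs, 3 features each
    print(fourier_features(x).shape)
```

Larger `scale` values yield features that oscillate faster as x changes, which is the kind of high-frequency content the paper argues the standard NTK overlooks.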

Findings

01

ViNTK significantly reduces NAS search costs.

02

ViNTK maintains competitive accuracy on classification and segmentation.

03

NTK's low-frequency bias limits its effectiveness for ViT NAS.

Abstract

This paper investigates the Neural Tangent Kernel (NTK) as a means of searching vision transformers without training. In contrast with the previous observation that NTK-based metrics can effectively predict CNN performance at initialization, we empirically show their inefficacy in the ViT search space. We hypothesize that the fundamental feature-learning preference of ViTs contributes to the ineffectiveness of applying NTK to NAS for ViT. We validate, both theoretically and empirically, that NTK essentially estimates the ability of neural networks to learn low-frequency signals, completely ignoring the impact of high-frequency signals in feature learning. To address this limitation, we propose a new method called ViNTK that generalizes the standard NTK to the high-frequency domain by integrating Fourier features from the inputs. Experiments with multiple ViT search spaces on image classification…
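NTK-based training-free metrics score an untrained network from its empirical NTK, Θ(x, x') = ∇θf(x) · ∇θf(x'), evaluated at initialization. The following is a minimal NumPy sketch, assuming a hypothetical two-layer tanh MLP f(x) = vᵀ tanh(Wx) / √m in place of a ViT: it forms the Jacobian J of the output with respect to all parameters, builds Θ = JJᵀ, and reports its condition number, one score used by earlier NTK-based NAS methods (a lower value is usually read as easier to train). All function names are illustrative; this is the standard NTK the paper critiques, not ViNTK.

```python
import numpy as np


def init_mlp(d_in, width, seed=0):
    """Random init of a hypothetical MLP f(x) = v . tanh(W x) / sqrt(width)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(width, d_in))
    v = rng.normal(size=width)
    return W, v


def jacobian(W, v, X):
    """Per-example gradient of f w.r.t. (v, W), shape (n, width + width * d_in)."""
    width = W.shape[0]
    H = np.tanh(X @ W.T)                     # (n, width) hidden activations
    grad_v = H / np.sqrt(width)              # df/dv[j] = tanh(W x)[j] / sqrt(width)
    # df/dW[j, k] = v[j] * (1 - tanh^2) * x[k] / sqrt(width)
    dH = (1.0 - H ** 2) * v / np.sqrt(width)  # (n, width)
    grad_W = dH[:, :, None] * X[:, None, :]   # (n, width, d_in)
    return np.concatenate([grad_v, grad_W.reshape(X.shape[0], -1)], axis=1)


def empirical_ntk(W, v, X):
    """Empirical NTK at initialization: Gram matrix of parameter gradients."""
    J = jacobian(W, v, X)
    return J @ J.T                            # (n, n), symmetric PSD


def ntk_condition_number(W, v, X):
    """Ratio of largest to smallest NTK eigenvalue (training-free score)."""
    eigvals = np.linalg.eigvalsh(empirical_ntk(W, v, X))  # ascending order
    return eigvals[-1] / eigvals[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(8, 3))               # 8 toy inputs with 3 features
    W, v = init_mlp(d_in=3, width=64)
    print("NTK condition number:", ntk_condition_number(W, v, X))
```

Under the paper's hypothesis, a score like this mainly reflects how readily the network fits low-frequency components of the target, which per the abstract is why ViNTK integrates Fourier features from the inputs before the kernel is computed.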

Taxonomy

Topics: Infrared Target Detection Methodologies · Optical Polarization and Ellipsometry · Neural Networks and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Neural Tangent Kernel