When Training-Free NAS Meets Vision Transformer: A Neural Tangent Kernel Perspective
Qiqi Zhou, Yichen Zhu

TL;DR
This paper explores the limitations of Neural Tangent Kernel-based metrics for vision transformer NAS and proposes ViNTK, a Fourier feature-enhanced NTK, to improve search efficiency and performance.
Contribution
It introduces ViNTK, a novel NTK extension that captures high-frequency features, addressing previous limitations in ViT neural architecture search.
Findings
ViNTK significantly reduces NAS search costs.
ViNTK maintains competitive accuracy on classification and segmentation.
NTK's low-frequency bias limits its effectiveness for ViT NAS.
Abstract
This paper investigates the Neural Tangent Kernel (NTK) to search vision transformers without training. In contrast with the previous observation that NTK-based metrics can effectively predict CNNs performance at initialization, we empirically show their inefficacy in the ViT search space. We hypothesize that the fundamental feature learning preference within ViT contributes to the ineffectiveness of applying NTK to NAS for ViT. We both theoretically and empirically validate that NTK essentially estimates the ability of neural networks that learn low-frequency signals, completely ignoring the impact of high-frequency signals in feature learning. To address this limitation, we propose a new method called ViNTK that generalizes the standard NTK to the high-frequency domain by integrating the Fourier features from inputs. Experiments with multiple ViT search spaces on image classification…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInfrared Target Detection Methodologies · Optical Polarization and Ellipsometry · Neural Networks and Applications
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Neural Tangent Kernel
