Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille

TL;DR
This paper introduces TrioNet, an architecture found by weight-sharing NAS that combines convolution with local and global (axial) self-attention operators. The search relies on two proposed techniques, Hierarchical Sampling and Multi-head Sharing, and the resulting model outperforms stand-alone models on ImageNet with fewer FLOPs.
Contribution
The paper proposes a new architecture space combining convolution and self-attention, along with novel NAS strategies, to improve vision model performance.
Findings
TrioNet outperforms stand-alone models on ImageNet with fewer FLOPs.
TrioNet matches the best operators on small datasets.
Hierarchical Sampling improves supernet training efficiency.
Abstract
Recently, self-attention operators have shown superior performance as a stand-alone building block for vision models. However, existing self-attention models are often hand-designed, modified from CNNs, and obtained by stacking one operator only. A wider range of architecture space which combines different self-attention operators and convolution is rarely explored. In this paper, we explore this novel architecture space with weight-sharing Neural Architecture Search (NAS) algorithms. The resulting architecture is named TrioNet for combining convolution, local self-attention, and global (axial) self-attention operators. In order to effectively search in this huge architecture space, we propose Hierarchical Sampling for better training of the supernet. In addition, we propose a novel weight-sharing strategy, Multi-head Sharing, specifically for multi-head self-attention operators. Our…
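The abstract names Hierarchical Sampling but this summary does not spell out the procedure. The sketch below is one plausible reading, not the authors' implementation: for each block, an operator is sampled first, and only then are that operator's own settings sampled. The search-space contents (`kernel_size`, `window_size`, `num_heads` and their values) and the function names are hypothetical placeholders chosen for illustration.

```python
import random

# Hypothetical per-operator choices. The summary does not list the real
# search space, so these names and values are assumptions for illustration.
SEARCH_SPACE = {
    "conv": {"kernel_size": [3, 5, 7]},
    "local_attention": {"window_size": [3, 5, 7], "num_heads": [4, 8]},
    "axial_attention": {"num_heads": [4, 8]},
}


def sample_block(rng):
    """Sample one block in two levels: operator first, then its settings."""
    # Level 1: pick the operator uniformly, so each operator type is
    # chosen equally often regardless of how many settings it exposes.
    op = rng.choice(sorted(SEARCH_SPACE))
    # Level 2: pick values only for the settings that belong to that operator.
    settings = {name: rng.choice(values) for name, values in SEARCH_SPACE[op].items()}
    return {"op": op, **settings}


def sample_architecture(num_blocks, seed=None):
    """Sample a candidate sub-network as a list of block configurations."""
    rng = random.Random(seed)
    return [sample_block(rng) for _ in range(num_blocks)]


if __name__ == "__main__":
    for block in sample_architecture(num_blocks=4, seed=0):
        print(block)
```

Sampling the operator before its settings keeps the three operator types equally represented during supernet training, whereas sampling uniformly over every flattened (operator, settings) combination would favor operators with more configurable settings.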
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: Convolution
